
Berkeley DB Backend Storage Engine for DURUS

Last updated: Thursday, May 3, 2007

This code provides a new storage engine for DURUS, an excellent persistence system for the Python programming language.

The README included in the distribution:

$Id: README 332 2007-02-20 20:23:49Z jcea $

WHAT IS "DURUS-BERKELEYDBSTORAGE"?

"Durus-berkeleydbstorage" is a backend storage module for Durus, a persistence system for Python. As its name indicates,"Durus-berkeleydbstorage" uses Berkeley DB as the storage technology.

Some advantages compared to the standard Durus FileStorage:

  • Startup time is negligible.

  • You don't need an in-memory index, so your repository size is only limited by storage space, not RAM.

  • If you change existing objects, your storage size doesn't increase.

  • If you delete objects, they are garbage collected in the background, slowly, without performance degradation.

  • You can still do a full, fast collection if you need it. While this collection is in progress, Durus still serves objects.

  • Garbage collection increases neither storage size nor RAM usage.

  • Garbage collection deletes objects using nondurable transactions, very efficiently. If the collection is aborted abruptly (program or machine crash), it will resume where it left off. If the GC finishes without problems, that state is durable.

    Any object stored in the storage commits a durable transaction, which also makes durable, along the way, any objects released so far by the background garbage collector.

  • Garbage collection time is proportional to garbage, not repository size.

There are some disadvantages, nevertheless:

  • IMPORTANT: This backend uses reference counting to decide when an object is garbage and can be collected. So, if you have cycles in your data structures, you **MUST BREAK** them before releasing the objects (see the sketch after this list).

    Failing to do so will leak disk space. A future release may be able to collect cycles, but try to avoid that pattern.

    Leaked objects will consume disk space, but **NO** corruption or malfunction will happen. There is no other side effect.

  • Although this code could work on Windows, I haven't tested it. Absolutely no guarantee!

  • Don't use this storage backend over NFS unless you know exactly what you are doing.

  • RAM usage is proportional to the number of garbage objects not yet collected. In practice this should be small.

  • Since we are using Berkeley DB as the backend:

    • You should be experienced with Berkeley DB deployments.

    • Beware when updating Python or Berkeley DB. In particular, Berkeley DB is known for breaking binary compatibility between versions (but they DOCUMENT it!). In that case they ALWAYS document the procedure for a controlled upgrade, so don't worry, but take note of the risk.

    • To do a trustworthy backup, you should follow the instructions in the Berkeley DB documentation:

      http://www.sleepycat.com/docs/ref/transapp/reclimit.html
      http://www.sleepycat.com/docs/ref/transapp/archival.html
      http://www.sleepycat.com/docs/utility/db_hotbackup.html

    • In Python you can use the standard "bsddb" module or the up-to-date "bsddb3" bindings (which are merged into new Python versions). This product will always try to use the most recent Berkeley DB bindings available. Be careful about Berkeley DB version changes when you update the bindings.

    • Since Berkeley DB files are binary structures, a corrupt database can be unrecoverable. Be diligent and careful with your backups.

  • This Storage Backend doesn't support the HistoryConnection class added in Durus 3.4.
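
For reference, here is a minimal sketch (the "Node" class is a made-up example, not part of the library) of breaking a reference cycle before releasing the objects, so the reference-counting collector can reclaim them:

    from durus.connection import Connection
    from durus.persistent import Persistent
    from berkeleydb_storage import BerkeleyDBStorage

    class Node(Persistent):
        pass

    connection = Connection(BerkeleyDBStorage("db"))
    root = connection.get_root()

    a, b = Node(), Node()
    a.other, b.other = b, a          # reference cycle: a <-> b
    root["pair"] = (a, b)
    connection.commit()

    # Break the cycle BEFORE dropping the last reachable reference,
    # otherwise both objects would leak disk space.
    a.other = b.other = None
    del root["pair"]
    connection.commit()              # both objects are now collectable garbage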

You can use this product both as a normal (local) file storage and as a server (remote) storage system, just like the usual Durus FileStorage.

HOW IS "DURUS-BERKELEYDBSTORAGE" USED?

IMPORTANT: The PATH you specify in the storage MUST BE an existing directory. The database files will be created inside it.

You can use this engine in two ways:

  1. Local access:

    The program is the only user of the storage, since local access is exclusive.

    In your code simply put:

    from durus.connection import Connection
    from berkeleydb_storage import BerkeleyDBStorage

    connection = Connection(BerkeleyDBStorage("PATH"))
    

    where "PATH" is the path to already existant directory. The database will reside inside that directory.

    After doing that, you use the connection like any other Durus connection. See "test3.py" as a reference implementation:

    from durus.btree import BTree
    from durus.connection import Connection
    from berkeleydb_storage import BerkeleyDBStorage

    connection = Connection(BerkeleyDBStorage("db"))

    # Store a BTree at the root and fill it with 65536 keys.
    root = connection.get_root()
    root[0] = BTree()
    connection.commit()

    for i in xrange(65536):
        root[0][i] = 0
    connection.commit()

    # Delete every other key; the removed entries become garbage,
    # collected in the background.
    for i in xrange(0, 65536, 2):
        del root[0][i]
    connection.commit()

    print len(root[0])  # 32768
    

  2. Remote access:

    Clients are "normal" Durus clients.

    The Durus server must use this engine. The file "server.py" is a reference implementation. Example:

    import sys
    from durus.storage_server import DEFAULT_PORT, DEFAULT_HOST, StorageServer
    from durus.logger import log, direct_output

    def start():
        # Send Durus logging to stderr.
        direct_output(sys.stderr)

        from berkeleydb_storage import BerkeleyDBStorage
        storage = BerkeleyDBStorage("db")
        host = DEFAULT_HOST
        port = DEFAULT_PORT
        log(20, 'host=%s port=%s', host, port)
        StorageServer(storage, host=host, port=port).serve()

    if __name__ == "__main__":
        start()
    

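For reference, clients connect like any other Durus client. A minimal sketch, assuming the standard Durus ClientStorage API (nothing here is specific to this backend):

    from durus.client_storage import ClientStorage
    from durus.connection import Connection
    from durus.storage_server import DEFAULT_HOST, DEFAULT_PORT

    # Connect to the server started above and fetch the root object.
    connection = Connection(ClientStorage(host=DEFAULT_HOST, port=DEFAULT_PORT))
    root = connection.get_root()
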
Additionally, you can specify options when you instantiate a Berkeley DB storage. The options are passed as optional parameters to the constructor (a usage sketch follows this parameter list):

  • directoryname

    The directory path where the storage resides. This directory MUST exist before trying to create a new storage.

    This parameter is MANDATORY.

  • type (default: "hash")

    When creating a new storage database, we can choose between "hash" or "btree" schema.

    Performance differences are noticeable only when the database size is in the multigigabyte/terabyte range, and they also depend on the access patterns of the application (for example, reference locality).

    Remember also that client caching considerably biases the access patterns that hit the disk, so your best bet is to test both configurations in your environment under normal usage.

    When the database already exists, this parameter has no effect.

  • cachesize (default: 16MB)

    Berkeley DB can use an mmap'ed cache file to improve performance. You don't usually need to touch this value.

    You can see the hit ratio of the cache with the "db_stat" Berkeley DB tool.

    This value is only used when creating the storage for the first time, or after a "db_recover". Otherwise it is ignored.

  • log_buffer_size (default: 256KB)

    Berkeley DB uses a log buffer to store in-progress transaction data. If the buffer is too big, you waste RAM. If it is too small, Berkeley DB will need to flush it frequently, hurting performance.

    So you can tune this parameter according to your transaction (write) size. You can see if the buffer is too small using the "db_stat" Berkeley DB tool.

    This value is only used when creating the storage for the first time, or after a "db_recover". Otherwise it is ignored.

  • do_recover (default: related to "read_only". See description)

    Using this parameter, you can request a "db_recover" operation explicitly. Remember, nevertheless, that this storage backend may decide to do an implicit "db_recover" if it thinks it is necessary.

    "do_recover" constructor parameter default value changes according to "read_only" parameter. If the storage is opened read/write, "do_recover" default value will be True. If the storage is opened "read only", the "do_recover" default value will be False.

    You can't do a database recovery if you opened it as "read only".

  • durable (default: True)

    This storage backend complies with ACID semantics (Atomic, Consistent, Isolated and Durable). The durable part incurs a performance penalty because it requires a synchronous disk write.

    In some environments, it may be desirable to trade durability for write performance. You risk, nevertheless, losing your most recently committed transactions. In some environments that may be acceptable.

    You can lose committed transactions if your machine crashes or is rebooted without warning, or if the storage application crashes inside certain critical functions. You can also lose the latest committed transactions if you do a database "recover", or if one is done internally because the storage decides it is necessary.

    In any case, you are always guaranteed ACI semantics. That is, any transaction will be applied entirely or not at all, with no data corruption, etc. Also, no transaction X will be visible if transaction X-1 was "lost" because durability was disabled, so your data will be chronologically consistent.

  • async_log_write (default: True)

    If the storage instance has durable=True, this flag is ignored.

    If durable=False, this flag controls how the transaction log is written:

    • If False: The transaction log is kept in memory. It is flushed to disk when doing a checkpoint, when the data cache is full, when the transaction log is full, etc.

    • If True: When a transaction commits, the storage backend writes the transaction log to disk lazily, in an asynchronous way. If something goes wrong we can't be sure the log reached the disk, but durability is improved for a fairly small cost: an asynchronous write. This flag can be very useful in "non durable" environments, since after a database recovery you will "lose" fewer transactions.

    Remember that this flag is only considered if you explicitly asked for a "non durable" storage, so you get what you asked for.

  • read_only (default: False)

    Open the storage database in "read only" mode. The storage must have been initialized previously, and should already contain some data.

    In this mode, write attempts raise an exception.

    In this mode, several applications can share the storage simultaneously.

  • checkpoint_method (default: related to "read_only". See description)

    This parameter defines the policy used to do database checkpointing.

    The default value depends on the "read_only" flag:

    • If False: The default value is "berkeleydb_storage.checkpoint_thread". This policy does DB checkpointing in the background, in a separate thread.

    • If True: The default value is "berkeleydb_storage.checkpoint_noop". This policy does nothing. It is a "no operation".

  • garbage_collection_method (default: related to "read_only". See description)

    This parameter defines the policy used to do garbage collection.

    The default value depends on the "read_only" flag:

    • If False: The default value is "berkeleydb_storage.garbage_collection_inline_prefetch". This policy does the garbage collection inline, but makes sure the data is prefetched first, so the deletion operation is fast.

    • If True: The default value is "berkeleydb_storage.garbage_collection_noop". This policy does no garbage collection, but it updates the garbage metadata if necessary, for other policy objects that may be activated in the future.

  • validate_garbage (default: False)

    Opening the storage with this flag set can be fairly slow, with time proportional to the number of objects in the storage. This option should be used only when upgrading this storage engine from a previous version that had issues related to garbage collection.

    Remember also that this option can take a lot of memory, proportional to the number of objects in the storage.

    If set, this parameter instructs the storage to do, at init time, a reachability analysis to determine garbage consistency and possible leaks. We print the number of previously unknown unreachable objects.

    Reference cycles are not detected, nor collected.

    Using this parameter reduces the "to be collected" objects to a minimal garbage root set. From there, the garbage collector should be able to detect and collect all garbage not involved in reference cycles.

    If the storage was opened in "read only" mode, no changes will actually be made to the storage.

    NOTE: It is usual to find new garbage when opening a storage several times with this parameter set. This is normal, and does not indicate any kind of problem. The usual garbage collection process would discover those unreachable objects automatically. In general, only use this parameter once, when upgrading this storage engine from a previous version that had problems with garbage collection.
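
As a usage sketch, here is how several of the parameters above might be combined. The values are arbitrary illustrations (and the sizes are assumed to be given in bytes), not recommendations:

    from durus.connection import Connection
    from berkeleydb_storage import BerkeleyDBStorage

    # Open (or create) the storage inside the existing directory "db".
    storage = BerkeleyDBStorage("db",                      # directoryname (mandatory)
                                type="btree",              # only honored when creating the database
                                cachesize=64*1024*1024,    # Berkeley DB cache (default 16MB)
                                log_buffer_size=1024*1024, # transaction log buffer (default 256KB)
                                durable=False,             # trade durability for write speed
                                async_log_write=True)      # lazy asynchronous log writes
    connection = Connection(storage)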

CHECKPOINT POLICY

Releases since 20061016 allow you to specify an object in the storage constructor to set the database checkpointing policy (a usage sketch follows the policy list below).

Programmers can create new arbitrary policy objects/factories using "berkeleydb_storage.checkpoint_interface" interface-like class as a blueprint. Any future interface change will be documented in the UPGRADING document.

Currently implemented policies are:

  • "berkeleydb_storage.checkpoint_thread": Do DB checkpointing in a separate thread, in background.

    This is the default policy.

    Since this policy does DB checkpointing while the storage is processing additional transactions, we may temporarily have more database log files in the environment than necessary. This is a transient issue, resolved automatically.

  • "berkeleydb_storage.checkpoint_thread_final_checkpoint": Same than previous one, but doing a "forced" checkpoint when closing the storage.

    So, this policy slowdowns storage shutdowns but the storage initializacion will be faster if we do a database recover.

  • "berkeleydb_storage.checkpoint_inline": Do DB inlining checkpointing, like previous backend releases. You shouldn't use it unless you actually need it.

  • "berkeleydb_storage.checkpoint_noop": Do not do DB checkpointing. Useful if checkpointing is managed externally.

GARBAGE COLLECTION POLICY

Releases since 20070220 allow you to specify an object in the storage constructor to set a garbage collection policy (a usage sketch follows the policy list below).

Currently, the interface used as a blueprint is subject to change. DO NOT IMPLEMENT new policy objects; they can break without notice when upgrading this storage backend. When the API is stable enough to freely implement new policies, you will be notified.

Currently implemented policies are:

  • "berkeleydb_storage.garbage_collection_inline": Collect a garbage object everytime we access the storage. This was the policy used when this backend storage didn't support configurable garbage collection policies. You shouldn't use it unless you actually need it.

  • "berkeleydb_storage.garbage_collection_inline_prefetch": Like the previous policy, this one collects a garbage object per storage access. But here, a separate thread "prefetches" the garbage object and the reference counters that it updates, to be sure the deletion is going to be fast, without stopping the storage waiting for the disk.

    This is the default policy.

  • "berkeleydb_storage.garbage_collection_noop": Do not do garbage collection. If the storage knows about new garbage, the "to be deleted" metadata will be updated, nevertheless.

DATASET MIGRATION

If you already have data in a traditional Durus FileStorage, you can migrate it to this Storage Backend using the standard migration procedure: iterate over the objects in the source storage, send them to the destination storage and do a "commit" when you are done.

This procedure is usable if your dataset fits in RAM, but even if your machine is full of RAM or has a huge swap space, on a 32-bit machine you will hit address space limitations, on the order of 2^30 bytes.

You can't do partial ("bite sized") commits because this backend does garbage collection in the background, and it could free objects that are already stored but not yet referenced.

To solve these issues, releases since 20060509 have a "migrate()" method. It takes an iterable parameter yielding the original objects.

Example:

from durus.file_storage import FileStorage
from berkeleydb_storage import BerkeleyDBStorage

# Open the source read-only and copy every object to the new storage.
source = FileStorage("source", readonly=True)
destination = BerkeleyDBStorage("destination")

destination.migrate(source.gen_oid_record())


DOWNLOADS:

"durus-berkeleydbstorage" is released under GNU Public License, version 2.

  • durus-berkeleydbstorage-20070503.tar.gz (34Kbytes) (Digital Signature)
    MD5: 8a34eb0a9c9ab3950ceb3d5672873567

    This release is known to work with Berkeley DB releases 4.5.*, 4.4.* and 4.3.*.

    • Upgrade Instructions:

      • This release REQUIRES Durus 3.7 or higher.

    • Changes:

      • 20070426 - r343 - jcea@argo.es

        Since this point, this Storage Engine requires Durus 3.7 and up.

  • durus-berkeleydbstorage-20070220.tar.gz (34Kbytes) (Digital Signature)
    MD5: bd87baf3abe1ea7bc368ba25917c531d

    This release is known to work with Berkeley DB releases 4.5.*, 4.4.* and 4.3.*.

    • Upgrade Instructions:

      • This release does garbage collection with the help of a "prefetch" thread. The point is to avoid stopping the storage waiting for the disk, where possible.

        The performance improvement is huge when deleting "cold" objects (objects not in the DB cache), especially if there are many inter-object references.

        If you prefer or need the old behaviour (pure inline garbage collection), for example because your Python deployment doesn't support multithreading, you can activate it by passing "berkeleydb_storage.garbage_collection_inline" to the storage constructor.

    • Changes:

      • 20070220 - r333 - jcea@argo.es

        "UPGRADING" and "README" documentation updates.

      • 20070126 - r326 - jcea@argo.es

        If a storage is opened in read only mode, just ignore the "pack" attempts.

      • 20070126 - r323 - jcea@argo.es

        Some regression bugfixes for "read only" mode combined with the current garbage collection policy objects.

      • 20070126 - r322 - jcea@argo.es

        The lock file shouldn't have +x mode set.

      • 20070126 - r321 - jcea@argo.es

        When doing a full and explicit "pack", and the checkpoint is done in background, we could get a lot of DB "log" files, since we could generate transactions too fast for the cleanup thread.

      • 20070126 - r316 - jcea@argo.es

        If using the prefetch garbage collection policy, a full and explicit "pack" burned CPU for no purpose.

        Solved.

      • 20070122 - r310 - jcea@argo.es

        Patch to avoid (temporary) GC starvation if we get new garbage before finishing the GC prefetch.

        Also, this patch avoids multiple prefetch of the same object when new garbage arrives.

        Finally, this simple patch should solve an "assert()" failure in the prefetch thread. Hope so :-).

      • 20070108 - r308 - jcea@argo.es

        Update the "TODO" document.

      • 20070104 - r294 - jcea@argo.es

        First implementation of a garbage collection policy object doing garbage collection inline, but with background prefetch.

      • 20061219 - r291 - jcea@argo.es

        Solve a potential corruption if upgrading Storage backend from version 0 to version 2 directly.

      • 20061219 - r289 - jcea@argo.es

        Avoid raising spurious exceptions from policy objects if we can't instantiate the storage correctly, for example because the storage is already locked.

      • 20061219 - r287 - jcea@argo.es

        Compatibility check-in in the checkpoint policy objects for the old Berkeley DB 4.3.

      • 20061219 - r286 - jcea@argo.es

        Better traceback if the checkpoint thread dies.

      • 20061205 - r283 - jcea@argo.es

        First implementation of the locking protocol in the garbage collection policy interface.

  • durus-berkeleydbstorage-20061121.tar.gz (31Kbytes) (Digital Signature)
    MD5: 5f3425882bf28e14124fe7b6ea31d13e

    • Upgrade Instructions:

      • This release REQUIRES Durus 3.6 or higher.

      • Storage databases created with this release are not compatible with previous releases.

        The first time you use this release to open a storage database created by previous releases, it will be transparently "upgraded" to the current format, so:

        • The storage will become incompatible with previous releases.

        • In order to be able to upgrade the storage, you can't open it in "read only" mode. Once upgraded, you can use "read only" mode freely.

        • The upgrade process doesn't take any RAM.

        • If the upgrade process is aborted (program quits, crashes, machine reboot, etc), the database will be stable and clean. That is, the upgrade process is transactional and SAFE.

        • The upgrade process will be "instantaneous".

      • A new checkpoint policy object: "berkeleydb_storage.checkpoint_thread_final_checkpoint".

        This checkpoint policy does a forced checkpoint when closing the storage. This slows down storage shutdown, but speeds up storage initialization.

    • Changes:

      • 20061117 - r275 - jcea@argo.es

        Tentative first implementation of the "sync" feedback feature of Durus 3.6.

        Since this point, this Storage Engine requires Durus 3.6 and up.

      • 20061117 - r274 - jcea@argo.es

        Full implementations of "garbage_collection_noop" and "garbage_collection_inline" policy objects.

      • 20061117 - r264 - jcea@argo.es

        A late compatibility fix for Durus 3.6.

        This fix requires an (instantaneous) storage upgrade.

      • 20061117 - r262 - jcea@argo.es

        Initial support for garbage collection policy objects.

      • 20061116 - r257 - jcea@argo.es

        A new checkpoint policy object: "berkeleydb_storage.checkpoint_thread_final_checkpoint".

      • 20061116 - r251 - jcea@argo.es

        More gentle database closing if the program closes the storage handle and then dies without giving an opportunity to the garbage collector.

      • 20061116 - r250 - jcea@argo.es

        The storage did a database recover even when asked not to do it.

      • 20061116 - r246 - jcea@argo.es

        Do some minor changes for compatibility with just released Durus 3.6.

      • 20061116 - r245 - jcea@argo.es

        "KNOW_HOW-DURUS" updated to Durus 3.6.

  • durus-berkeleydbstorage-20061023.tar.gz (30Kbytes) (Digital Signature)
    MD5: ebf73a26b377b1cdf9f9398d5751e8b3

    • Upgrade Instructions:

      • This release does Berkeley DB database checkpointing in a separate thread by default. The backend no longer becomes unresponsive for a couple of seconds while it is busy doing a checkpoint to recycle database logging space.

        If you prefer or need the old behaviour (inline checkpointing), for example because your Python deployment doesn't support multithreading, you can activate it by passing "berkeleydb_storage.checkpoint_inline" to the storage constructor.

      • "do_recover" constructor parameter default value changes according to "read_only" parameter. If the storage is opened read/write, "do_recover" default value will be True. If the storage is opened "read only", the "do_recover" default value will be False.

        Of course, you can override these defaults if you wish and you know what you are doing.

    • Changes:

      • 20061023 - r238 - jcea@argo.es

        Document "KNOW_HOW-DURUS" updated to Durus 3.5.

      • 20061018 - r235 - jcea@argo.es

        UPGRADING and README documentation updates.

      • 20061016 - r233 - jcea@argo.es

        If the checkpointing thread dies, notify the calling thread.

      • 20061010 - r230 - jcea@argo.es

        Be able to do the database checkpointing in the background.

      • 20061005 - r229 - jcea@argo.es

        "do_recover" constructor parameter default value changes according to "read_only" parameter. If the storage is opened read/write, "do_recover" default value will be True. If the storage is opened "read only", the "do_recover" default value will be False.

  • durus-berkeleydbstorage-20061005.tar.gz (29Kbytes) (Digital Signature)
    MD5: e1e5d4f5327a717e3e3bbe6b5cd7b20e

    NOTE: Although documentation is not yet updated, this release works just fine under Durus 3.5 release.

    • Upgrade Instructions:

      • "do_recover" constructor parameter default value changed from "False" to "True". Better play safe than sorry. Of course you can do "do_recover=False" if you know what are you doing.

        If you are opening the storage "read_only", you must set this flag to False.

      • This release adds an optional "async_log_write" parameter to storage instance constructor. This option is only relevant when you use "non durable" storages. You can now choose between "in memory" and "async write" transaction logging.

        If you are using "non durable" storages, and you want to keep the previous behaviour, you MUST use "async_log_write=False". The new default will use asynchronous writes for the transactional logging.

    • Changes:

      • 20061005 - r217 - jcea@argo.es

        "do_recover" constructor parameter default value changed from "False" to "True".

      • 20061003 - r215 - jcea@argo.es

        "KNOW_HOW-DURUS" updated to document a sort of persistent weak reference pattern.

      • 20061003 - r214 - jcea@argo.es

        "KNOW_HOW-DURUS" updated to document BTree abilities.

      • 20060927 - r205 - jcea@argo.es

        New "in use" flag inside the database storage. If a storage is opened and that flag is set, a database recovery is done.

        That flag is cleared when the storage instance destructor is called.

        This flag is not used if the database is opened in read only mode.

      • 20060927 - r202 - jcea@argo.es

        When the storage is opened read/write and non durable, the storage instance destructor will try to (synchronously) flush the transaction log.

        This last flush can't be guaranteed, nevertheless.

      • 20060927 - r200 - jcea@argo.es

        Add a new optional parameter to the constructor: "async_log_write".

  • durus-berkeleydbstorage-20060629.tar.gz (27Kbytes) (Digital Signature)
    MD5: ef40b75e661cb622969e69dd2d693cb3

    • Upgrade Instructions:

      Storage databases created with this release are not compatible with previous releases.

      The first time you use this release to open a storage database created by previous releases, it will be transparently "upgraded" to the current format, so:

      • The storage will become incompatible with previous releases.

      • In order to be able to upgrade the storage, you can't open it in "read only" mode. Once upgraded, you can use "read only" mode freely.

      • The upgrade process can be a bit slow, since the backend needs to analyze the entire database. So, if your database size is 1 Gigabyte and your hard disk can push 50MB/s, the upgrade process will take about 20 seconds to finish.

      • While upgrading, the database will NOT serve requests.

      • The upgrade process doesn't take any RAM.

      • If the upgrade process is aborted (program quits, crashes, machine reboot, etc), the database will be stable and clean. That is, the upgrade process is transactional and SAFE.

      Also, in previous backend releases there was a bug in the garbage collection code that would skip over "to be deleted" objects, leaving some garbage behind.

      So with this release you have three options:

      • Ignore the issue: This release will not leak new garbage, but garbage already leaked will remain there. You lose some diskspace. No other side effect.

      • Do a dump+load migration, using the storage backend "migrate" method. Your diskspace requirement will double (temporarily), since you need space for the source and the destination databases. The migration will copy garbage, but this new backend release can cope with it and it will be freed. You don't need extra RAM to do this process, even if your database is in the petabyte range :-).

      • Initialize the storage with the "validate_garbage" parameter set to "True": This option will force a full sweep over the database, to examine cross-object links and locate garbage. This scan is done in-place, so you don't need extra diskspace. It is transactional, so it is SAFE if something goes wrong.

        It takes RAM proportional to object count in your storage, so beware if you have a huge database.

        Of course you only need to pass this parameter once, to catch garbage leaked by previous releases.

    • Changes:

      • 20060629 - r191 - jcea@argo.es

        Document the upgrade process.

      • 20060620 - r188 - jcea@argo.es

        "get_size()" is very costly. With current implementation, the storage must access all the database. I can implement a manual counter or migrate to btree and use DB_RECNUM:

        http://www.sleepycat.com/docs/api_c/db_stat.html
        http://www.sleepycat.com/docs/api_c/db_set_flags.html#DB_RECNUM

        Finally we implemented a manually managed counter. Now "get_size()" is instantaneous, but it requires a storage upgrade, so you can't use previous backend versions.

      • 20060620 - r184 - jcea@argo.es

        When creating a new storage database, be able to choose between "btree" and "hash" schema.

      • 20060620 - r183 - jcea@argo.es

        Document "KNOW_HOW-DURUS" updated to Durus 3.4.

      • 20060524 - r157 - jcea@argo.es

        We add a new "validate_garbage" optional parameter to the storage constructor. If that parameter is True, the storage will do a garbage check. Read the documentation in the README file for details.

      • 20060523 - r156 - jcea@argo.es

        If the storage was stopped before garbage collection was completed, the storage could leak some unreachable objects.

        While solving this issue, we also make sure that garbage collection makes progress even if the storage is stopped repeatedly before garbage collection completes. That is, we store partial state to avoid trying to clean already-collected objects.

  • durus-berkeleydbstorage-20060522.tar.gz (24Kbytes) (Digital Signature)
    MD5: 546d5e8fb57e612f659ce493f3c4ad41

    • Upgrade Instructions: Nothing special.

    • Changes:

      • 20060509 - r140 - jcea@argo.es

        Since some foreign code seems to depend on it, I implemented a "pack()" method in the storage.

      • 20060509 - r138 - jcea@argo.es

        Implements a "migrate()" method to migrate huge storages without hitting memory in a hard way. With this method you only need memory for an object, not the entire repository in RAM. So, this is imprescindible if your address space (in 32 bits) is small compared with your storage size.

      • 20060509 - r137 - jcea@argo.es

        File "KNOW_HOW-DURUS" contains a lot of info about Durus internals and advanced usage.

  • durus-berkeleydbstorage-20060418.tar.gz (15Kbytes)

    • Upgrade Instructions: Nothing special.

    • Changes:

      • 20060418 - r90 - jcea@argo.es

        Solved an object leak when you commit a transaction having unreferenced objects. That is, you commit objects that are ALREADY garbage.

        You usually don't do that, but it can be a very helpful pattern to break object cycles when deleting objects.

        You could hit this bug if you keep references to persistent objects around between transactions, a big NO NO. You could "resurrect" a deleted object, and that object, and the object graph reachable from it, would become immortal.

      • 20060418 - r85 - jcea@argo.es

        "gen_oid_record()" implementation. This implementation has the same limitations that the standard Durus one. In particular, you shouldn't commit transactions while iterating over the storage, and this method can return already deleted objects, not yet collected. These "issues" are already present in the stock Durus implementation.

        The usual use of this method is migrating class names of already stored objects.

      • 20060411 - r79 - jcea@argo.es

        "Read Only" mode implemented.

      • 20060411 - r74 - jcea@argo.es

        "D" in ACID is optional, if allowable, to improve performance.

  • durus-berkeleydbstorage-20060328.tar.gz (14Kbytes)

    • Upgrade Instructions: Nothing special. The Durus server was renamed from "z.py" to "server.py".

    • Changes:

      • 20060328 - r56 - jcea@argo.es

        Hint from Thomas Guettler.

        There is a race condition between a client deleting an object and another client trying to fetch it. Previous versions of the code would kill the server with an assertion failure.

        The current code signals the issue to the caller. If you are using the server storage, the client will receive a "ReadConflictError" exception, just like stock Durus, if garbage collection runs between the object deletion by one client and the object retrieval by the other.

      • 20060328 - r55 - jcea@argo.es

        When instancing a BerkeleyDBStorage, we run Berkeley DB "DB_RECOVER" if:

        1. The constructor caller requests it explicitly.
        2. The environment opening requires a recovery.
        3. Previous use of the storage left unfinished transactions behind (for example, after a machine reboot).

      • 20060323 - r48 - jcea@argo.es

        Improved checkpointing, especially when forcing a full collection. But remember that you don't need to do a full collection unless you REALLY need it; the backend automatically does garbage collection in the background. With this change we have roughly a 10% speed penalty when doing a full collection.

        Greatly improved removal of database logging files. Now you can expect a maximum of three log files (30 MBytes) in the storage.

      • 20060323 - r47 - jcea@argo.es

        When a new Berkeley DB storage is created for the first time, we can configure the log buffer size and cache size using constructor parameters. If the storage already exists, those parameters are ignored.

      • 20060323 - r39 - jcea@argo.es

        When doing a commit, instead of loading the reference counters of all referenced objects, do the incremental adjustment without database access, all in RAM, and then access and modify ONLY the objects with changed reference counters.

        The code is simpler, faster and eats less RAM, when you update heavily linked objects.

      • 20060317 - r31 - jcea@argo.es

        Cope with different combinations of bsddb/bsddb3 installations and missing installations.

  • durus-berkeleydbstorage-20060316.tar.gz (11Kbytes)

    First public release. Production ready.


Steps to follow when publishing an update

This is only useful to me :-).

  • Put the correct version number in "CHANGES", "UPGRADING" and "berkeleydb_storage.py".

  • Move the release to its own SVN tag.

  • Give public read access to that tag in SVN, for projects such as cpif. It is best to add the new release while leaving the previous ones active too; this way we avoid removing access to directories that are already public.

  • Export that specific release from SVN.

  • Create the "tar.gz" and the digital signature, and upload them to this website.

  • Update the documentation on this website.

  • Update the content syndication (RSS) of this website.

  • Announce the update on freshmeat.

  • Announce the update on Python's CheeseShop (PyPI).

  • Send notifications about the library and the "KNOW_HOW-DURUS" documentation to the following mailing lists: Durus Users, pybsddb users and python-es.


History

  • 22/Mar/06: First version of this page.



©2006-2007 jcea@jcea.es