Introduction
Although the ZODB (Zope Object Database) is very robust, its data structures can become corrupted. This can have various causes, but in most cases the corruption is caused by a faulty filesystem. A corrupted database announces itself through a POSKeyError, through errors like "data record does not point to transaction header", or through a CorruptedError.
What is a POSKeyError?
To understand the meaning of this error you first have to know that each object in your database has a unique id (called OID). Example: You have uploaded a new image (ZMI->Add new Image) to your ZODB. Besides being reachable by a unique path, this image object gets an OID upon storage. This OID is a binary value that is usually displayed as a hexadecimal number like this: 0x40A90L. A storage like FileStorage essentially only holds a mapping between an OID and the serialized object. The OID also ensures that multiple references to the same object always map to the same object in the database.
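To make the OID format a bit more concrete: at the storage level an OID is an 8-byte string, and a value like 0x40A90 is that string read as a big-endian integer. ZODB ships the helpers p64 and u64 in ZODB.utils for this conversion. The sketch below re-implements the same idea with the standard struct module so that it runs without ZODB installed; the function names int_to_oid and oid_to_int are made up for this illustration.

```python
import struct


def int_to_oid(value):
    """Pack an integer into an 8-byte big-endian string, the shape of an OID."""
    return struct.pack(">Q", value)


def oid_to_int(oid):
    """Unpack an 8-byte OID string back into an integer."""
    return struct.unpack(">Q", oid)[0]


oid = int_to_oid(0x40A90)
print(repr(oid))             # the raw 8 bytes as the storage sees them
print(hex(oid_to_int(oid)))  # the readable form used in error messages
```

The readable form is handy when you compare the OID from a POSKeyError message with the output of tools like fsrefs.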
Now let's go back to the POSKeyError: Spelled out, a POSKeyError is a Persistent Object System key error (POS comes from BoboPOS, the first version of the ZODB, contained in Principia) and means that no serialized data can be found for a given OID.
Example: In practice there can be a folderish object (inheriting from OFS.ObjectManager) that stores its child data in the _objects attribute. This attribute holds a list of child objects, which on the storage level is translated into a list of OIDs. Upon load (objectValues() or similar), one of these references turns out to be corrupt and makes loading stop with a POSKeyError.
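The mechanism can be shown with a toy model. This is only a sketch and not how FileStorage is implemented: ToyStorage, ToyPOSKeyError and load_children are hypothetical names, and the "storage" is a plain dictionary mapping OIDs to pickles. It shows why loading a folder stops at the first child whose OID has no record.

```python
import pickle


class ToyPOSKeyError(KeyError):
    """Stand-in for ZODB's POSKeyError: no record stored for this OID."""


class ToyStorage(object):
    """Hypothetical storage: a plain mapping of OID -> pickled object state."""

    def __init__(self):
        self._records = {}

    def store(self, oid, state):
        self._records[oid] = pickle.dumps(state)

    def load(self, oid):
        try:
            return pickle.loads(self._records[oid])
        except KeyError:
            raise ToyPOSKeyError(oid)


def load_children(storage, folder_oid):
    """Resolve every child reference of a folder; fails on a dangling one."""
    folder = storage.load(folder_oid)
    return [storage.load(child_oid) for child_oid in folder["children"]]


storage = ToyStorage()
# The folder references OIDs 2 and 3, but only OID 2 has a record.
storage.store(1, {"id": "folder", "children": [2, 3]})
storage.store(2, {"id": "image"})

try:
    load_children(storage, 1)
except ToyPOSKeyError as ex:
    print("dangling reference to OID %r" % ex.args[0])
```

Note that the folder record itself loads fine. The error only shows up once the broken reference is followed, which is why such corruption can go unnoticed for a long time.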
What does a CorruptedError mean?
This error can have multiple causes, such as a bad transaction length (caused by full disks or similar), a bad transaction time (caused by time or date jumps and filesystem problems), or others. These errors always mean that you have to perform the first step of the next section.
How to recover?
To my current knowledge there is no way of finding the correct serialized data (if it still exists) for an OID once a POSKeyError occurs for it. To make it clear: If you have a non-reachable (lost) object, you won't bring it back, even if there is a chance that the binary data is still there. So the best thing is to cut the broken references out of the database (and to replace them with good ones from a backup).
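Because the steps below rewrite and edit the database file, it is worth taking a copy first. A minimal sketch, assuming the database is a single FileStorage file such as Data.fs and that nothing is writing to it while it is copied; the helper name backup_datafs and the naming scheme of the copy are made up for this illustration.

```python
import os
import shutil
import time


def backup_datafs(path):
    """Copy a database file next to the original, with a timestamp suffix."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = "%s.%s.bak" % (path, stamp)
    if os.path.exists(target):
        # Never overwrite an existing backup.
        raise IOError("backup target already exists: %s" % target)
    shutil.copy2(path, target)  # copy2 also preserves the modification time
    return target
```

A call like backup_datafs('Data.fs') returns the path of the copy, so you can note it down before you start repairing.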
- Repair your database with fsrecover.py and also pack the database to remove old data. This tool removes all corrupted data (it removes any transaction that contains corrupted data). Always back up your data before performing this step.
  fsrecover.py -P 0 Data.fs Data.fs.repaired &> logrecover.txt
- Open logrecover.txt and check how much data was lost. This tool will *not* check or repair POSKeyErrors! It only checks the integrity of transactions and is the remedy for CorruptedErrors. Be aware that using fsrecover can lead to (more) POSKeyErrors: if a faulty transaction that contains a referenced object is removed, this produces a dangling reference.
- Now open your database and let's search for the cause of the POSKeyErrors. I've written a very dumb script that tries to load each object and attribute it finds. This will show you each object containing attributes with dangling references. Put this script (named recover.py) into your PYTHONPATH and fire up
  zopectl debug
attached to the faulty database and then call
  import recover; recover.check(app)
It will tell you each object that contains a dangling reference and also tell you the target that is not reachable. This is easier than using fsrefs, searching for the container, and loading the container via ._p_jar[containeroid].
#!/bin/python
import sys
from ZODB.POSException import POSKeyError

# Every visited object is logged to the "clean" file, every problem to the
# "errors" file.
fderr = open('/tmp/logcheck_errors.txt', 'w')
fdout = open('/tmp/logcheck_clean.txt', 'w')

def _out(msg):
    fdout.write(msg + '\n')

def _err(msg):
    fderr.write(msg + '\n')

def _checkAttributes(obj):
    # very dumb checks for list and dict (like) attributes
    # is very slow but ensures that all attributes are checked
    for k, v in obj.__dict__.items():
        if hasattr(v, 'values') and hasattr(v, 'keys'):
            try:
                data = [val for val in v.values()]
                data = [val for val in v.keys()]
            except POSKeyError as ex:
                _err('Error %s on DICT-LIKE attribute %s (%s)' \
                     % (str(ex), k, '/'.join(obj.getPhysicalPath())))
        if hasattr(v, 'append'):
            try:
                data = [val for val in v]
            except POSKeyError as ex:
                _err('Error %s on LIST-LIKE attribute %s (%s)' \
                     % (str(ex), k, '/'.join(obj.getPhysicalPath())))

def _sub(master):
    # walk the object tree recursively, starting at master
    for oid in master.objectIds():
        try:
            obj = getattr(master, oid)
            _out('%s->%s' % ('/'.join(master.getPhysicalPath()), obj.getId()))
            if hasattr(obj, 'objectIds') and obj.getId() != 'Control_Panel':
                _sub(obj)
            # check catalog explicitly
            if getattr(obj, 'meta_type', None) in \
               ['ZCatalog', 'Catalog', 'Plone Catalog Tool'] \
               or hasattr(obj, '_catalog'):
                for idxid in obj._catalog.indexes.keys():
                    try:
                        index = obj._catalog.indexes.get(idxid)
                        _out('%s->INDEX: %s' \
                             % ('/'.join(obj.getPhysicalPath()), idxid))
                        _checkAttributes(index)
                    except POSKeyError as ex:
                        _err('Error %s on INDEX %s (%s)' \
                             % (str(ex), idxid, '/'.join(obj.getPhysicalPath())))
                # support for lexicon
                for lexid in obj.objectIds():
                    _checkAttributes(getattr(obj, lexid))
        except POSKeyError as ex:
            _err('Error %s on %s (%s)' % (str(ex), oid, '/'.join(master.getPhysicalPath())))

def check(app):
    # deep folder hierarchies need a higher recursion limit
    sys.setrecursionlimit(20000)
    _sub(app)
    fderr.close()
    fdout.close()
- You should now have a list of all corrupted data (in /tmp/logcheck_errors.txt). Now you can cut these objects out of the database.
a) You have an OFS.Folder with a corrupted child
>>> app.folder # folder that contains a corrupted child named subfolder
>>> app.folder._objects = tuple([o for o in app.folder._objects \
...                              if o['id'] != 'subfolder'])
>>> import transaction
>>> transaction.commit()
b) You have a dictionary (or BTree) that contains a corrupted value
(the None assignment also works for attributes)
>>> obj.btreefolderattribute[corruptedkey] = None
>>> import transaction
>>> transaction.commit()
- Now you should have a clean database. You can now try to replace the removed data with copies from a backup database. The simplest way (if you have cut out folderish objects) is to export them from the backup as ZEXP files and simply import them. If you deleted BTrees or attributes, this is more complicated, but you can try to extract the pickled data by hand using the handy script by Jim: http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil. The main purpose of this script is to get pickles for dangling references, but this goes far beyond this blog entry.
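The cut from case a) can be wrapped in a small helper that also performs the _delOb call recommended in the comments below. This is only a sketch: cut_child is a made-up name, and FakeFolder is a hypothetical stand-in that mimics just the two pieces of OFS.ObjectManager used here (_objects and _delOb), so the example runs without Zope.

```python
def cut_child(folder, child_id):
    """Remove a child from a folder's _objects tuple and drop its attribute.

    Works on the raw data structures so the broken child is never loaded.
    """
    folder._objects = tuple(
        [o for o in folder._objects if o['id'] != child_id])
    # _delOb removes the attribute that holds the reference to the child
    folder._delOb(child_id)


class FakeFolder(object):
    """Hypothetical stand-in for an OFS.Folder, for demonstration only."""

    def __init__(self):
        self._objects = ()

    def _setOb(self, id, obj):
        setattr(self, id, obj)

    def _delOb(self, id):
        delattr(self, id)


folder = FakeFolder()
for name in ('good', 'subfolder'):
    folder._objects += ({'id': name, 'meta_type': 'Folder'},)
    folder._setOb(name, object())

cut_child(folder, 'subfolder')
print([o['id'] for o in folder._objects])
```

In a zopectl debug session the call would be cut_child(app.folder, 'subfolder'), followed by import transaction; transaction.commit() as in the snippets above.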
Useful ZODB-Tools
analyze.py - Show information about objects in database (size, number)
fstest.py - Checks database against corrupt transaction data
fsrecover.py - Repair databases containing transactional errors
fsrefs.py - Try to load each object from database to discover dangling references
checkbtrees.py - Loads all BTrees from database and check their integrity (_check())
Interesting Links (Unsorted)
http://www.mail-archive.com/zodb-dev@zope.org/msg02535.html
http://www.zopelabs.com/cookbook/1114086617
http://www.python.org/workshops/2000-01/proceedings/papers/fulton/zodb3.html
http://blogs.nuxeo.com/sections/blogs/lennart_regebro/2006_06_28_finding-last-changed-object-in-zodb


3 comments:
Pretty cool posting
neat indeed!
After you do app.folder._objects = tuple([o for o in app.folder._objects if o['id'] != 'subfolder']), be sure to also do app.folder._delOb('subfolder'). Otherwise, you can still get unpickling errors. See https://weblion.psu.edu/trac/weblion/ticket/1442#comment:17 for the whole gory story.