show all objects that reference a oid #8

Merged
merged 2 commits into from
Jul 6, 2022

Member

@pbauer pbauer commented May 16, 2020

First I build a dict of all references for reverse-lookup.
Then I recursively follow the trail of references to referencing items up to the root.
To prevent irrelevant entries I abort after level 10 and at some root objects, because these are usually referenced a lot and would clutter the result with irrelevant information.
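The approach described above could be sketched roughly as follows. This is a minimal illustration on a toy graph, not the code in this PR: the function names `build_reverse_refs` and `walk_referrers`, the `stop_at` parameter, and the string oids are all assumptions made for the example.

```python
from collections import defaultdict

MAX_LEVEL = 10  # same cut-off as described above


def build_reverse_refs(forward_refs):
    """Invert {oid: [referenced oids]} into {oid: [referencing oids]}."""
    reverse = defaultdict(list)
    for oid, targets in forward_refs.items():
        for target in targets:
            reverse[target].append(oid)
    return reverse


def walk_referrers(oid, reverse, stop_at=frozenset(), level=1, max_level=MAX_LEVEL):
    """Yield (referrer, level) pairs depth first, up to max_level."""
    for ref in reverse.get(oid, ()):
        yield ref, level
        if ref in stop_at or level >= max_level:
            # Do not climb further from root-like objects or past the limit.
            continue
        yield from walk_referrers(ref, reverse, stop_at, level + 1, max_level)


# Toy graph (hypothetical oids): 0x01 -> 0x11 -> 0x1e -> 0x2a
forward = {"0x01": ["0x11"], "0x11": ["0x1e"], "0x1e": ["0x2a"]}
reverse = build_reverse_refs(forward)
trail = list(walk_referrers("0x2a", reverse, stop_at={"0x01"}))

# A reference loop terminates because of the level limit.
loop = build_reverse_refs({"a": ["b"], "b": ["a"]})
loop_trail = list(walk_referrers("a", loop, max_level=3))
```

The level limit is what makes loops harmless here: a cycle is simply followed until the limit is reached, so no separate bookkeeping of visited objects is needed.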

The output should give a pretty good idea of where in the object tree an item is actually located, and how to access and fix it.

Currently the output is like this:

$ ./bin/zodbverify -f var/filestorage/Data.fs -o 0x3384dc

INFO:zodbverify:Building a reference-tree of ZODB...
INFO:zodbverify:Objects: 5000
[...]
INFO:zodbverify:Objects: 120000
INFO:zodbverify:Created a reference-dict for 121106 objects.

INFO:zodbverify:
This oid is referenced by:

INFO:zodbverify:0x11c284 BTrees.OOBTree.OOBTree at level 1
INFO:zodbverify:0x11c278 z3c.relationfield.index.RelationCatalog at level 2
INFO:zodbverify:0x1e five.localsitemanager.registry.PersistentComponents at level 3
INFO:zodbverify:0x11 Products.CMFPlone.Portal.PloneSite at level 4
INFO:zodbverify:0x01 OFS.Application.Application at level 5
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f6 persistent.mapping.PersistentMapping at level 6
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f7 zope.component.persistentregistry.PersistentAdapterRegistry at level 7
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f5 plone.app.redirector.storage.RedirectionStorage at level 5
INFO:zodbverify:0x02fa zope.ramcache.ram.RAMCache at level 6
INFO:zodbverify:0x02fd plone.contentrules.engine.storage.RuleStorage at level 7
INFO:zodbverify:0x338f13 plone.app.contentrules.rule.Rule at level 8
INFO:zodbverify:0x0303 BTrees.OOBTree.OOBTree at level 9
INFO:zodbverify:0x346961 plone.app.contentrules.rule.Rule at level 9
INFO:zodbverify:0x346b59 plone.app.contentrules.rule.Rule at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x02fe plone.app.viewletmanager.storage.ViewletSettingsStorage at level 8
INFO:zodbverify:0x034d plone.keyring.keyring.Keyring at level 9
INFO:zodbverify:0x02fb persistent.mapping.PersistentMapping at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x3b1a32 plone.keyring.keyring.Keyring at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x3c1ab3 BTrees.OOBTree.OOBucket at level 2
INFO:zodbverify:0x3c2df1 BTrees.OOBTree.OOBucket at level 3
INFO:zodbverify:0x3c2d38 BTrees.OOBTree.OOBucket at level 4
INFO:zodbverify:0x3c23e0 BTrees.OOBTree.OOBucket at level 5
INFO:zodbverify:0x3c1e2c BTrees.OOBTree.OOBucket at level 6
INFO:zodbverify:0x3c1aaa BTrees.OOBTree.OOBucket at level 7
INFO:zodbverify:0x3eb91c BTrees.OOBTree.OOBucket at level 8
INFO:zodbverify:0x3ec134 BTrees.OOBTree.OOBucket at level 9
INFO:zodbverify:0x3ebeca BTrees.OOBTree.OOBucket at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

This will show that the object in question exists in the Relation-Catalog and can be accessed, and also deleted using the api of this tool.

I'd like some feedback on how to improve this further.

@pbauer pbauer requested review from icemac and mauritsvanrees May 16, 2020 13:01
@rafaelbco
Member

rafaelbco commented May 16, 2020

I wrote something very similar in collective.zodbdebug.

I see that you ran into the same problem as I did regarding the loops and other irrelevant chains of references between objects.

I did not have the insight to use the depth of the reference chain to break loops, which is very clever.

So I ended up with a complex and ad-hoc system of heuristics to choose which reference to follow, in a greedy-algorithm fashion 😅

One good idea I had, however, was to cache the references dict on disk, keyed by the last transaction ID. This saves a lot of time when dealing with large databases.
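A minimal sketch of such a cache, assuming the reference dict is JSON-serializable and that the caller already has the last transaction ID as a hex string (in ZODB it could come from something like `storage.lastTransaction()`). The helper names and the `zodb_refs_<tid>.json` file name pattern are hypothetical, chosen only for this example.

```python
import json
import os
import tempfile


def cache_path(cache_dir, last_tid_hex):
    """Build the cache file name from the last transaction ID."""
    return os.path.join(cache_dir, "zodb_refs_{}.json".format(last_tid_hex))


def load_or_build_refs(cache_dir, last_tid_hex, build):
    """Reuse the cached dict if one exists for this tid, else build and store it."""
    path = cache_path(cache_dir, last_tid_hex)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    refs = build()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(refs, f)
    return refs


# Demo: the expensive build step runs only once for an unchanged database.
calls = []


def build():
    calls.append(1)
    return {"0x3384dc": ["0x11c284"]}


demo_dir = tempfile.mkdtemp()
first = load_or_build_refs(demo_dir, "03e2a1b4c5d6e7f8", build)
second = load_or_build_refs(demo_dir, "03e2a1b4c5d6e7f8", build)
```

Because any committed transaction changes the last transaction ID, a stale cache is never picked up; the database simply maps to a new file name.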

Nice work!

@pbauer
Member Author

pbauer commented May 17, 2020

@rafaelbco Wow, I did not know collective.zodbdebug yet. At a quick glance it looks pretty nifty and can offer much more than what I put together.
Sadly it does not support python 3 yet 😢

Especially get_attr_name is a beauty and I need to include it. Storing the tree on disk by the last transaction ID is also a great approach; I was wondering how to do it myself.

_get_reference_score is pretty scary indeed. My experience so far is that usually the first path returned by my code is also the most direct one but my experience is still limited.

I'd love to get together at the next conference and combine what we learned into some best-practice documentation that developers can use.

pbauer added 2 commits August 24, 2020 11:50
… to referencing items up to the root.

This should give a idea where in the object-tree a item is actually located, how to access and fix it.
@pbauer
Member Author

pbauer commented Aug 24, 2020

Inspired by collective.zodbdebug I added the file-cache and more information about inspected objects.

@pbauer
Member Author

pbauer commented Aug 24, 2020

@mauritsvanrees @icemac I use this frequently and think it is good to be merged.

@pbauer pbauer requested a review from jensens August 28, 2020 09:38
Member

@jensens jensens left a comment

Overall it LGTM; for details, see the comments.

src/zodbverify/verify_oid.py
    oid_refs = get_refs(data)
    if oid_refs:
        for referenced_oid, class_info in oid_refs:
            self.refs[oid_repr(referenced_oid)].append(oid_repr(oid))

Why do you use the oid_repr as key? IMO this is overhead. The oid should be hashable as well, has a smaller memory footprint and lots of CPU cycles would be saved as well.
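To illustrate the suggestion: the raw 8-byte oid is a `bytes` object and can be used directly as a dict key, with the readable form produced only when something is printed. In this sketch `p64` and `oid_repr_like` are local stand-ins written with `struct` so that the example runs without ZODB installed; the oids are taken from the sample output earlier in this thread.

```python
import struct
from collections import defaultdict


def p64(number):
    """Pack an integer into an 8-byte big-endian oid (stand-in for ZODB.utils.p64)."""
    return struct.pack(">Q", number)


def oid_repr_like(oid):
    """Readable hex form of an oid, used only for output in this sketch."""
    return hex(struct.unpack(">Q", oid)[0])


# Reverse-lookup dict keyed by the raw bytes oid instead of its string form.
refs = defaultdict(list)
edges = [
    (p64(0x3384DC), p64(0x11C284)),  # (referenced oid, referencing oid)
    (p64(0x3384DC), p64(0x3C1AB3)),
]
for referenced_oid, referencing_oid in edges:
    refs[referenced_oid].append(referencing_oid)

# Convert to the readable form only when reporting.
readable = [oid_repr_like(oid) for oid in refs[p64(0x3384DC)]]
```

One trade-off to keep in mind: JSON object keys must be strings, so a JSON-based cache would still need a conversion step when the dict is written to or read from disk.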

        logger.debug('The oid {} does not exist!'.format(oid))
        return
    child_pickle, state = self.storage.load(repr_to_oid(oid))
    child_class_info = '%s.%s' % get_pickle_metadata(child_pickle)

You mix .format and % notation which is kind of confusing.
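One possible way to make the two lines consistent is to use `.format` for both and unpack the tuple with `*`. In this sketch `metadata` is a stand-in for the (module, class name) tuple that `get_pickle_metadata` returns in the snippet above.

```python
# Stand-in for the (module, class name) tuple from get_pickle_metadata
metadata = ("BTrees.OOBTree", "OOBTree")

# Same result as '%s.%s' % metadata, but in the .format style used elsewhere
child_class_info = "{}.{}".format(*metadata)
message = "The oid {} does not exist!".format("0x3384dc")
```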

    name = self.get_id_or_attr_name(oid=oid, parent_oid=ref)

    if name:
        msg = '{} ({}) is {} for {} ({}) at level {}'.format(oid, child_class_info, name, ref, class_info, level)

With that many parameters kwargs are a better way to pass parameters to .format.
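A sketch of what the keyword-argument form could look like. The wrapper function `reference_message` and the sample values are hypothetical; only the message template is taken from the snippet above.

```python
def reference_message(oid, child_class_info, name, ref, class_info, level):
    """Build the log message with named placeholders instead of positional ones."""
    template = (
        "{oid} ({child_class_info}) is {name} "
        "for {ref} ({class_info}) at level {level}"
    )
    return template.format(
        oid=oid,
        child_class_info=child_class_info,
        name=name,
        ref=ref,
        class_info=class_info,
        level=level,
    )


msg = reference_message(
    oid="0x3384dc",
    child_class_info="persistent.mapping.PersistentMapping",
    name="data",
    ref="0x11c284",
    class_info="BTrees.OOBTree.OOBTree",
    level=1,
)
```

With named placeholders the template can be reordered or extended without having to recount positional arguments.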

        logger.info('Save reference-cache as {}'.format(path))

    def _get_reference_cache_path(self):
        cache_dir = os.path.join(os.path.expanduser('~'), '.cache', 'zodbverify')

So there is no clean-up? This can add up to a lot of data with large databases. IMO keeping old caches is not needed; just wipe them if an unknown transaction id is coming in.
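The clean-up suggested here could look roughly like the sketch below. It assumes one cache file per transaction ID in a single directory; the function name `prune_stale_caches` and the `zodb_refs_<tid>.json` naming pattern are hypothetical and not taken from the PR.

```python
import glob
import os
import tempfile


def prune_stale_caches(cache_dir, current_path):
    """Delete every cached reference file except the one for the current tid."""
    removed = []
    keep = os.path.abspath(current_path)
    for path in glob.glob(os.path.join(cache_dir, "zodb_refs_*.json")):
        if os.path.abspath(path) != keep:
            os.remove(path)
            removed.append(path)
    return removed


# Demo in a temporary directory with three cache files from different tids.
demo_dir = tempfile.mkdtemp()
for tid in ("aaa", "bbb", "ccc"):
    with open(os.path.join(demo_dir, "zodb_refs_{}.json".format(tid)), "w") as f:
        f.write("{}")

current = os.path.join(demo_dir, "zodb_refs_ccc.json")
removed = prune_stale_caches(demo_dir, current)
```

Calling such a helper right after a new cache file is written would keep at most one file per run. If several databases share the cache directory, the pattern would also need to identify the database so that their caches do not delete each other.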

@icemac icemac removed their request for review September 3, 2020 06:27
@icemac

icemac commented Sep 3, 2020

I currently do not have the energy to review this PR.

@pbauer
Member Author

pbauer commented Sep 7, 2020

@jensens thanks for reviewing. I'll address the issues you mentioned next when I have the time.

@mauritsvanrees
Member

I used this a couple of times today, and it seems to work well.
In ~/.cache/zodbverify/ I now have three json files of 6.7 MB each, but I removed most of the data from the site that I used it on, to have a smaller database to test a migration with.
This does need a fix from Philip in ZODB from earlier this year, which was merged but is not in a release yet. But not everyone may hit that code path.

@vernans

vernans commented Feb 2, 2021

I also used it some weeks ago and it was very useful. Nevertheless, I just wanted to let you know that, while reading about zc.zodbdgc on zodb.org, I noticed that this package also implements a similar thing, doesn't it? It is called "multi-zodb-check-refs". It seems to collect references of oids and store them in a separate (ZODB) database (which could take hours, it states). I haven't tried it yet; its tests run on Python 3, though.

I would also have needed Philip's ZODB fix on a mildly customized Plone installation; luckily it was not too hard to see what was wrong there.

@jensens
Member

jensens commented Feb 3, 2021

Since this works for you, in order to get this merged, I propose that one of you creates issues for the remaining minor open tasks and then we merge this?

@ale-rt
Member

ale-rt commented Feb 3, 2021

I also used this branch with success in the past.

@ale-rt
Member

ale-rt commented Feb 3, 2021

One good idea I had, however, was to cache the references dict on disk, keyed by the last transaction ID. This saves a lot of time when dealing with large databases.

+1

@icemac

icemac commented Apr 6, 2022

I was successfully able to use this PR to find out where a broken object was referenced.
The result seems reasonable although I did not look into the code for a review.

@pbauer Is there a chance to get this PR ready to be merged?

Member

@mauritsvanrees mauritsvanrees left a comment

It works for me, as indicated earlier.
I have created issues for a few open comments/suggestions.
I will merge and make a release. It is high time.

@mauritsvanrees mauritsvanrees merged commit 8e8a3b1 into master Jul 6, 2022
@mauritsvanrees
Member

I have released version 1.2.0.

@icemac icemac deleted the show_references branch July 7, 2022 05:30
7 participants