show all objects that reference a oid #8

pbauer · 2020-05-16T13:00:48Z

First I build a dict of all references for reverse-lookup.
Then I recursively follow the trail of references to referencing items up to the root.
To prevent irrelevant entries, I abort after level 10 and at some root objects, because these are usually referenced a lot and would clutter the result with irrelevant information.
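A minimal sketch of the two steps described above, run on a small in-memory graph rather than a real Data.fs. The function names, the `MAX_LEVEL` constant and the contents of `STOP_CLASSES` are hypothetical. Unlike the description above, this sketch also keeps a `seen` set so that each referrer is reported only once.

```python
from collections import defaultdict

# Hypothetical limits modelled on the description above.
MAX_LEVEL = 10
STOP_CLASSES = {
    "OFS.Application.Application",
    "persistent.mapping.PersistentMapping",
}


def build_reverse_refs(forward_refs):
    """Invert {oid: [oids it references]} into {oid: [oids referencing it]}."""
    reverse = defaultdict(list)
    for oid, referenced in forward_refs.items():
        for ref in referenced:
            reverse[ref].append(oid)
    return reverse


def walk_referrers(oid, reverse, classes, level=1, seen=None, out=None):
    """Follow the referrers of `oid` towards the root, depth first.

    Returns a list of (referrer oid, class name, level) tuples.
    """
    if seen is None:
        seen = set()
    if out is None:
        out = []
    for ref in reverse.get(oid, []):
        if ref in seen:
            continue  # already reported via another path; also breaks cycles
        seen.add(ref)
        out.append((ref, classes[ref], level))
        if level >= MAX_LEVEL or classes[ref] in STOP_CLASSES:
            continue  # stop after MAX_LEVEL or at a heavily referenced root object
        walk_referrers(ref, reverse, classes, level + 1, seen, out)
    return out


# Toy database: app -> site -> registry -> catalog -> btree -> target
forward = {
    "app": ["site"],
    "site": ["registry"],
    "registry": ["catalog"],
    "catalog": ["btree"],
    "btree": ["target"],
}
classes = {
    "app": "OFS.Application.Application",
    "site": "Products.CMFPlone.Portal.PloneSite",
    "registry": "five.localsitemanager.registry.PersistentComponents",
    "catalog": "z3c.relationfield.index.RelationCatalog",
    "btree": "BTrees.OOBTree.OOBTree",
}
chain = walk_referrers("target", build_reverse_refs(forward), classes)
for ref, cls, level in chain:
    print("{} {} at level {}".format(ref, cls, level))
```

Breaking loops by depth alone (as described above) still terminates; the `seen` set is just a cheap way to avoid printing the same referrer along several paths.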

The output should give a pretty good idea of where in the object tree an item is actually located, and how to access and fix it.

Currently the output is like this:

$ ./bin/zodbverify -f var/filestorage/Data.fs -o 0x3384dc

INFO:zodbverify:Building a reference-tree of ZODB...
INFO:zodbverify:Objects: 5000
[...]
INFO:zodbverify:Objects: 120000
INFO:zodbverify:Created a reference-dict for 121106 objects.

INFO:zodbverify:
This oid is referenced by:

INFO:zodbverify:0x11c284 BTrees.OOBTree.OOBTree at level 1
INFO:zodbverify:0x11c278 z3c.relationfield.index.RelationCatalog at level 2
INFO:zodbverify:0x1e five.localsitemanager.registry.PersistentComponents at level 3
INFO:zodbverify:0x11 Products.CMFPlone.Portal.PloneSite at level 4
INFO:zodbverify:0x01 OFS.Application.Application at level 5
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f6 persistent.mapping.PersistentMapping at level 6
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f7 zope.component.persistentregistry.PersistentAdapterRegistry at level 7
INFO:zodbverify: 8< --------------- >8 Stop at root objects

INFO:zodbverify:0x02f5 plone.app.redirector.storage.RedirectionStorage at level 5
INFO:zodbverify:0x02fa zope.ramcache.ram.RAMCache at level 6
INFO:zodbverify:0x02fd plone.contentrules.engine.storage.RuleStorage at level 7
INFO:zodbverify:0x338f13 plone.app.contentrules.rule.Rule at level 8
INFO:zodbverify:0x0303 BTrees.OOBTree.OOBTree at level 9
INFO:zodbverify:0x346961 plone.app.contentrules.rule.Rule at level 9
INFO:zodbverify:0x346b59 plone.app.contentrules.rule.Rule at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x02fe plone.app.viewletmanager.storage.ViewletSettingsStorage at level 8
INFO:zodbverify:0x034d plone.keyring.keyring.Keyring at level 9
INFO:zodbverify:0x02fb persistent.mapping.PersistentMapping at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x3b1a32 plone.keyring.keyring.Keyring at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify: 8< --------------- >8 Stop after level 10!

INFO:zodbverify:0x3c1ab3 BTrees.OOBTree.OOBucket at level 2
INFO:zodbverify:0x3c2df1 BTrees.OOBTree.OOBucket at level 3
INFO:zodbverify:0x3c2d38 BTrees.OOBTree.OOBucket at level 4
INFO:zodbverify:0x3c23e0 BTrees.OOBTree.OOBucket at level 5
INFO:zodbverify:0x3c1e2c BTrees.OOBTree.OOBucket at level 6
INFO:zodbverify:0x3c1aaa BTrees.OOBTree.OOBucket at level 7
INFO:zodbverify:0x3eb91c BTrees.OOBTree.OOBucket at level 8
INFO:zodbverify:0x3ec134 BTrees.OOBTree.OOBucket at level 9
INFO:zodbverify:0x3ebeca BTrees.OOBTree.OOBucket at level 10
INFO:zodbverify: 8< --------------- >8 Stop after level 10!

This shows that the object in question exists in the Relation-Catalog and can be accessed, and also deleted, using the API of this tool.

I'd like some feedback on how to improve this further.

rafaelbco · 2020-05-16T19:33:37Z

I wrote something very similar in collective.zodbdebug.

I see that you ran into the same problem as I regarding the loops and other irrelevant chains of references between objects.

I did not have the insight to use the depth of the reference chain to break loops, which is very clever.

So I ended up with a complex and ad-hoc system of heuristics to choose which reference to follow, in a greedy-algorithm fashion 😅

One good idea I had, however, was to cache the references dict on disk, keyed by the last transaction ID. This saves a lot of time when dealing with large databases.
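A minimal sketch of such an on-disk cache, assuming the reference dict is JSON-serializable (string keys) and that `last_tid` is the storage's last transaction ID as bytes, for example the value returned by `lastTransaction()` on a ZODB storage. `load_or_build_refs` and its `build` callback are hypothetical names.

```python
import json
import os


def load_or_build_refs(cache_dir, last_tid, build):
    """Return the reference dict, reusing a JSON cache keyed by the last tid.

    Any new transaction changes `last_tid`, so a stale cache is never reused.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "{}.json".format(last_tid.hex()))
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    refs = build()  # the expensive full scan of the database
    with open(path, "w") as f:
        json.dump(refs, f)
    return refs
```

Since the file name changes with every new transaction, outdated cache files are never overwritten and have to be removed separately.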

Nice work!

pbauer · 2020-05-17T09:47:47Z

@rafaelbco Wow, I did not know collective.zodbdebug yet. At a quick glance it looks pretty nifty and can offer much more than what I put together.
Sadly it does not support Python 3 yet 😢

Especially get_attr_name is a beauty and I need to include it. Storing the tree on disk, keyed by the last transaction ID, is also a great approach; I had been wondering how to do that myself.

_get_reference_score is pretty scary indeed. My experience so far is that the first path returned by my code is usually also the most direct one, but my experience is still limited.

I'd love to get together at the next conference and combine what we learned into some best-practice documentation that developers can use.


pbauer · 2020-08-24T10:29:25Z

Inspired by collective.zodbdebug I added the file-cache and more information about inspected objects.

pbauer · 2020-08-24T10:30:08Z

@mauritsvanrees @icemac I use this frequently and think it is good to be merged.

jensens

Overall it LGTM; see the comments for details.


jensens · 2020-08-30T10:01:48Z

src/zodbverify/verify_oid.py

+            oid_refs = get_refs(data)
+            if oid_refs:
+                for referenced_oid, class_info in oid_refs:
+                    self.refs[oid_repr(referenced_oid)].append(oid_repr(oid))


Why do you use the oid_repr as key? IMO this is overhead. The oid itself is hashable as well, has a smaller memory footprint, and using it would save a lot of CPU cycles.
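A minimal sketch of that suggestion: key the reverse-reference dict by the raw 8-byte oids and convert to the `0x…` form only when printing. `build_reverse_refs`, `format_oid` and `p64` here are hypothetical local helpers; `format_oid` only imitates the format seen in the log output above, and in practice ZODB's own `ZODB.utils.oid_repr` would be used for display.

```python
from collections import defaultdict


def build_reverse_refs(records):
    """Map each referenced oid to the oids that reference it.

    `records` yields (oid, [referenced oids]) pairs of raw 8-byte oids.
    Keys stay bytes, so no string is built per reference.
    """
    refs = defaultdict(list)
    for oid, referenced in records:
        for ref in referenced:
            refs[ref].append(oid)
    return refs


def format_oid(oid):
    """Render an 8-byte oid as '0x..' with leading zero bytes stripped."""
    as_hex = oid.hex().lstrip("0")
    if len(as_hex) % 2:
        as_hex = "0" + as_hex  # keep two hex digits per byte
    return "0x" + (as_hex or "00")


def p64(n):
    """Pack an integer into an 8-byte big-endian oid (like ZODB.utils.p64)."""
    return n.to_bytes(8, "big")
```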

jensens · 2020-08-30T10:03:02Z

src/zodbverify/verify_oid.py

+            logger.debug('The oid {} does not exist!'.format(oid))
+            return
+        child_pickle, state = self.storage.load(repr_to_oid(oid))
+        child_class_info = '%s.%s' % get_pickle_metadata(child_pickle)


You mix .format and % notation which is kind of confusing.
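A minimal sketch of the same two lines written with `str.format` only. It relies on what the diff already shows, namely that `get_pickle_metadata()` returns a (module, class name) pair; the helper names are hypothetical.

```python
def missing_oid_message(oid):
    """Debug message for an oid that is not in the storage."""
    return 'The oid {} does not exist!'.format(oid)


def format_class_info(metadata):
    """Join a (module, class name) pair into a dotted path.

    Equivalent to '%s.%s' % metadata, but in the .format style used elsewhere.
    """
    return '{}.{}'.format(*metadata)
```

Inside the method this would read `child_class_info = '{}.{}'.format(*get_pickle_metadata(child_pickle))`.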

jensens · 2020-08-30T10:03:46Z

src/zodbverify/verify_oid.py

+                name = self.get_id_or_attr_name(oid=oid, parent_oid=ref)
+
+            if name:
+                msg = '{} ({}) is {} for {} ({}) at level {}'.format(oid, child_class_info, name, ref, class_info, level)


With that many parameters kwargs are a better way to pass parameters to .format.
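A minimal sketch of the message from the diff with named placeholders. The template constant and function name are hypothetical; the fields are the ones passed in the diff.

```python
REFERENCE_MSG = (
    '{oid} ({child_class}) is {name} for {ref} ({ref_class}) at level {level}'
)


def describe_reference(oid, child_class, name, ref, ref_class, level):
    """Build the log line; named fields make the argument order irrelevant."""
    return REFERENCE_MSG.format(
        oid=oid,
        child_class=child_class,
        name=name,
        ref=ref,
        ref_class=ref_class,
        level=level,
    )
```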

jensens · 2020-08-30T10:14:16Z

src/zodbverify/verify_oid.py

+        logger.info('Save reference-cache as {}'.format(path))
+
+    def _get_reference_cache_path(self):
+        cache_dir = os.path.join(os.path.expanduser('~'), '.cache', 'zodbverify')


So there is no clean-up? With large databases this can add up to a lot of data. IMO keeping old caches is not needed; just wipe them when an unknown transaction ID comes in.
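A minimal sketch of that clean-up, assuming one JSON file per transaction ID in a single cache directory (as built by `_get_reference_cache_path` above); `prune_reference_caches` is a hypothetical name.

```python
import glob
import os


def prune_reference_caches(cache_dir, keep_name):
    """Delete every *.json cache in cache_dir except the file named keep_name.

    Returns the paths that were removed.
    """
    removed = []
    for path in glob.glob(os.path.join(cache_dir, "*.json")):
        if os.path.basename(path) != keep_name:
            os.remove(path)
            removed.append(path)
    return removed
```

If several databases share `~/.cache/zodbverify`, this would also delete the caches of the other databases, so a real implementation might use one sub-directory per Data.fs.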

icemac · 2020-09-03T06:28:20Z

I currently do not have the energy to review this PR.

pbauer · 2020-09-07T07:58:34Z

@jensens thanks for reviewing. I'll address the issues you mentioned next when I have the time.

mauritsvanrees · 2020-12-22T17:46:21Z

I used this a couple of times today, and it seems to work well.
In ~/.cache/zodbverify/ I now have three JSON files of 6.7 MB each, but I removed most of the data from the site that I used it on, to have a smaller database to test a migration with.
This does need a fix from Philip in ZODB from earlier this year, which was merged but is not in a release yet. But not everyone may hit that code path.

vernans · 2021-02-02T10:32:09Z

I also used it some weeks ago and it was very useful. Nevertheless, I wanted to let you know that while reading about zc.zodbdgc on zodb.org I noticed that this package also implements a similar thing, doesn't it? It is called "multi-zodb-check-refs". It seems to collect the references of oids and store them in a separate (ZODB) database (which, it states, could take hours). I haven't tried it yet; its tests do run on Python 3, though.

I would also have needed Philip's ZODB fix on a mildly customized Plone installation; luckily it was not too hard to see what was wrong there.

jensens · 2021-02-03T08:11:24Z

Since this works for you: to get this merged, how about one of you creates issues for the remaining minor open tasks and then we merge?

ale-rt · 2021-02-03T08:14:00Z

I also used this branch with success in the past.

ale-rt · 2021-02-03T08:14:47Z

One good idea I had, however, was to cache the references dict on disk, keyed by the last transaction ID. This saves a lot of time when dealing with large databases.

+1

icemac · 2022-04-06T10:05:03Z

I was successfully able to use this PR to find out where a broken object was referenced.
The result seems reasonable although I did not look into the code for a review.

@pbauer Is there a chance to get this PR ready to be merged?

mauritsvanrees

It works for me, as indicated earlier.
I have created issues for a few open comments/suggestions.
I will merge and make a release. It is high time.

mauritsvanrees · 2022-07-06T22:00:39Z

I have released version 1.2.0.

pbauer requested review from icemac and mauritsvanrees May 16, 2020 13:01

pbauer added 2 commits August 24, 2020 11:50

show all objects that reference a oid. follow the trail of references to referencing items up to the root. This should give a idea where in the object-tree a item is actually located, how to access and fix it.

80e292a

Add file-cache of reftree and add more information on refs (name). Mostly stolen from collective.zodbdebug

c38a209

pbauer force-pushed the show_references branch from b5af30e to c38a209 August 24, 2020 09:50

pbauer requested a review from jensens August 28, 2020 09:38

jensens reviewed Aug 30, 2020


icemac removed their request for review September 3, 2020 06:27

jensens added the labels 04 type: enhancement, 13 prio: normal and 22 status: in-progress on Feb 23, 2021

This was referenced Jul 6, 2022

Why do you use the oid_repr as key? #10

Open

You mix .format and % notation which is kind of confusing. #11

Open

With that many parameters kwargs are a better way to pass parameters to .format. #12

Open

So there is no clean-up? #13

Open

mauritsvanrees approved these changes Jul 6, 2022


mauritsvanrees merged commit 8e8a3b1 into master Jul 6, 2022

icemac deleted the show_references branch July 7, 2022 05:30
