-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
show all objects that reference a oid #8
Conversation
I wrote something very similar in collective.zodbdebug. I see that you ran into the same problem as I regarding the loops and other irrelevant chains of references between objects. I did not have the insight to use the depth of the reference chain to break loops, which is very clever. So I ended up with a complex and ad-hoc system of heurustics to choose between which reference to follow in a greedy algorithm fashion 😅 One good idea I had, however, was to cache the references dict in disk, keyed by the last transaction ID. This saves a lot of time when dealing with large databases. Nice work! |
@rafaelbco Wow, I did not know collective.zodbdebug yet. At a quick glance it looks pretty nifty and can offer much more that what I put together. Especially
I'd love to get together at the next conference and combine what we learned into some best-practice documentation that developers can use. |
… to referencing items up to the root. This should give a idea where in the object-tree a item is actually located, how to access and fix it.
…stly stolen from collective.zodbdebug
b5af30e
to
c38a209
Compare
Inspired by collective.zodbdebug I added the file-cache and more information about inspected objects. |
@mauritsvanrees @icemac I use this frequently and think it is good to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall it LGTM, details see comments.
oid_refs = get_refs(data) | ||
if oid_refs: | ||
for referenced_oid, class_info in oid_refs: | ||
self.refs[oid_repr(referenced_oid)].append(oid_repr(oid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use the oid_repr
as key? IMO this is overhead. The oid
should be hashable as well, has a smaller memory footprint and lots of CPU cycles would be saved as well.
logger.debug('The oid {} does not exist!'.format(oid)) | ||
return | ||
child_pickle, state = self.storage.load(repr_to_oid(oid)) | ||
child_class_info = '%s.%s' % get_pickle_metadata(child_pickle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mix .format
and %
notation which is kind of confusing.
name = self.get_id_or_attr_name(oid=oid, parent_oid=ref) | ||
|
||
if name: | ||
msg = '{} ({}) is {} for {} ({}) at level {}'.format(oid, child_class_info, name, ref, class_info, level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that many parameters kwargs are a better way to pass parameters to .format
.
logger.info('Save reference-cache as {}'.format(path)) | ||
|
||
def _get_reference_cache_path(self): | ||
cache_dir = os.path.join(os.path.expanduser('~'), '.cache', 'zodbverify') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there is no clean-up? This can add up to some data given large databases. IMO keeping old caches is not needed, just wipe them if a unknown transaction id is coming in.
I currently do not have the energy to review this PR. |
@jensens thanks for reviewing. I'll address the issues you mentioned next when I have the time. |
I used this a couple of times today, and seems to work well. |
I also used it some weeks ago and it was very useful. Nevertheless, I just wanted to let you know while reading about zc.zodbdcg on zodb.org I noticed this package also implements a similar thing, doesn't it? "multi-zodb-check-refs" it is called. It seems to collect references of oids and stores them in a seperate (zodb)-database (which could take hours it states). I haven't tried it yet, its tests run python 3 though. I would have also needed Philip's zodb fix on a mildly customized plone installation, luckily it was not too hard to see what's wrong there. |
Since this works for you, in order to to get this merged, I propose one of you creates issues for the remaining minor open tasks and then we merge this? |
I also used this branch with success in the past. |
+1 |
I was successfully able to use this PR to find out where a broken object was referenced. @pbauer Is there a change to get this PR ready to be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works for me, as indicated earlier.
I have created issues for a few open comments/suggestions.
I will merge and make a release. It is high time.
I have released version 1.2.0. |
First I build a dict of all references for reverse-lookup.
Then I recursively follow the trail of references to referencing items up to the root.
To prevent irrelevant entries I abort after level 10 and at some root-objects because these are usually references a lot and would clutter the result with irrelevant information.
The output should give a pretty good idea where in the object-tree a item is actually located, how to access and fix it.
Currently the output is like this:
This will show that the object in question exists in the Relation-Catalog and can be accessed, and also deleted using the api of this tool.
I'd like some feedback on how to improve this further.