Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up at_addrs with a dict. #38

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rainwoodman
Copy link

The old loop was O(mn). This is O(m+n).

The old loop was O(mn). This is O(m+n).
Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to make decisions about this function since I don't use it myself. I except that for small address sets naive iteration can be faster (and should use less memory), but I don't know what the expected average size of the address set is.

Also the KeyError thing -- perhaps it makes sense, but old behavior was to suppress errors, and changing it this way would probably require a major version bump according to SemVer. Again, I don't know what makes more sense to users, as I'm not one.

objgraph.py Outdated
id_to_obj = dict((id(o), o) for o in gc.get_objects())

for i in address_set:
o = id_to_obj[i]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can raise KeyError.

@rainwoodman
Copy link
Author

As there is no hard copy, the memory used by the dict should be close to 16 * number of GC objects. GC itself probably consumes as much memory as this.

I did not benchmark this in typical use cases (I only ran objgraph on small cases), though I can imagine it is difficult to find cases where O(m+n) is slower than O(mn).

For a stress test, one can probably collect new IDs with debug level of SAVEALL, and run this on the full id list.

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, LGTM.

Would you care to add a small changelog note in CHANGES.rst?

for o in gc.get_objects():
if id(o) in address_set:
res.append(o)
id_to_obj = dict((id(o), o) for o in gc.get_objects())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest

id_to_obj = {id(o): o for o in gc.get_objects()}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

res.append(o)
id_to_obj = dict((id(o), o) for o in gc.get_objects())

for i in address_set:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest change these lines to

return [
    id_to_obj[i] 
    for i in address_set
    if i in id_to_obj
]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 except I'd put it all on a single line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants