-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up at_addrs with a dict. #38
base: master
Are you sure you want to change the base?
Conversation
The old loop was O(mn). This is O(m+n).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to make decisions about this function since I don't use it myself. I except that for small address sets naive iteration can be faster (and should use less memory), but I don't know what the expected average size of the address set is.
Also the KeyError thing -- perhaps it makes sense, but old behavior was to suppress errors, and changing it this way would probably require a major version bump according to SemVer. Again, I don't know what makes more sense to users, as I'm not one.
objgraph.py
Outdated
id_to_obj = dict((id(o), o) for o in gc.get_objects()) | ||
|
||
for i in address_set: | ||
o = id_to_obj[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can raise KeyError.
As there is no hard copy, the memory used by the dict should be close to 16 * number of GC objects. GC itself probably consumes as much memory as this. I did not benchmark this in typical use cases (I only ran objgraph on small cases), though I can imagine it is difficult to find cases where O(m+n) is slower than O(mn). For a stress test, one can probably collect new IDs with debug level of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, LGTM.
Would you care to add a small changelog note in CHANGES.rst?
for o in gc.get_objects(): | ||
if id(o) in address_set: | ||
res.append(o) | ||
id_to_obj = dict((id(o), o) for o in gc.get_objects()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest
id_to_obj = {id(o): o for o in gc.get_objects()}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
res.append(o) | ||
id_to_obj = dict((id(o), o) for o in gc.get_objects()) | ||
|
||
for i in address_set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest change these lines to
return [
id_to_obj[i]
for i in address_set
if i in id_to_obj
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 except I'd put it all on a single line
The old loop was O(mn). This is O(m+n).