improve efficiency of object type detection #537
@@ -148,9 +148,11 @@ def self.with_metadata(cocina_object, lock, created: nil, modified: nil)
    end

    def self.type_for(dyn)
      dyn.with_indifferent_access.fetch('type')
    rescue KeyError
at least one full copy of the hash is made in memory here, which is not necessary
Can you drop a comment in the code to this effect around here?
I'd be happy to leave a code comment, though if this change is merged, the part of the code that makes the copy of the hash will be gone, so I'm not sure what the comment would say.
I think something like
# Intentionally checking both string- and symbol-type keys in the hash via `#[]` instead of `#with_indifferent_access` (and/or `#fetch`) in order to be more memory-efficient
would be good, though I don't love my wording. Why? Lest we outsmart ourselves, or Rubocop tries the same, later on.
I like that idea. Added the comment. thanks
Approve pending consideration of comment
0350658 to 43090e7
Thanks, @peetucket!
Why was this change made? 🤔
Part of sul-dlss/common-accessioning#1005, which concerns timeouts on a large object.
Experimentation shown in that ticket has demonstrated that for very large cocina objects (which in DSA are manipulated as in-memory hashes), detecting the object type may be inefficient because it uses both the Rails method `with_indifferent_access` (which makes a full copy of the giant hash in memory and returns a new object) and `fetch` (which possibly also makes a copy of the hash, based on console output). Neither of these is really necessary; we can simply inspect the hash directly for the key and raise as needed.

While I did not characterize the actual performance/memory improvements achieved by this refactor (it may only be a few seconds even for a very large object), this method is also well tested, so the change seems low risk.
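For illustration only, here is a minimal sketch of the direct lookup described above. It is not the code merged in this PR: the standalone `type_for` method, the error message, and the choice to raise `KeyError` directly are all assumptions made to keep the example self-contained.

```ruby
# Hypothetical sketch: look up the object type without copying the hash.
# Intentionally checks both string- and symbol-type keys via `#[]`
# instead of `#with_indifferent_access`, which builds a new hash.
def type_for(dyn)
  type = dyn['type'] || dyn[:type]
  # Assumption: a missing type is reported as a KeyError; the real
  # method may raise a different, application-specific error.
  raise KeyError, 'key not found: "type"' if type.nil?

  type
end
```

Called with either `{ 'type' => 'some-type' }` or `{ type: 'some-type' }`, the sketch returns the value by reading the original hash in place, so no intermediate copy is allocated.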
How was this change tested? 🤨
Existing specs (which cover this method well).