Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve efficiency of object type detection #537

Merged
merged 1 commit into from
Oct 6, 2022
Merged

Conversation

peetucket
Copy link
Member

@peetucket peetucket commented Oct 5, 2022

Why was this change made? 🤔

Part of sul-dlss/common-accessioning#1005, which is timeouts on a large object.

Experimentation as shown in that ticket has demonstrated that for very large cocina objects (which in DSA are manipulated as in memory hashes), the detection of object type may not be very efficient due to the use of both the rails method with_indifferent_access (this makes a full copy of the giant hash in memory and returns a new object) and fetch (also possibly makes a copy of the hash based on console output). Neither of these are really necessary, we can simply inspect the hash directly for the key and raise as needed.

While I did not characterize the actual performance/memory improvements achieved by this refactor (may only be a few seconds even for very large object), this method is also well tested, so seems low risk.

How was this change tested? 🤨

Existing specs (which cover this method well).

@@ -148,9 +148,11 @@ def self.with_metadata(cocina_object, lock, created: nil, modified: nil)
end

def self.type_for(dyn)
dyn.with_indifferent_access.fetch('type')
rescue KeyError
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least one full copy of the hash is made in memory here, which is not necessary

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you drop a comment in the code to this effect around here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd be happy to leave a code comment, though if this change is merged, the part of the code that makes the copy of the hash will be gone, so I'm not sure what the comment would say

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like

# Intentionally checking both string- and symbol-type keys in the hash via `#[]` instead of `#with_indifferent_access` (and/or `#fetch`) in order to be more memory-efficient

would be good, though I don't love my wording. Why? Lest we outsmart ourselves, or Rubocop tries the same, later on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. Added the comment. thanks

Copy link
Member

@mjgiarlo mjgiarlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve pending consideration of comment

@peetucket peetucket force-pushed the object-type-detection branch from 0350658 to 43090e7 Compare October 6, 2022 04:34
@mjgiarlo mjgiarlo merged commit 865ff04 into main Oct 6, 2022
@mjgiarlo mjgiarlo deleted the object-type-detection branch October 6, 2022 14:15
@mjgiarlo
Copy link
Member

mjgiarlo commented Oct 6, 2022

Thanks, @peetucket!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants