Remove collection size feature #4207
Do we need to add logic to deprecate `#size` method usage? (e.g., have a `#size` method for collections that returns `n/a`)
While looking into #4100, we noticed that `CharacterizeJob` not only saves a file set's `original_file`, but also reindexes the file set, *and* reindexes every collection to which the file set's work belongs. It appears that this feature was added way back in the Sufia days in order to support the use case of displaying on the collection show page the total size of all files uploaded to a collection's works.

Why is this problematic?

At large scale, this means touching a potentially huge portion of the repository (iterating over all files in all works in a collection) in order to display an arguably useful string to a subset of repository users who care about such information. Doing this in `CharacterizeJob` introduces significant potential delay in the ingest pipeline, increasing how long it takes for a deposit to finish and render in the Hyrax UI (e.g., by delaying the creation of derivatives).

We propose removing this feature for now and encourage folks who need this feature to contribute it back with a better-performing, more streamlined design.
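
To make the cost concrete, here is a minimal sketch in plain Ruby of why a collection-wide size total scales with the number of files. The `FileSetStub`, `WorkStub`, and `CollectionStub` structs and the `build_collection` helper are hypothetical stand-ins invented for illustration; they are not Hyrax's models, and the real code path in `CharacterizeJob` may differ.

```ruby
# Hypothetical stand-ins for illustration only; these are NOT Hyrax's models.
FileSetStub = Struct.new(:bytes)
WorkStub = Struct.new(:file_sets)

CollectionStub = Struct.new(:works) do
  # Summing sizes visits every file set of every work in the collection,
  # so the cost grows with the total number of files.
  def total_bytes
    works.sum { |work| work.file_sets.sum(&:bytes) }
  end
end

# Builds a collection of `work_count` works, each holding
# `files_per_work` file sets of 1 KiB (1,024 bytes).
def build_collection(work_count, files_per_work)
  works = Array.new(work_count) do
    WorkStub.new(Array.new(files_per_work) { FileSetStub.new(1_024) })
  end
  CollectionStub.new(works)
end

collection = build_collection(3, 4)
puts collection.total_bytes # prints 12288 (3 works * 4 files * 1024 bytes)
```

If a total like this is recomputed for every collection each time a single file is characterized, one deposit can trigger work proportional to the size of whole collections, which is the delay described above.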
💯 to this! thanks. i do think it would be nice to make the public methods emit deprecation warnings and return hard-coded & compatible null-ish values
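
A rough sketch of what such a shim might look like, in plain Ruby. The `DeprecatedCollectionSize` module, the `CollectionStub` class, and the placeholder value `0` are all assumptions made for illustration; the thread does not settle on a return value (`n/a` was also floated), and Hyrax's own deprecation tooling may be the better fit in practice.

```ruby
# Hypothetical shim; module and class names are illustrative, not Hyrax's API.
module DeprecatedCollectionSize
  # Assumed placeholder value; the discussion does not fix one.
  NULLISH_SIZE = 0

  # Keeps the public method callable, warns on stderr, and returns a
  # hard-coded value instead of iterating over the collection's files.
  def size
    warn "[DEPRECATION] Collection size is no longer computed; " \
         "returning #{NULLISH_SIZE}."
    NULLISH_SIZE
  end
end

# Stand-in for a collection class that previously computed its size.
class CollectionStub
  include DeprecatedCollectionSize
end

puts CollectionStub.new.size
```

Callers that still invoke `size` keep working and get a visible warning, while the expensive traversal is gone.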
Suggesting some small tweaks, and some questions.
I know this is already merged, but I just want to add my kudos to @mjgiarlo (and any others working on this PR). Always happy to see performance enhancements.