-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IQSS/7068 Reserve File Pids #7334
IQSS/7068 Reserve File Pids #7334
Conversation
(analogous to dataset pid register if needed) Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java src/main/java/propertyFiles/Bundle.properties
Adding this to code review, because I think it is desirable and non-controversial functionality for those who have turned on File PIDs. I'd be interested in thoughts from the Harvard Dataverse team as we've seen some issues with long running jobs around PID registration, and I wonder if this would potentially make it worse since there's now two steps (reservation and flipping to public) instead of one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks fine. I'm only choosing "request changes" because I think we need a release note. I'm also suggesting a couple code changes but I don't see either one as showstoppers.
String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, ""); | ||
String currentGlobalAuthority = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Authority, ""); | ||
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT"); | ||
boolean shouldRegister = ctxt.systemConfig().isFilePIDsEnabled() // We use file PIDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thinking about how we iterate though all the files below. Couldn't we short circuit that iteration and simply return early if file level PIDs are not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pulled the test of shouldRegister out of the loop.
logger.fine("IsFilePIDsEnabled: " + ctxt.systemConfig().isFilePIDsEnabled()); | ||
logger.fine("RegWhenPub: " + !idServiceBean.registerWhenPublished()); | ||
logger.fine("OK provider: " + ((currentGlobalIdProtocol.equals(protocol) && currentGlobalAuthority.equals(authority)) // the dataset PID is a protocol/authority Dataverse can create new PIDs in | ||
|| dataFilePIDFormat.equals("INDEPENDENT"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's more code but rather than having hardcoded strings like "INDEPENDENT" my preference is to reference the place where the string comes from like this: SystemConfig.DataFilePIDFormat.INDEPENDENT.toString()
That way if you do a "find usages" in your editor, it easily comes up. And if we ever change the string (which is doubtful), it's easier to find.
This might just be a me think. Please feel free to get opinions from others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have constants if you don't use them - changed "INDEPENDENT" and "DEPENDENT" everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, sorry, I just noticed merge conflicts. @qqmyers can you please fix? |
IQSS/7068-Reserve_file_PIDs Conflicts: src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
…itativeDataRepository/dataverse into IQSS/7068-Reserve_file_PIDs
Thanks @qqmyers and @pdurbin for talking with me about this. I made release note additions and API doc edits because I thought we recommended reservation for file PIDs similar to how we did reservations for Dataset PIDs when we moved to the new pre-publish verification workflow in 5.0 (https://github.com/IQSS/dataverse/blob/develop/doc/release-notes/5.0-release-notes.md#dataverse-installations-using-datacite-upgrade-action-recommended). I backed these changes out as we don't need to reserve PIDs for existing files and the API does not currently support this. This does not result in a failure during publishing, it just means that those File PIDs will be registered at that time. |
IQSS/7068-Reserve_file_PIDs
* @param datafileId | ||
* @param storageLocation | ||
* @return {@code true} iff the global identifier is unique. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to the PR - this just doesn't fit the method below which doesn't involve PIDs
Looks good overall. What is the back story, was there a specific reason why it got stuck and sat for 4 years? Was it just because of the concerns about file uploads becoming slower? |
IQSS/7068-Reserve_file_PIDs
As far as I know, performance was the only potential issue - could still be an issue (although QDR uses it), but easier to manage overall since file PIDs can be enabled/disabled as desired now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(re-)approving, in order to move the PR along into "ready for QA".
What this PR does / why we need it: Reserves file PIDs at creation time, if enabled, analogous to how datasets are now handled.
Which issue(s) this PR closes:
Closes #7068
Special notes for your reviewer:
Suggestions on how to test this: Should be able to see file PIDs/DOIs after file creation in the datacite console (e.g. doi.test.datacite.org). Deleting a file, or the dataset, prior to publication should remove the draft PID. Nominally this code calls whatever PID provider you use, so it should work with other providers capable of reserving PIDs. I'm not sure it is necessary to test them though - if the PID provider works, and this code works with DOIs, it should also work for those others.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: no
Is there a release notes update needed for this change?:
Additional documentation: