Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IQSS/7068 Reserve File Pids #7334

Merged

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Oct 16, 2020

What this PR does / why we need it: Reserves file PIDs at creation time, if enabled, analogous to how datasets are now handled.

Which issue(s) this PR closes:

Closes #7068

Special notes for your reviewer:

Suggestions on how to test this: Should be able to see file PIDs/DOIs after file creation in the datacite console (e.g. doi.test.datacite.org). Deleting a file, or the dataset, prior to publication should remove the draft PID. Nominally this code calls whatever PID provider you use, so it should work with other providers capable of reserving PIDs. I'm not sure it is necessary to test them though - if the PID provider works, and this code works with DOIs, it should also work for those others.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: no

Is there a release notes update needed for this change?:

Additional documentation:

(analogous to dataset pid register if needed)

Conflicts:
	src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java
	src/main/java/propertyFiles/Bundle.properties
@coveralls
Copy link

coveralls commented Oct 16, 2020

Coverage Status

coverage: 20.796% (-0.008%) from 20.804%
when pulling 0fd4ee1 on QualitativeDataRepository:IQSS/7068-Reserve_file_PIDs
into 53f9b45 on IQSS:develop.

@djbrooke
Copy link
Contributor

Adding this to code review, because I think it is desirable and non-controversial functionality for those who have turned on File PIDs.

I'd be interested in thoughts from the Harvard Dataverse team as we've seen some issues with long running jobs around PID registration, and I wonder if this would potentially make it worse since there's now two steps (reservation and flipping to public) instead of one.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks fine. I'm only choosing "request changes" because I think we need a release note. I'm also suggesting a couple code changes but I don't see either one as showstoppers.

String currentGlobalIdProtocol = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Protocol, "");
String currentGlobalAuthority = ctxt.settings().getValueForKey(SettingsServiceBean.Key.Authority, "");
String dataFilePIDFormat = ctxt.settings().getValueForKey(SettingsServiceBean.Key.DataFilePIDFormat, "DEPENDENT");
boolean shouldRegister = ctxt.systemConfig().isFilePIDsEnabled() // We use file PIDs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thinking about how we iterate though all the files below. Couldn't we short circuit that iteration and simply return early if file level PIDs are not enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pulled the test of shouldRegister out of the loop.

logger.fine("IsFilePIDsEnabled: " + ctxt.systemConfig().isFilePIDsEnabled());
logger.fine("RegWhenPub: " + !idServiceBean.registerWhenPublished());
logger.fine("OK provider: " + ((currentGlobalIdProtocol.equals(protocol) && currentGlobalAuthority.equals(authority)) // the dataset PID is a protocol/authority Dataverse can create new PIDs in
|| dataFilePIDFormat.equals("INDEPENDENT")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's more code but rather than having hardcoded strings like "INDEPENDENT" my preference is to reference the place where the string comes from like this: SystemConfig.DataFilePIDFormat.INDEPENDENT.toString()

That way if you do a "find usages" in your editor, it easily comes up. And if we ever change the string (which is doubtful), it's easier to find.

This might just be a me think. Please feel free to get opinions from others.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have constants if you don't use them - changed "INDEPENDENT" and "DEPENDENT" everywhere

@pdurbin pdurbin assigned qqmyers and unassigned pdurbin Oct 30, 2020
Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good as of d76a738 so I'm moving this to QA. I hear @djbrooke will add a release note.

@pdurbin
Copy link
Member

pdurbin commented Oct 30, 2020

Sorry, sorry, I just noticed merge conflicts. @qqmyers can you please fix?

IQSS/7068-Reserve_file_PIDs

Conflicts:
	src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
@djbrooke
Copy link
Contributor

Thanks @qqmyers and @pdurbin - adding the release note now.

@djbrooke djbrooke assigned djbrooke and unassigned qqmyers Oct 30, 2020
@djbrooke
Copy link
Contributor

djbrooke commented Nov 3, 2020

Thanks @qqmyers and @pdurbin for talking with me about this. I made release note additions and API doc edits because I thought we recommended reservation for file PIDs similar to how we did reservations for Dataset PIDs when we moved to the new pre-publish verification workflow in 5.0 (https://github.com/IQSS/dataverse/blob/develop/doc/release-notes/5.0-release-notes.md#dataverse-installations-using-datacite-upgrade-action-recommended). I backed these changes out as we don't need to reserve PIDs for existing files and the API does not currently support this. This does not result in a failure during publishing, it just means that those File PIDs will be registered at that time.

* @param datafileId
* @param storageLocation
* @return {@code true} iff the global identifier is unique.
*/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to the PR - this just doesn't fit the method below which doesn't involve PIDs

@qqmyers qqmyers removed their assignment Mar 27, 2024
@landreev landreev self-requested a review July 19, 2024 18:05
@landreev landreev self-assigned this Jul 19, 2024
@landreev
Copy link
Contributor

Looks good overall.
Could you please sync the branch and resolve the conflict(s).

What is the back story, was there a specific reason why it got stuck and sat for 4 years? Was it just because of the concerns about file uploads becoming slower?

@qqmyers
Copy link
Member Author

qqmyers commented Jul 19, 2024

As far as I know, performance was the only potential issue - could still be an issue (although QDR uses it), but easier to manage overall since file PIDs can be enabled/disabled as desired now.

Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(re-)approving, in order to move the PR along into "ready for QA".

@landreev landreev removed their assignment Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 3 A percentage of a sprint. 2.1 hours.
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

Reserve file-level DOIs?
9 participants