4597 Fix invalid identify response. #7062

Merged
merged 1 commit into from
Aug 3, 2020

Conversation

Contributor

@JingMa87 JingMa87 commented Jul 7, 2020

What this PR does / why we need it: Changes the Identify response for an OAI-PMH request so it's XSD valid.

Which issue(s) this PR closes: 4597

Closes #4597

Special notes for your reviewer: It's looking good

Suggestions on how to test this: Example call: https://dataverse.harvard.edu/oai?verb=Identify

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Perhaps

Additional documentation:

@coveralls

coveralls commented Jul 7, 2020

Coverage Status

Coverage decreased (-0.0006%) to 19.602% when pulling 089415c on JingMa87:4597-fix-invalid-identify-response into a6f580f on IQSS:develop.

Member

@pdurbin pdurbin left a comment

@JingMa87 and I are pretty sure we want some of the values, such as "repositoryIdentifier", to be configurable.

scheme.appendChild(doc.createTextNode("oai"));
oaiIdentifier.appendChild(scheme);
Element repositoryIdentifier = doc.createElement("repositoryIdentifier");
repositoryIdentifier.appendChild(doc.createTextNode("dataverse.org"));
Member

In #4597, @JingMa87 and I have been discussing making "repositoryIdentifier" (and possibly other identifiers) configurable.

@JingMa87
Contributor Author

@pdurbin The coverage decreased. Do you need a test for every function people add?

@pdurbin
Member

pdurbin commented Jul 10, 2020

@JingMa87 no, that's not necessary. I see you made some changes. Are you ready for more code review?

@JingMa87
Contributor Author

@pdurbin Go ahead.

@pdurbin pdurbin self-assigned this Jul 21, 2020
Member

@pdurbin pdurbin left a comment

I left comments in my review. In addition, please resolve the merge conflicts.

@@ -282,7 +282,7 @@

<p:commandButton id="updateEditDataFilesButtonsForUpload" action="#{EditDatafilesPage.uploadFinished()}" update="datasetForm:editDataFilesButtons" rendered="#{showFileButtonUpdate}" style="display:none"/>
<p:commandButton id="updateEditDataFilesButtonsForDelete" action="#{EditDatafilesPage.deleteFilesCompleted()}" update="datasetForm:editDataFilesButtons" rendered="#{showFileButtonUpdate}" style="display:none"/>
<p:commandButton id="AllUploadsFinished" action="#{EditDatafilesPage.uploadFinished()}" update="@form,datasetForm:fileUpload,datasetForm:dropBoxUserButton,datasetForm:uploadMessage,datasetForm:rsyncPanel,datasetForm:filesCounts,datasetForm:filesTable" oncomplete="javascript:bind_bsui_components();javascript:uploadWidgetDropMsg();" style="display:none"/>
Member

Why is this change to @form in this pull request?

Contributor Author

Good point. I was testing some new functionality but inadvertently pushed these changes, so I'll revert them. The same applies to immediate="true".

@@ -1589,7 +1589,7 @@
</ui:fragment>
<div class="button-block">
<p:commandButton styleClass="btn btn-default" value="#{bundle.continue}"
onclick="PF('publishDataset').hide();PF('blockDatasetForm').hide();" action="#{DatasetPage.releaseDataset}" immediate="true"/>
Member

Why is this change to immediate="true" in this pull request?

@@ -0,0 +1 @@
Added the following database options: Scheme, RepositoryIdentifier, Delimiter, SampleIdentifier.
Member

Can you please change all these (in the code too, of course) to start with "Harvesting"? That is:

:HarvestingScheme, :HarvestingRepositoryIdentifier, :HarvestingDelimiter, :HarvestingSampleIdentifier

Otherwise, they're too generic, especially :Scheme or :Delimiter.

Member

Whoops, I jumped the gun. Can you please have these start with "Oaipmh" instead? Like this:

:OaipmhScheme, :OaipmhRepositoryIdentifier, :OaipmhDelimiter, :OaipmhSampleIdentifier

Also, can you please document these in doc/sphinx-guides/source/installation/config.rst? Thanks!
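
As a rough illustration of the naming request above, here is a minimal sketch of looking up the renamed options with defaults. `OaipmhSettingsSketch` and its in-memory map are hypothetical stand-ins and not Dataverse's `SettingsServiceBean`; the leading colon on option names is an assumption based on the names used in this review, and the defaults are the ones from the code under review.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a settings service; not the real SettingsServiceBean.
class OaipmhSettingsSketch {

    // Keys named as requested in the review, prefixed with "Oaipmh".
    enum Key {
        OaipmhScheme,
        OaipmhRepositoryIdentifier,
        OaipmhDelimiter,
        OaipmhSampleIdentifier;

        // Assumption: database option names carry a leading colon.
        @Override
        public String toString() {
            return ":" + name();
        }
    }

    // In-memory store standing in for the settings table.
    private final Map<String, String> stored = new HashMap<>();

    void set(Key key, String value) {
        stored.put(key.toString(), value);
    }

    // Returns the stored value, or the default when the option is unset.
    String getValueForKey(Key key, String defaultValue) {
        return stored.getOrDefault(key.toString(), defaultValue);
    }

    public static void main(String[] args) {
        OaipmhSettingsSketch settings = new OaipmhSettingsSketch();
        settings.set(Key.OaipmhRepositoryIdentifier, "example.org");

        System.out.println(Key.OaipmhScheme + " = "
                + settings.getValueForKey(Key.OaipmhScheme, "oai"));
        System.out.println(Key.OaipmhRepositoryIdentifier + " = "
                + settings.getValueForKey(Key.OaipmhRepositoryIdentifier, "dataverse.org"));
    }
}
```

The prefix keeps generic names such as "Scheme" or "Delimiter" from colliding with unrelated options.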

String scheme = settingsService.getValueForKey(SettingsServiceBean.Key.Scheme, "oai");
String repositoryIdentifier = settingsService.getValueForKey(SettingsServiceBean.Key.RepositoryIdentifier, "dataverse.org");
String delimiter = settingsService.getValueForKey(SettingsServiceBean.Key.Delimiter, ":");
String sampleIdentifier = settingsService.getValueForKey(SettingsServiceBean.Key.SampleIdentifier, "doi:10.7910/DVN/1HE30F");
Member

In http://oval.base-search.net this DOI is giving an error:

ERROR: Identify response well-formed but invalid: Element '{http://www.openarchives.org/OAI/2.0/oai-identifier}sampleIdentifier': [facet 'pattern'] The value 'doi:10.7910/DVN/1HE30F' is not accepted by the pattern 'oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+'., line 5

In context:

Screen Shot 2020-07-21 at 12 23 12 PM
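
The validator error above can be reproduced locally with a small check. This is a sketch only: the pattern is transcribed from the `sampleIdentifier` restriction in oai-identifier.xsd as I understand it, and the second identifier (`oai:dataverse.org:doi:10.7910/DVN/1HE30F`) is a hypothetical example of a value that would satisfy it, not something proposed in this PR.

```java
import java.util.regex.Pattern;

// Sketch: checks candidate sample identifiers against the oai-identifier pattern.
class SampleIdentifierCheck {

    // Assumed transcription of the sampleIdentifier pattern from oai-identifier.xsd.
    static final Pattern OAI_IDENTIFIER = Pattern.compile(
            "oai:[a-zA-Z][a-zA-Z0-9\\-]*(\\.[a-zA-Z][a-zA-Z0-9\\-]*)+"
            + ":[a-zA-Z0-9\\-_\\.!~\\*'\\(\\);/\\?:@&=\\+$,%]+");

    // XSD patterns must match the whole value, so use matches() rather than find().
    static boolean isValid(String identifier) {
        return OAI_IDENTIFIER.matcher(identifier).matches();
    }

    public static void main(String[] args) {
        // The default from the code under review: no "oai:" scheme prefix.
        System.out.println(isValid("doi:10.7910/DVN/1HE30F")); // false
        // Hypothetical identifier with scheme and repository identifier prepended.
        System.out.println(isValid("oai:dataverse.org:doi:10.7910/DVN/1HE30F")); // true
    }
}
```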

@pdurbin
Member

pdurbin commented Jul 21, 2020

As of 7ebdb42 one validator was looking good:

Before: [screenshot: valBefore]

After: [screenshot: valAfter]

But another one was still showing errors:

Before: [screenshot: ovalBefore]

After: [screenshot: ovalAfter]

If it's possible to fix all of these, great! If not, no problem, we're happy for any improvement! Thanks!

@pdurbin pdurbin assigned JingMa87 and unassigned pdurbin Jul 21, 2020
@JingMa87
Contributor Author

JingMa87 commented Jul 29, 2020

@pdurbin The reason for the error is that the sampleIdentifier has to start with "oai", which is not actually correct for Dataverse, since a typical identifier starts with "doi". If this is a problem, we don't necessarily have to use elements like scheme and sampleIdentifier. The parent <description> element is both optional and extensible, and its content depends on the XSD schema you add as an attribute to the child element, which in our case is the <oai-identifier> element with a schema from the OAI website: http://www.openarchives.org/OAI/2.0/oai-identifier.xsd. So if we don't want to add a sample identifier that is not representative of Dataverse, we can leave out the description element altogether, since the minOccurs of that element is 0 and we'd still pass all validation tests. How do you feel about removing the description element? It would also make the code and the DB less bloated.

image
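
To make the proposal concrete, here is a minimal sketch of dropping `<description>` from an Identify response using the JDK's DOM API. It is not how this PR does it (the PR works through the XOAI library); `DescriptionStripper`, the sample XML, and the `urn:example:toolkit` namespace are all hypothetical, and the sample omits the other elements a real Identify response requires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: removes optional <description> elements from an OAI-PMH document.
class DescriptionStripper {

    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Parses XML with namespace awareness; wraps checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes every <description> in the OAI-PMH namespace; returns how many were removed.
    static int removeDescriptions(Document doc) {
        NodeList found = doc.getElementsByTagNameNS(OAI_NS, "description");
        // Copy first, so removal does not disturb iteration over the node list.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            toRemove.add(found.item(i));
        }
        for (Node node : toRemove) {
            node.getParentNode().removeChild(node);
        }
        return toRemove.size();
    }

    public static void main(String[] args) {
        // Hypothetical, incomplete Identify fragment for illustration only.
        String xml = "<Identify xmlns=\"" + OAI_NS + "\">"
                + "<repositoryName>Example Repository</repositoryName>"
                + "<description><toolkit xmlns=\"urn:example:toolkit\"/></description>"
                + "</Identify>";
        Document doc = parse(xml);
        System.out.println("Removed: " + removeDescriptions(doc));
        System.out.println("Remaining: "
                + doc.getElementsByTagNameNS(OAI_NS, "description").getLength());
    }
}
```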

@pdurbin
Member

pdurbin commented Jul 30, 2020

@JingMa87 I love the idea of removing the description element and having fewer database options.

@JingMa87
Contributor Author

@pdurbin You can review the changes now. That XOAI library is really not user friendly tbh. 😦

Member

@pdurbin pdurbin left a comment

The goal is for the response to the OAI-PMH "Identify" verb to comply with the spec, that is, to validate.

What this code seems to do is remove the "description" element or make it empty; I haven't tested the code.

From looking at demo, there doesn't seem to be anything interesting in this element, just a line about the Java library we use. Here's a screenshot:

Screen Shot 2020-07-31 at 10 19 13 AM

I guess I have a slight concern that this is a backward-incompatible change, but again, since there seems to be nothing of value in "description", blanking it out or removing it seems like an OK approach to me, so I'm moving it to QA.

@kcondon you might also want to get opinions from @landreev and/or @jggautier since this is harvesting related.

@kcondon
Contributor

kcondon commented Jul 31, 2020

@pdurbin Would you mind checking with Leonid and Julian on the validity of that approach? I think that should happen before it gets to QA.

@pdurbin
Member

pdurbin commented Jul 31, 2020

@kcondon sure. Can do. For now I parked this in code review.

@pdurbin pdurbin self-assigned this Jul 31, 2020
@landreev
Contributor

That <description> element isn't used for anything, so it should be safe to remove. We weren't putting it there on purpose either; it was just the default behavior of the XOAI library.

I'm surprised this was causing the response to be invalid, though; I thought the spec basically said there could be any free-style XML inside that <description>...</description> block, as long as all the tags were properly closed, etc. (And I do remember that our site was passing the OAI validator after we implemented it.) But if any specific software doesn't like it as is, sure, let's remove it.

@pdurbin pdurbin removed their assignment Jul 31, 2020
@pdurbin
Member

pdurbin commented Jul 31, 2020

Thanks for the comment from @landreev and the 👍 from @jggautier. I'm moving this to QA.

@JingMa87
Contributor Author

@landreev I don't know if they changed it in the meantime but the OAI-PMH XSD says that the content of the description element should have an XSD which determines the child elements (http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd).

image

@kcondon kcondon self-assigned this Aug 3, 2020
@kcondon kcondon merged commit 3158c86 into IQSS:develop Aug 3, 2020
@JingMa87 JingMa87 deleted the 4597-fix-invalid-identify-response branch August 4, 2020 09:03
@djbrooke djbrooke added this to the Dataverse 5 milestone Aug 4, 2020
Development

Successfully merging this pull request may close these issues.

OAI PMH: Invalid Identify-Response
6 participants