Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8868 fix json import #8927

Merged
merged 7 commits into from
Aug 27, 2022
Merged

8868 fix json import #8927

merged 7 commits into from
Aug 27, 2022

Conversation

sekmiller
Copy link
Contributor

@sekmiller sekmiller commented Aug 22, 2022

What this PR does / why we need it: this PR makes sure that the json export of a dataset does not include metadataLanguage if it is undefined

Which issue(s) this PR closes:

Closes #8868 dataverse_json with "metadataLanguage":"undefined" cannot be imported

Special notes for your reviewer:
I moved the test for isMetadataLanguageSet to the DVObjectContainer so that it may be used in other situations where similar code was in place.

Suggestions on how to test this:
make sure that a dataset exported where the metadataLanguage is undefined may be imported.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No
Is there a release notes update needed for this change?:
Included to remind users to Re-export all to effect this change.
Additional documentation:
None

@coveralls
Copy link

coveralls commented Aug 22, 2022

Coverage Status

Coverage decreased (-0.003%) to 20.043% when pulling 4fdc499 on 8868-fix-json-import into eed4775 on develop.

@landreev landreev self-requested a review August 23, 2022 22:40
@landreev landreev self-assigned this Aug 23, 2022
Copy link
Contributor

@landreev landreev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great/non-controversial, but could you please add a 1-line release note saying that this requires a re-export. So that we don't forget to include it into the 5.12 release instructions.


Run ReExportall to update Exports

Following the directions in the [Admin Guide](http://guides.dataverse.org/en/5.12/admin/metadataexport.html#batch-exports-through-the-api)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@landreev, I added a quick release note. Please feel free to edit as you wish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect.

Copy link
Contributor

@landreev landreev Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to take you up on that "feel free" - adding a mention of the duplicate keyword in the DDI under bugfixes in the note, per Jim's suggestion. (I want to merge this PR tonight!)

@landreev
Copy link
Contributor

landreev commented Aug 25, 2022

@sekmiller @qqmyers
This looked like a most straightforward PR, but I managed to find something slightly puzzling in it while testing.
It definitely fixes the json export.
But since the DDI export is nominally touched in the PR, I checked on that too, just in case, and I'm seeing the following:
The dataset has a single Subject.
In the DDI export in the develop branch:

 <keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>
 <keyword>Medicine, Health and Life Sciences</keyword>

In this branch:

 <keyword xml:lang="en">Medicine, Health and Life Sciences</keyword>

I can't see immediately how the changes in the branch would result in this. (?)
Also, which is the correct behavior? (to me the latter looks more correct)
Jim, I added you because languages in metadata exports is your area.

@qqmyers
Copy link
Member

qqmyers commented Aug 25, 2022

Two guesses: I think the design is for the keywords to show in the platform default lang, and, if different, the metaata lang as well. So two possibilities - one machine used for testing has a non-english language (i.e. C.utf8 can be the default instead) so it creates english and a default, or possibility 2: the metadatalang=default in the json, which is used as the starting point for the DDI export was tricking the code into adding the second entry somehow. Looking at the code, my guess would be the former, i.e. that the machine locale is not english. (I've seen this on Ubuntu, not sure what else does it by default). i.e. !defaultLocale.getLanguage().equals(the metadatalanguage for the dataset) at

if (lang != null && !defaultLocale.getLanguage().equals(lang)) {

Not sure though...

@landreev
Copy link
Contributor

Both branches were tested on the same system (my dev. box), on the same dataset (by deleting the cached export, asking for ddi again). Tested develop -> switched to branch -> switched back to develop - the behavior is consistent as described above. (will switch back to the branch one more time).
While on the same branch, no difference between the ddi produced by the export API, vs. the export performed as part of publishing - that crossed my mind too.

@landreev
Copy link
Contributor

(the locale jumping between different settings between redeployments, unrelated to what's in the branch - not entirely impossible)

@landreev
Copy link
Contributor

landreev commented Aug 26, 2022

@qqmyers It is in fact the result our DDI being produced from our json.
The new behavior - just one subject/keyword entry - appears to be an improvement, correct?
So, ok to merge the branch?

@pdurbin
Copy link
Member

pdurbin commented Aug 26, 2022

Improvement or not, if we're going with this change, please document it in the release note snippet.

@landreev
Copy link
Contributor

@pdurbin

Improvement or not, if we're going with this change, please document it in the release note snippet.

Actually, no, I'm not sure this needs to be documented. I don't think the "old" behavior was ever documented, or intentional. To me it looks like it was only there since 5.11, and, again, as the result of that "metadataLanguage":"undefined" entry in the json; which was not added on purpose, and which this PR is reversing.
But I'll wait for an authoritative answer from Jim.

@landreev
Copy link
Contributor

Let me pull the fix for the harvesting test from develop, which will run the integration test...

@landreev
Copy link
Contributor

"terminated abnormally", hmm.
I'll try again.

@qqmyers
Copy link
Member

qqmyers commented Aug 26, 2022

re:improvement - yes, I think this is just part of the bug fix - when the no metadatalanguage value changed from null to undefined, it made it get exported and caused an extra keyword line in the DDI. Fixing that fixes the exports in general and the extra keyword lines that we presumably didn't notice. So - if there's any release text change, I'd suggest something about it fixing a ~duplicate keyword tag in the ddi export.

@landreev
Copy link
Contributor

OK, sure, makes sense to mention it under "bug fixes". And we are already recommending a reexport.

@landreev
Copy link
Contributor

@pdurbin You were right - makes sense to document, as a bug fix.

@landreev
Copy link
Contributor

Oh my, the last Jenkins run must have gotten caught by the 7pm auto-kill again. This PR just can't catch a break!

@landreev landreev merged commit 094c90a into develop Aug 27, 2022
@landreev landreev deleted the 8868-fix-json-import branch August 27, 2022 04:41
@pdurbin pdurbin added this to the 5.12 milestone Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dataverse_json with "metadataLanguage":"undefined" cannot be imported
5 participants