-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iqss/6497 semantic api #7414
Iqss/6497 semantic api #7414
Conversation
Conflicts: src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
This reverts commit 8596ac8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is looking really good and I'm excited to have a smaller, more compact JSON format for creating a dataset. That was the dream of #3068 (which focused on output, initially).
I left comments and questions throughout. I'd be happy to get on a call to discuss them. I haven't run the code locally yet but I might like to do that too.
doc/sphinx-guides/source/developers/dataset-semantic-metadata-api.rst
Outdated
Show resolved
Hide resolved
src/main/java/edu/harvard/iq/dataverse/util/json/JsonLDTerm.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Philip Durbin <philipdurbin@gmail.com>
…api.rst Co-authored-by: Philip Durbin <philipdurbin@gmail.com>
@pdurbin - thanks for reviewing! I think I've addressed the straight-forward issues. The one thing that is tbd would be to decide on whether to leave the metadataOnOrig functionality in or not. It could probably be pulled out for now and treated as another PR (that could sit for a while) if there's concern about having it in the code now. (FWIW: I think DANS has created a custom metadata block for the metadata they had in their old system that doesn't match and therefore isn't relying on the metadataOnOrig functionality.) If I missed something, let me know. I'm also happy to just talk through things as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the changes are looking good. A few things on my mind:
- As suggested by @qqmyers, I vote we remove the metadataOnOrig code and tsv if we really don't need it. It's the kind of thing we can add in later with more discussion, if necessary. If we decided to keep it, it definitely needs to be documented.
- In the guides, I'd be happy to have a little more of an intro to JSON-LD and examples of how to create a dataset with one author vs. multiple, for example. We can live without this though, so please consider it a "nice to have".
- Under "Suggestions on how to test this" it says "see examples in comments below" but I'm confused about what this means. Perhaps it means the examples in the guides. I'd suggest clarifying this.
- I'm still slightly weirded out that we have URLs like https://dataverse.org/schema/citation/Description in examples but they're 404s.
- I left a comment about the case being used in the JSON-LD examples. At first glance it doesn't seem consistent.
@@ -0,0 +1,5 @@ | |||
#metadataBlock name dataverseAlias displayName blockURI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all non-obvious to me from a quick look at the code. If it's on the table to remove the metadataOnOrig functionality, I think we should. If you get other opinions that we should keep it, please document how it works.
"http://purl.org/dc/terms/creator": { | ||
"https://dataverse.org/schema/citation/author#Name": "Finch, Fiona", | ||
"https://dataverse.org/schema/citation/author#Affiliation": "Birds Inc." | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I figured. When I was working on the Schema.org JSON-LD output I was shocked by how loosey goosey JSON-LD is.
At some point it might behoove us to add a little crash course on JSON-LD to the guides, or at least plenty of examples so that users get the hang of it. Otherwise, I think we can anticipate questions like "How do I specify multiple authors?"
"https://dataverse.org/schema/citation/datasetContact#E-mail": "finch@mailinator.com", | ||
"https://dataverse.org/schema/citation/datasetContact#Name": "Finch, Fiona" | ||
}, | ||
"https://dataverse.org/schema/citation/Description": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capitalization seems inconsistent.
Contact and Description (title case) but also author, datasetContact and dsDescription (camel case)?
What are the rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These come from the citation.tsv block with the pattern <block name>/<field title> for single fields and <block name>/<field title>/<child field name> being used. That choice was made back when OAI_ORE/BagIt/archiving was introduced. The use if title was an attempt to use URIs that mirrored what users see in the UI. I'm less sure why I used name for child fields - not sure if there was a conflict or if it was an issue with originally just trying a flatter <blockname>/<childfield title> and realizing that there are multiple child fields with the title 'Name' for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Sounds like a preexisting condition to me. 😄 It would be nice to have more consistency but oh well. I assume we don't want to revisit decisions made during BagIt export.
I've removed the metadataOnOrig logic (and created https://github.com/GlobalDataverseCommunityConsortium/dataverse/tree/RDA-with_metadataOnOrig to keep that code around as a proof-of-concept for RDA/if/when needed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still haven't run the code myself but this seems ready for QA. Thanks for the pull request and for hanging there with changes I asked for, @qqmyers!
"http://purl.org/dc/terms/creator": { | ||
"https://dataverse.org/schema/citation/author#Name": "Finch, Fiona", | ||
"https://dataverse.org/schema/citation/author#Affiliation": "Birds Inc." | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for more docs over time. I think we can live with what we have now, especially since it's experimental.
What this PR does / why we need it: This adds an API call that allows metadata to be added using json-ld - in the same format as the OAI-ORE export, and without having to retrieve the existing metadata and having to understand the way Dataverse is storing things internally. It's one step in supporting dataset migration via RDA Bags (see #6497) that might be useful independent of additional api calls to migrate a dataset. For example, it allows submission of new terms and conditions.
Which issue(s) this PR closes:
Closes #
Special notes for your reviewer:
Suggestions on how to test this: see examples in comments below
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: