Iqss/6497 semantic api #7414
Changes from 107 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Release Highlights | ||
|
||
### Dataset Semantic API (Experimental) | ||
|
||
Dataset metadata can be retrieved/set/updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata (#5899). | ||
|
||
This development was supported by the [Research Data Alliance](https://rd-alliance.org), DANS, and Sciences PO and follows the recommendations from the [Research Data Repository Interoperability Working Group](http://dx.doi.org/10.15497/RDA00025). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"http://purl.org/dc/terms/title": "Darwin's Finches", | ||
"http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences", | ||
"http://purl.org/dc/terms/creator": { | ||
"https://dataverse.org/schema/citation/author#Name": "Finch, Fiona", | ||
"https://dataverse.org/schema/citation/author#Affiliation": "Birds Inc." | ||
}, | ||
"https://dataverse.org/schema/citation/Contact": { | ||
"https://dataverse.org/schema/citation/datasetContact#E-mail": "finch@mailinator.com", | ||
"https://dataverse.org/schema/citation/datasetContact#Name": "Finch, Fiona" | ||
}, | ||
"https://dataverse.org/schema/citation/Description": { | ||
"https://dataverse.org/schema/citation/dsDescription#Text": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds." | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
Dataset Semantic Metadata API | ||
============================= | ||
|
||
The OAI_ORE metadata export format represents Dataset metadata using json-ld (see the :doc:`/admin/metadataexport` section). As part of an RDA-supported effort to allow import of Datasets exported as Bags with an included OAI_ORE metadata file, | ||
|
||
an experimental API has been created that provides a json-ld alternative to the v1.0 API calls to get/set/delete Dataset metadata in the :doc:`/api/native-api`. | ||
|
||
You may prefer to work with this API if you are building a tool to import from a Bag/OAI-ORE source or already work with json-ld representations of metadata, or if you prefer the flatter json-ld representation to Dataverse software's json representation (which includes structure related to the metadata blocks involved and the type/multiplicity of the metadata fields.) | ||
You may not want to use this API if you need stability and backward compatibility (the 'experimental' designation for this API implies that community feedback is desired and that, in future Dataverse software versions, the API may be modified based on that feedback). | ||
|
||
Note: The examples use the 'application/ld+json' mimetype. For compatibility reasons, the APIs can also be used with the mimetype 'application/json-ld'. | ||
|
||
|
||
Get Dataset Metadata | ||
-------------------- | ||
|
||
To get the json-ld formatted metadata for a Dataset, specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID), and, for specific versions, the version number. | ||
|
||
.. code-block:: bash | ||
|
||
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ||
export DATASET_ID='12345' | ||
export DATASET_PID='doi:10.5072/FK2A1B2C3' | ||
export VERSION='1.0' | ||
export SERVER_URL=https://demo.dataverse.org | ||
|
||
Example 1: Get metadata for version '1.0' | ||
|
||
curl -H X-Dataverse-key:$API_TOKEN -H 'Accept: application/ld+json' "$SERVER_URL/api/datasets/$DATASET_ID/versions/$VERSION/metadata" | ||
|
||
Example 2: Get metadata for the latest version using the DATASET PID | ||
|
||
curl -H X-Dataverse-key:$API_TOKEN -H 'Accept: application/ld+json' "$SERVER_URL/api/datasets/:persistentId/metadata?persistentId=$DATASET_PID" | ||
|
||
You should expect a 200 ("OK") response and JSON-LD mirroring the OAI-ORE representation in the returned 'data' object. | ||
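As a rough illustration of the two URL shapes used in the curl examples above, the following Python sketch builds the request URL and headers without sending anything. The function names `metadata_url` and `request_headers` are hypothetical helpers introduced here, not part of Dataverse, and the sketch deliberately covers only the forms shown above (database ID with an optional version, or persistent identifier without a version).

```python
from urllib.parse import urlencode


def metadata_url(server_url, dataset_id=None, pid=None, version=None):
    """Build the GET URL for a Dataset's JSON-LD metadata.

    Hypothetical helper: covers only the URL shapes shown in the curl
    examples (database ID, optionally with a version, or a PID).
    """
    if (dataset_id is None) == (pid is None):
        raise ValueError("pass exactly one of dataset_id or pid")
    base = server_url.rstrip("/") + "/api/datasets"
    if pid is not None:
        if version is not None:
            # Not shown in the examples, so this sketch does not guess at it.
            raise ValueError("this sketch does not cover PID plus version")
        # Keep ':' and '/' unescaped so the URL matches the curl example.
        query = urlencode({"persistentId": pid}, safe=":/")
        return f"{base}/:persistentId/metadata?{query}"
    if version is not None:
        return f"{base}/{dataset_id}/versions/{version}/metadata"
    return f"{base}/{dataset_id}/metadata"


def request_headers(api_token):
    """Headers used by the curl examples: API token plus JSON-LD Accept."""
    return {"X-Dataverse-key": api_token, "Accept": "application/ld+json"}
```

The returned URL and headers can be passed to any HTTP client; the JSON-LD itself is then found under the 'data' key of the response body, as described above.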
|
||
|
||
Add Dataset Metadata | ||
-------------------- | ||
|
||
To add json-ld formatted metadata for a Dataset, specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID). Adding '?replace=true' will overwrite an existing metadata value. The default (replace=false) will only add new metadata or add a new value to a multi-valued field. | ||
|
||
.. code-block:: bash | ||
|
||
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ||
export DATASET_ID='12345' | ||
export DATASET_PID='doi:10.5072/FK2A1B2C3' | ||
export VERSION='1.0' | ||
export SERVER_URL=https://demo.dataverse.org | ||
|
||
Example 1: Change the Dataset title | ||
|
||
curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"Title": "Submit menu test", "@context":{"Title": "http://purl.org/dc/terms/title"}}' "$SERVER_URL/api/datasets/$DATASET_ID/metadata?replace=true" | ||
|
||
Example 2: Add a description using the DATASET PID | ||
|
||
curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"citation:Description": {"dsDescription:Text": "New description"}, "@context":{"citation": "https://dataverse.org/schema/citation/","dsDescription": "https://dataverse.org/schema/citation/dsDescription#"}}' "$SERVER_URL/api/datasets/:persistentId/metadata?persistentId=$DATASET_PID" | ||
|
||
|
||
You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated. | ||
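For readers scripting this call rather than using curl, here is a minimal Python sketch that prepares (but does not send) the same PUT request as the title example above. `build_add_request` is a hypothetical helper name, and the sketch uses only the database-ID URL form with the optional `replace=true` flag, as in that example.

```python
import json
from urllib.request import Request


def build_add_request(server_url, api_token, dataset_id, jsonld, replace=False):
    """Prepare a PUT that adds (or, with replace=True, overwrites) metadata.

    Hypothetical helper mirroring the curl example; nothing is sent here.
    """
    url = f"{server_url.rstrip('/')}/api/datasets/{dataset_id}/metadata"
    if replace:
        url += "?replace=true"
    return Request(
        url,
        data=json.dumps(jsonld).encode("utf-8"),
        headers={
            "X-Dataverse-key": api_token,
            "Content-Type": "application/ld+json",
        },
        method="PUT",
    )


# Same payload as the title example: a compact key plus an '@context'.
title_update = {
    "Title": "Submit menu test",
    "@context": {"Title": "http://purl.org/dc/terms/title"},
}
req = build_add_request(
    "https://demo.dataverse.org", "xxxxxxxx-xxxx", "12345", title_update, replace=True
)
# Passing req to urllib.request.urlopen would send it; omitted in this sketch.
```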
|
||
|
||
Delete Dataset Metadata | ||
----------------------- | ||
|
||
To delete metadata for a Dataset, send a json-ld representation of the fields to delete and specify the Dataset ID (DATASET_ID) or Persistent identifier (DATASET_PID). | ||
|
||
.. code-block:: bash | ||
|
||
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ||
export DATASET_ID='12345' | ||
export DATASET_PID='doi:10.5072/FK2A1B2C3' | ||
export VERSION='1.0' | ||
export SERVER_URL=https://demo.dataverse.org | ||
|
||
Example: Delete the TermsOfUseAndAccess 'restrictions' value 'No restrictions' for the latest version using the DATASET PID | ||
|
||
|
||
curl -X PUT -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -d '{"https://dataverse.org/schema/core#restrictions":"No restrictions"}' "$SERVER_URL/api/datasets/:persistentId/metadata/delete?persistentId=$DATASET_PID" | ||
|
||
Note, this example uses the term URI directly rather than adding an '@context' element. You can use either form in any of these API calls. | ||
|
||
|
||
You should expect a 200 ("OK") response indicating whether a draft Dataset version was created or an existing draft was updated. | ||
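To make the note about term URIs versus '@context' more concrete, the Python sketch below rewrites the compact keys used in the earlier examples into the full term URIs. It handles only the two patterns that appear in this document (a key mapped directly to a URI, and a `prefix:suffix` key whose prefix is mapped to a URI); it is a small illustrative subset, not a conforming JSON-LD expansion, and the function names are hypothetical.

```python
def expand_terms(node, context):
    """Rewrite compact keys to full term URIs using a simple '@context'.

    Illustrative subset only: direct key -> URI mappings and
    'prefix:suffix' keys whose prefix is defined in the context.
    """
    if isinstance(node, list):
        return [expand_terms(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context is consumed, not copied to the output
        prefix, sep, suffix = key.partition(":")
        if key in context:
            full_key = context[key]
        elif sep and prefix in context:
            full_key = context[prefix] + suffix
        else:
            full_key = key  # already a full URI or a JSON-LD keyword
        expanded[full_key] = expand_terms(value, context)
    return expanded


def to_full_uris(document):
    """Expand a document using its own '@context' (if any)."""
    return expand_terms(document, document.get("@context", {}))
```

Applied to the description example, `to_full_uris` turns `citation:Description` and `dsDescription:Text` into the same full URIs that appear in the example JSON-LD file, which is why either form can be sent.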
|
||
|
||
Create a Dataset | ||
---------------- | ||
|
||
Specifying the Content-Type as application/ld+json with the existing /api/dataverses/{id}/datasets API call (see :ref:`create-dataset-command`) supports using the same metadata format when creating a Dataset. | ||
|
||
With curl, this is done by adding the following header: | ||
|
||
.. code-block:: bash | ||
|
||
-H 'Content-Type: application/ld+json' | ||
|
||
|
||
.. code-block:: bash | ||
|
||
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ||
export SERVER_URL=https://demo.dataverse.org | ||
export DATAVERSE_ID=root | ||
export PERSISTENT_IDENTIFIER=doi:10.5072/FK27U7YBV | ||
|
||
curl -H X-Dataverse-key:$API_TOKEN -H 'Content-Type: application/ld+json' -X POST $SERVER_URL/api/dataverses/$DATAVERSE_ID/datasets --upload-file dataset-create.jsonld | ||
|
||
An example jsonld file is available at :download:`dataset-create.jsonld <../_static/api/dataset-create.jsonld>` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,4 +35,5 @@ Developer Guide | |
big-data-support | ||
aux-file-support | ||
s3-direct-upload-api | ||
dataset-semantic-metadata-api | ||
workflows |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#metadataBlock name dataverseAlias displayName blockURI | ||
Review comment: Do we need this migration.tsv? It isn't loaded by default on new installations?

Review comment: It is not required and is not auto-loaded. It's a bit of a complex situation: The RDA grant had, as one goal, the idea of being able to import Bags created by other repos. An expectation there is that the other repo (a Dataverse with different blocks or some other repo software entirely) could have metadata that doesn't fit into the receiving Dataverse's schema (including blocks). So - the metadataOnOrig field is a json structured field that can store any/all metadata that doesn't match a known field. So a transfer can be done without losing metadata. That said, putting metadata in this field makes it less useful than if there were a matching field. The current code will ignore metadata that doesn't match if this block is not installed and will use it if it is. I can explain that, or remove the code from the PR. For migration cases, I think even this limited functionality could be useful, but I suspect that most people using the 'experimental' API here won't want to enable it.

Review comment: This was all non-obvious to me from a quick look at the code. If it's on the table to remove the metadataOnOrig functionality, I think we should. If you get other opinions that we should keep it, please document how it works.
||
migration Migrated Metadata https://dataverse.org/schema/migration/ | ||
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id termURI | ||
metadataOnOrig Metadata on the original source of migrated datasets. textbox 1 FALSE FALSE FALSE FALSE FALSE FALSE migration https://dataverse.org/schema/core#metadataOnOrig | ||
#controlledVocabulary DatasetField Value identifier displayOrder |
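The behaviour described in the review reply above (unmatched metadata is kept in metadataOnOrig when the migration block is installed, and ignored otherwise) can be pictured with the following Python sketch. This is only a reading of that description, not the PR's actual code: the function name `split_incoming`, the `known_terms` set, and the choice to serialize the leftovers as a single JSON string are all assumptions made for illustration.

```python
import json

# Term URI taken from the migration.tsv block above.
METADATA_ON_ORIG = "https://dataverse.org/schema/core#metadataOnOrig"


def split_incoming(incoming, known_terms, migration_block_installed):
    """Keep recognised fields; optionally preserve the rest in one field.

    Hypothetical model of the described behaviour, not the real implementation.
    """
    matched = {k: v for k, v in incoming.items() if k in known_terms}
    unmatched = {k: v for k, v in incoming.items() if k not in known_terms}
    if unmatched and migration_block_installed:
        # Assumption: leftovers are stored together as one JSON-structured value.
        matched[METADATA_ON_ORIG] = json.dumps(unmatched, sort_keys=True)
    return matched
```

With the block installed, a field from another repository that has no counterpart survives the transfer inside metadataOnOrig; without it, the same field is silently dropped.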
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
|
||
{ | ||
"http://purl.org/dc/terms/title": "Darwin's Finches", | ||
"http://purl.org/dc/terms/subject": "Medicine, Health and Life Sciences", | ||
"http://purl.org/dc/terms/creator": { | ||
"https://dataverse.org/schema/citation/author#Name": "Finch, Fiona", | ||
"https://dataverse.org/schema/citation/author#Affiliation": "Birds Inc." | ||
}, | ||
Review comment: How do I specify multiple authors?

Review comment: An array of objects with the name/affiliation keys as the value of the creator key. This is the same way the OAI-ORE export shows multiple authors.

Review comment: That's what I figured. When I was working on the Schema.org JSON-LD output I was shocked by how loosey goosey JSON-LD is. At some point it might behoove us to add a little crash course on JSON-LD to the guides, or at least plenty of examples so that users get the hang of it. Otherwise, I think we can anticipate questions like "How do I specify multiple authors?"

Review comment: Having a human-readable term label (that probably isn't globally unique/machine interpretable) while still supporting strict machine readability adds some complexity, but it's pretty straightforward to either never use an @context or to always use a standard/static one. Regardless, it wouldn't be hard to add documentation over time.

Review comment: 👍 for more docs over time. I think we can live with what we have now, especially since it's experimental.
||
"https://dataverse.org/schema/citation/Contact": { | ||
"https://dataverse.org/schema/citation/datasetContact#E-mail": "finch@mailinator.com", | ||
"https://dataverse.org/schema/citation/datasetContact#Name": "Finch, Fiona" | ||
}, | ||
"https://dataverse.org/schema/citation/Description": { | ||
"https://dataverse.org/schema/citation/dsDescription#Text": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds." | ||
}, | ||
"@type": [ | ||
"http://www.openarchives.org/ore/terms/Aggregation", | ||
"http://schema.org/Dataset" | ||
], | ||
"http://schema.org/version": "DRAFT", | ||
"http://schema.org/name": "Darwin's Finches", | ||
"https://dataverse.org/schema/core#fileTermsOfAccess": { | ||
"https://dataverse.org/schema/core#fileRequestAccess": false | ||
}, | ||
"http://schema.org/includedInDataCatalog": "Root" | ||
} |
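The answer given in the review thread on this file (multiple authors are an array of objects under the creator key) could look like the following Python sketch. The term URIs are copied from the example file; the helper name `creator_value` and the second author are made up for illustration, and collapsing a single author to a plain object simply mirrors the example file rather than any stated requirement.

```python
import json

CREATOR = "http://purl.org/dc/terms/creator"
AUTHOR_NAME = "https://dataverse.org/schema/citation/author#Name"
AUTHOR_AFFILIATION = "https://dataverse.org/schema/citation/author#Affiliation"


def creator_value(authors):
    """Return the creator value: one object, or an array of objects.

    `authors` is a list of (name, affiliation) pairs; affiliation may be None.
    """
    entries = []
    for name, affiliation in authors:
        entry = {AUTHOR_NAME: name}
        if affiliation:
            entry[AUTHOR_AFFILIATION] = affiliation
        entries.append(entry)
    # A single author is written as a plain object, as in the example file.
    return entries[0] if len(entries) == 1 else entries


dataset = {
    "http://purl.org/dc/terms/title": "Darwin's Finches",
    CREATOR: creator_value([
        ("Finch, Fiona", "Birds Inc."),
        ("Wren, Walter", "Hypothetical University"),  # made-up second author
    ]),
}
print(json.dumps(dataset, indent=2))
```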
Review comment: The capitalization seems inconsistent.
Contact and Description (title case) but also author, datasetContact and dsDescription (camel case)?
What are the rules?

Review comment: These come from the citation.tsv block with the pattern <block name>/<field title> for single fields and <block name>/<field title>/<child field name> being used. That choice was made back when OAI_ORE/BagIt/archiving was introduced. The use of title was an attempt to use URIs that mirrored what users see in the UI. I'm less sure why I used name for child fields - not sure if there was a conflict or if it was an issue with originally just trying a flatter <blockname>/<childfield title> and realizing that there are multiple child fields with the title 'Name' for example.

Review comment: Ok. Sounds like a preexisting condition to me. 😄 It would be nice to have more consistency but oh well. I assume we don't want to revisit decisions made during BagIt export.