Add datacite metadata in schema.org/JSON-LD format to dataset page via DataCite service #3793

Closed
scolapasta opened this issue Apr 25, 2017 · 13 comments

Comments

@scolapasta
Contributor

scolapasta commented Apr 25, 2017

While we'll probably want to build our own schema.org/JSON-LD support (so that handles and other persistent identifier services are also supported), a simpler way to get something in sooner would be to use the DataCite service. From Martin at DataCite:

One of the recommendations for repositories in our DCIP repositories preprint is for landing pages to include citation metadata in schema.org/JSON-LD format. I have now (almost) completed two steps that will make this much easier for DataCite members:
1. We will soon relaunch DataCite content negotiation with much better support for schema.org. The service is now in beta, and an example with a Dataverse DOI would be https://data.test.datacite.org/application/vnd.schemaorg.ld+json/10.7910/DVN/DA1WTL
2. I have written a little JavaScript that fetches the schema.org metadata and embeds them in a page via a <script> tag: https://github.com/crosscite/doi-metadata-search/blob/master/public/javascripts/schemaorg.js. We do this now at DataCite Search, but every data center could do exactly the same on a landing page. The script gets the DOI from the page via a "DC.identifier" meta tag, another one of our recommendations in the preprint.

I think the above is a nice story to tell in our webinar in May, and in the data citation course in San Diego in August. And maybe Dataverse is interested in implementing this. An example of richer metadata, including a link to the associated article (relation_type isSupplementTo), would be https://data.test.datacite.org/application/vnd.schemaorg.ld+json/10.5517/CCDC.CSD.CC1NSQCJ.
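
For illustration, the lookup step described in the quote can be sketched in Python. This is not a port of the linked schemaorg.js; the function names are hypothetical, and the base URL is assumed from the non-test example that appears later in this thread (the service may have changed since). The sketch reads the "DC.identifier" meta tag from a landing page and builds the content negotiation URL for that DOI:

```python
from html.parser import HTMLParser
from typing import Optional

# Assumption: base URL pattern taken from the non-test example in this thread.
DATACITE_BASE = "https://data.datacite.org/application/vnd.schemaorg.ld+json/"


class _DoiMetaParser(HTMLParser):
    """Records the content of the first <meta name="DC.identifier"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.doi: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.doi is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "dc.identifier":
            self.doi = attr.get("content")


def find_doi(html: str) -> Optional[str]:
    """Return the bare DOI from the page's DC.identifier meta tag, or None."""
    parser = _DoiMetaParser()
    parser.feed(html)
    if not parser.doi:
        return None
    doi = parser.doi.strip()
    # Accept "doi:..." and resolver-URL forms as well as a bare DOI.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def schema_org_url(doi: str) -> str:
    """Build the (assumed) DataCite URL that returns schema.org JSON-LD."""
    return DATACITE_BASE + doi
```

Fetching the URL and embedding the response are left out here so the sketch has no network dependency.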

@jggautier
Contributor

jggautier commented Apr 26, 2017

Related to #2243; this is a quicker, less inclusive way to add schema.org metadata to most dataset landing pages.

@jggautier
Contributor

> The script gets the DOI from the page via a "DC.identifier" meta tag

@scolapasta Now that dataset pages will have these meta tags in 4.7, could we use DataCite's service as a first step toward getting metadata in schema.org/JSON-LD format in dataset pages?

@pdurbin
Member

pdurbin commented Oct 31, 2017

> The service is now in beta, and an example with a Dataverse DOI would be https://data.test.datacite.org/application/vnd.schemaorg.ld+json/10.7910/DVN/DA1WTL

@mfenner the URL above now returns "500 Internal Server Error". Is there a working example somewhere? Thanks!

@pdurbin
Member

pdurbin commented Oct 31, 2017

@jggautier found https://data.datacite.org/application/vnd.schemaorg.ld+json/10.7910/dvn/icfngt as an example from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ICFNGT . Thanks!!

@jggautier also explained that the idea may be to download the JSON above from DataCite and simply present it for download from Dataverse. I'll play around with it.

@jggautier
Contributor

Thanks Phil! (How did you know to remove the "test" from that URL?)

The idea is to take the JSON from DataCite and plug it into the header of dataset html, like the metatags with the Dublin Core metadata.
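
A minimal sketch of that embedding step, assuming the JSON has already been obtained as a Python dict. The function names are hypothetical and this is not Dataverse's implementation (Dataverse renders pages server-side in Java); it only illustrates placing a JSON-LD script element inside the head:

```python
import json
import re


def json_ld_script_tag(metadata: dict) -> str:
    """Serialize metadata as a <script type="application/ld+json"> element."""
    payload = json.dumps(metadata, ensure_ascii=False)
    # Keep a "</script>" inside a string value from closing the element early.
    # "<\/" is a valid JSON escape for "</", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def embed_in_head(html: str, metadata: dict) -> str:
    """Insert the script tag just before the first </head>.

    Returns the page unchanged if it has no closing head tag.
    """
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        return html
    return html[:match.start()] + json_ld_script_tag(metadata) + html[match.start():]
```

The escaping step matters because metadata such as titles and descriptions is user-supplied text.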

@pdurbin
Member

pdurbin commented Oct 31, 2017

> The idea is to take the JSON from DataCite and plug it into the header of dataset html

Ok, so I shouldn't make the "schema.org/JSON-LD" metadata available via the Export button like this, which seems to be what #3700 is about:

[screenshot, 2017-10-31 at 10:55:52 am]

What tools or services will consume the JSON in the header of the dataset HTML?

Also, how is this issue different than #2243?

@pdurbin
Member

pdurbin commented Oct 31, 2017

> What tools or services will consume the JSON in the header of the dataset HTML?

I think I found a good example:

https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fsearch.datacite.org%2Fworks%2F10.7910%2Fdvn%2Ficfngt

[screenshot, 2017-10-31 at 11:11:00 am]

In short, I think I could try to use this as a model: https://search.datacite.org/works/10.7910/dvn/icfngt

@csarven

csarven commented Oct 31, 2017

http://rdf.greggkellogg.net/distiller can probably check for JSON-LD in HTML. See also http://osds.openlinksw.com/

@borsna

borsna commented Oct 31, 2017

> What tools or services will consume the JSON in the header of the dataset HTML?

@pdurbin Crawlers like Google, Bing, Yahoo, etc. will look for it in the header. Not sure if having it as an export target is necessary.

It's no different whether you have it inline (as suggested in #2243) or declared as JSON-LD in the header.

The dataset JSON-LD example looks like a good example 👍
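
To make the consumer side concrete, here is a rough sketch of how a tool might pull JSON-LD out of a page. It is an assumption-laden illustration using only the Python standard library, not a description of how any particular search engine crawler works, and the class and function names are made up for this example:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD document embedded in the page, in order."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

A check like this is also a cheap way to confirm that a dataset page actually carries the metadata before pointing a validation tool at it.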

@pdurbin
Member

pdurbin commented Oct 31, 2017

@jggautier I have metadata questions for you. Please see 1b62596 and the screenshot below:

[screenshot, 2017-10-31 at 2:57:57 pm]

@jggautier
Contributor

jggautier commented Oct 31, 2017

Thanks for stopping by, Phil! From our brief discussion, it sounds like you're not using the DataCite service, which @scolapasta suggested only as an intermediate step, but are instead writing code so that the application can transform the metadata itself, which is what we wanted in the end. Should we move this discussion to #2243 and close this issue?

pdurbin added a commit that referenced this issue Nov 1, 2017
- only show published versions
- show URL to DOI dynamically (was hard coded)
- show publication date
- show correct publisher
- show correct provider
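
The fixes listed in this commit can be pictured with a small Python sketch. Everything here is hypothetical: the record type, its field names, and the choice of schema.org properties are assumptions made for illustration, the thread does not say which values count as the "correct" publisher and provider, and the real code lives in Dataverse's Java classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical stand-in for the fields a dataset version would supply."""
    title: str
    doi: str
    publication_date: Optional[str]  # e.g. "2017-11-01"
    publisher: str
    provider: str
    released: bool


def to_json_ld(record: DatasetRecord) -> Optional[dict]:
    """Build a schema.org Dataset document, or None for unpublished versions."""
    if not record.released:
        return None  # only show published versions
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        # Build the DOI URL from the record instead of hard coding it.
        "@id": "https://doi.org/" + record.doi,
        "name": record.title,
        "publisher": {"@type": "Organization", "name": record.publisher},
        "provider": {"@type": "Organization", "name": record.provider},
    }
    if record.publication_date:
        doc["datePublished"] = record.publication_date
    return doc
```

Returning None for drafts lets the page template skip the script element entirely instead of emitting partial metadata.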
@pdurbin
Member

pdurbin commented Nov 1, 2017

At standup we decided we're doing #2243 rather than this issue. Closing.

@pdurbin pdurbin closed this as completed Nov 1, 2017
xibriz added a commit to uit-no-old/dataverse that referenced this issue Dec 4, 2017
commit e19a346
Author: Ruben Andreassen <rubean85@gmail.com>
Date:   Mon Dec 4 12:20:54 2017 +0100

    Forgot username

commit 0d478a7
Merge: 45288aa 8aa4150
Author: Ruben Andreassen <rubean85@gmail.com>
Date:   Mon Dec 4 10:56:10 2017 +0100

    Merge dataporten into 4334-oauth-dataporten

commit 45288aa
Merge: caf6371 4648b6a
Author: Ruben <rubean85@gmail.com>
Date:   Fri Dec 1 14:45:44 2017 +0100

    Merge pull request #1 from IQSS/develop

    test

commit 4648b6a
Merge: 0f36aa0 fff836c
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Thu Nov 30 18:44:35 2017 -0500

    Merge pull request IQSS#4331 from IQSS/4330-no-affiliation

    add null check for datasetAuthor.getAffiliation() IQSS#4330

commit fff836c
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 30 16:39:26 2017 -0500

    add null check for datasetAuthor.getAffiliation() IQSS#4330

commit 0f36aa0
Merge: e2878ce fad8669
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Thu Nov 30 15:07:54 2017 -0500

    Merge pull request IQSS#4325 from IQSS/4324-header-padding

    Fixed padding layout issue with dataverse name text link in header IQSS#4324

commit fad8669
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Thu Nov 30 10:14:53 2017 -0500

    Fixed padding layout issue with dataverse name text link in header. [ref IQSS#4324]

commit e2878ce
Merge: d785c5c cb9647f
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Wed Nov 29 18:22:53 2017 -0500

    Merge pull request IQSS#4305 from IQSS/4304-navbar-search

    use "?" (`&#63;`) rather than "&" (`&#38;`) before "q" IQSS#4304

commit d785c5c
Merge: a881f36 3cc02d0
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Wed Nov 29 18:19:25 2017 -0500

    Merge pull request IQSS#4302 from IQSS/3700-export-schema.org

    implement export of schema.org JSON-LD IQSS#3700

commit 3cc02d0
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 29 12:53:04 2017 -0500

    have dataset page get cached JSON-LD, if available IQSS#3700

commit 84224bd
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 29 12:45:53 2017 -0500

    guard against null terms.getTermsOfUse() IQSS#3700

commit ba9c6bd
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 29 12:28:16 2017 -0500

    API: document "schema.org" as a supported export format IQSS#3700

commit e5c2528
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 29 12:11:17 2017 -0500

    capitalize Schema.org in guides IQSS#3700

commit 086824d
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 29 10:57:32 2017 -0500

    note that we know "affliation" throws a warning IQSS#3700

commit a881f36
Merge: b20ab14 23b865c
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Tue Nov 28 16:28:04 2017 -0500

    Merge pull request IQSS#4312 from IQSS/4197-bundle-error

    Fixed bundle reference to "parent" dataverse for Theme + Widget pg IQSS#4197

commit 34859e7
Merge: 2f278cc b20ab14
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 28 16:24:56 2017 -0500

    Merge branch 'develop' into 3700-export-schema.org IQSS#3700

commit 23b865c
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Tue Nov 28 14:42:12 2017 -0500

    Fixed bundle reference to "parent" dataverse for Theme + Widget pg. [ref IQSS#4197]

commit b20ab14
Merge: caf6371 8e6354a
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Tue Nov 28 14:01:39 2017 -0500

    Merge pull request IQSS#4277 from IQSS/4197-dv-header

    4197 dv header

commit 8e6354a
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Tue Nov 28 13:23:15 2017 -0500

    Changed references from "customization" to "theme" in Theme + Widgets pg. [ref IQSS#4197]

commit c312a85
Author: Derek Murphy <dlmurphy@g.harvard.edu>
Date:   Tue Nov 28 13:05:39 2017 -0500

    Doc rewrites [IQSS#4197]

    Rewrote some text on the config page for clarity, changed terminology
    usage in dataverse management page to make it more consistent

commit f68b81d
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Tue Nov 28 12:15:40 2017 -0500

    Removed commented out theme logic found in QA. [ref IQSS#4197]

commit 624922f
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 28 11:09:26 2017 -0500

    when adding row to dataversetheme, use white instead of gray IQSS#4197

commit cb9647f
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 27 10:27:30 2017 -0500

    use "?" (&#63;) rather than "&" (&#38;) before "q" IQSS#4304

commit d8028f1
Merge: 36d9228 caf6371
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 27 09:33:03 2017 -0500

    Merge branch 'develop' into 4197-dv-header IQSS#4197

commit 2f278cc
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 22 12:33:56 2017 -0500

    cleanup IQSS#3700

commit b00d4d6
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 22 12:28:25 2017 -0500

    capitalize "Schema.org" IQSS#3700

commit 8f52663
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 22 11:06:41 2017 -0500

    implement export of schema.org JSON-LD IQSS#3700

commit caf6371
Merge: c67a39f d80b9d1
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Tue Nov 21 16:29:07 2017 -0500

    Merge pull request IQSS#4297 from IQSS/orcid_v21

    orcid v2.1 changes (mainly https for profile page link)

commit c67a39f
Merge: 0918fae a756751
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Mon Nov 20 15:48:37 2017 -0500

    Merge pull request IQSS#4252 from IQSS/2243-schema.org-json-ld

    2243 schema.org json ld

commit d80b9d1
Author: Pete Meyer <pameyer@crystal.harvard.edu>
Date:   Mon Nov 20 14:32:09 2017 -0500

    orcid v2.1 changes (mainly https for profile page link)

commit 0918fae
Merge: 3013c0d dcfcbaf
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Mon Nov 20 14:31:41 2017 -0500

    Merge pull request IQSS#4276 from IQSS/4250-ingest-failed

    make it clear that file upload is complete IQSS#4250

commit 3013c0d
Merge: b4cea62 3f0f7e8
Author: kcondon <kcondon@hmdc.harvard.edu>
Date:   Mon Nov 20 14:21:37 2017 -0500

    Merge pull request IQSS#4275 from IQSS/4262-describe-method

    move `describe` from EjbDataverseEngine to Command interface IQSS#4262

commit 36d9228
Merge: d612189 b4cea62
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 17 16:38:34 2017 -0500

    Merge branch 'develop' into 4197-dv-header IQSS#4197

commit dcfcbaf
Merge: 268c3dc b4cea62
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 17 16:36:21 2017 -0500

    Merge branch 'develop' into 4250-ingest-failed IQSS#4250

commit 3f0f7e8
Merge: 633a19d b4cea62
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 17 16:33:37 2017 -0500

    Merge branch 'develop' into 4262-describe-method IQSS#4262

commit a756751
Merge: eec1163 b4cea62
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 17 16:32:43 2017 -0500

    Merge branch 'develop' into 2243-schema.org-json-ld IQSS#2243

    Conflicts (just imports:
    src/main/java/edu/harvard/iq/dataverse/DatasetPage.java

commit eec1163
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Fri Nov 17 15:58:38 2017 -0500

    Per conversation with jgautier stipped the '@type="person"' attribute in the author fragment;
    since it can be a person or an organization; this results in a warning from google validation tool
    (because "Thing" is not supposed to have an affiliation) but it appears to be ok to live with it.

commit 0801d56
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Fri Nov 17 15:36:04 2017 -0500

    ldjson should will only be embedded into the page if this is the LATEST PUBLISHED version (IQSS#2243)

commit a2742c5
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Fri Nov 17 15:08:40 2017 -0500

    latest changest to ld json formatting, making the fragment pass the google validation tool test. (IQSS#2243)

commit d612189
Author: Derek Murphy <dlmurphy@g.harvard.edu>
Date:   Fri Nov 17 13:01:55 2017 -0500

    Docs: extremely nitpicky word change [IQSS#4197]

    Changed a couple words in the config page.

commit d277669
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Thu Nov 16 16:21:29 2017 -0500

    Added tip to Installation Guide > Configuration > Custom Header related to disable root theme. [ref IQSS#4197]

commit 80219c5
Author: Derek Murphy <dlmurphy@g.harvard.edu>
Date:   Thu Nov 16 11:43:59 2017 -0500

    Syntax + typo fix

    Small edit, fixed a typo and a syntax error in (ironically) a header in
    the docs

commit e0399c1
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Wed Nov 15 19:50:54 2017 -0500

    ...and a quick fix for the "temporalCoverage" entry (IQSS#2243)

commit 67882ff
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Wed Nov 15 19:41:05 2017 -0500

    the ld json fragment should now be structured as specified in the issue IQSS#2243.

commit 8b8391f
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Wed Nov 15 13:24:22 2017 -0500

    added topicClassifications and kewords to JSONLD. (IQSS#2243)

commit 28f705c
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 15 12:58:11 2017 -0500

    implement :DisableRootDataverseTheme db setting IQSS#4197

commit 268c3dc
Author: Michael Heppler <mheppler@hmdc.harvard.edu>
Date:   Wed Nov 15 12:54:50 2017 -0500

    Revised ingest error popover message text. Fixed icon spacing issue. [ref IQSS#4250]

commit 7cd2fea
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 15 12:01:57 2017 -0500

    Revert "stub out UI for disabling root dataverse theme IQSS#4197 "

    This reverts commit b9c3c56.

    We're going to use a database setting instead.

commit b9c3c56
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 15 08:53:36 2017 -0500

    stub out UI for disabling root dataverse theme IQSS#4197

commit 1f938e9
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 15 08:18:25 2017 -0500

    Revert "only show header for non-root dataverses IQSS#4197 "

    This reverts commit 8eccacd.

commit 633a19d
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 14 19:02:10 2017 -0500

    affectedDvObjects is a better name for this field IQSS#4262

commit 9a3f4a3
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 14 17:10:06 2017 -0500

    add the role to the message IQSS#4262

commit 7cfc8ba
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 14 10:09:18 2017 -0500

    override `describe` in AssignRoleCommand IQSS#4262

commit 023cb8f
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 13 16:09:43 2017 -0500

    remove parameters since the Command has them IQSS#4262

commit 8eccacd
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 13 15:52:37 2017 -0500

    only show header for non-root dataverses IQSS#4197

commit 7795e70
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 13 15:22:08 2017 -0500

    change header background from gray to white IQSS#4197

commit e434dd0
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 13 14:28:23 2017 -0500

    make it clear that file upload is complete IQSS#4250

commit 26eb11d
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Mon Nov 13 14:18:57 2017 -0500

    move `describe` from EjbDataverseEngine to Command interface IQSS#4262

commit 7d03e70
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Nov 7 16:21:37 2017 -0500

    consistency between DC.subject and JSON-LD keywords IQSS#2243

commit 9f1d057
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Mon Nov 6 21:58:32 2017 -0500

    one more addition for IQSS#2243 - added temporalCoverage.

commit 8c74e37
Author: Leonid Andreev <leonid@hmdc.harvard.edu>
Date:   Mon Nov 6 21:28:06 2017 -0500

    A few quick fixes for getJsonLd() (and the corresponding test in DatasetVersionTest());
    (ref IQSS#2243)

commit c941781
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 3 12:21:12 2017 -0400

    explain why ui:insert lines are in the template IQSS#2243

commit 1aa323a
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 3 12:20:52 2017 -0400

    remove unused imports used in this branch IQSS#2243

commit f8ca59f
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 3 12:13:05 2017 -0400

    add tests for getJsonLd and getPublicationDateAsString IQSS#2243

commit b1db8ee
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 3 11:26:37 2017 -0400

    rename to publicationDateAsString and improve javadoc IQSS#2243

commit 8f3083c
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Fri Nov 3 11:14:13 2017 -0400

    delete cruft (unused method) IQSS#2243

commit 6c5f044
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 2 15:41:12 2017 -0400

    use dateModified and proper schemaVersion URL IQSS#2243

commit 171c8f3
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 2 15:29:35 2017 -0400

    move getJsonLd method to DatasetVersion entity IQSS#2243

commit 485a5ca
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 2 15:25:37 2017 -0400

    don't even try to figure out if the author is a person or not IQSS#2243

commit 80b5a88
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 2 15:19:49 2017 -0400

    limit to non-published, not just non-drafts IQSS#2243

    Also add helper method.

commit ad71c6a
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Thu Nov 2 15:17:32 2017 -0400

    use same date format as meta name="DC.date" IQSS#2243

commit 2cc958d
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 1 13:30:15 2017 -0400

    fix a number of issues (listed below) IQSS#3793 IQSS#2243

    - only show published versions
    - show URL to DOI dynamically (was hard coded)
    - show publication date
    - show correct publisher
    - show correct provider

commit 5ad88fc
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Wed Nov 1 13:15:00 2017 -0400

    better author name parsing (could be an org!) IQSS#3793 IQSS#2243

commit 1b62596
Author: Philip Durbin <philip_durbin@harvard.edu>
Date:   Tue Oct 31 14:57:01 2017 -0400

    stub out dataset in json-ld format IQSS#3793
@pdurbin pdurbin removed their assignment Sep 6, 2018
6 participants