As an installation admin, I want my repository to export OpenAIRE-compliant metadata to improve discoverability, reusability of research data #4257

Closed
jggautier opened this issue Nov 3, 2017 · 80 comments

Comments

@jggautier
Contributor

jggautier commented Nov 3, 2017

@philippconzett (Dataverse Network Norway) wrote in https://groups.google.com/forum/#!msg/dataverse-community/lgSTeI-0zkQ/R7W8CfzvAAAJ:

The EU-sponsored research infrastructure project openAIRE aims to promote open scholarship and substantially improve the discoverability and reusability of research publications and data. Their guidelines have by now gained status as de-facto standards for OA research publication and data providers. In their Guidelines for Data Archives, they state i.a. what kind of metadata information research data archives should provide.
...
For us, and I guess for other Dataverse installations/users in Europe, compliance with the openAIRE guidelines is important. So, I wonder whether information about access and license(s) could be complemented in a new version?

@juancorr shared in another issue about adding DataCite metadata to the Export Metadata pulldown that Dataverse e-cienciaDatos

has expanded its DataCite metadata to be compliant with the European OpenAIRE guidelines (https://guidelines.openaire.eu/en/latest/data/index.html)...

If you want to develop this feature, we can collaborate.

The definition of done for this issue will be a Dataverse admin being able to have OpenAIRE harvest OpenAIRE-compliant metadata from her installation.

@juancorr

juancorr commented Nov 3, 2017

We have used some ugly tricks to achieve OpenAIRE compatibility because Dataverse does not have all the metadata that OpenAIRE needs. You can see them in the file https://github.com/Consorcio-Madrono/dataverse/blob/v4.6WithOpenAIRE/src/main/resources/templates/datacite_40.ftl .

@pdurbin
Member

pdurbin commented Nov 3, 2017

datacite_40.ftl

This .ftl file must be a FreeMarker file. I see the dependency has been added to the pom.xml at https://github.com/Consorcio-Madrono/dataverse/blob/025df77e0a25a8ad9221fec61925af88ed09053a/pom.xml#L57 . Perhaps this would be better discussed at https://groups.google.com/forum/#!forum/dataverse-dev (please feel free to start a thread there if you like, @juancorr ) but I'm curious about why you've introduced FreeMarker into your branch and if there is any alternative that's already part of the Java EE standard. I'm not trying to criticize. I'm just curious. I've never used FreeMarker.

@juancorr

juancorr commented Nov 3, 2017

We used the sbgrid code as a base (https://github.com/sbgrid/sbgrid-dataverse/tree/feature/datacite-xml). We only patched that code and the Dataverse code, and adapted the initial sbgrid FreeMarker file to produce valid DataCite XML and comply with the OpenAIRE guidelines.
It is the first time I have used a FreeMarker file too, but it is easy to adapt to other institutions' requirements and it keeps special cases out of the Java code. This works very well with e-cienciaDatos, but we have only 12 datasets. We have not tested it in a large Dataverse installation.
Sorry, I do not have enough experience with these files to discuss them further.

@pdurbin
Member

pdurbin commented Nov 3, 2017

@juancorr oh! So you weren't the one to add the FreeMarker dependency. It's from the SBGrid branch. Thanks. I understand now.

@juancorr

juancorr commented Nov 3, 2017

Yes, I said so in my first comment in #3697, but I should have emphasized it.

@abollini
Contributor

Dear all,
I’m glad to announce that our proposal to enhance the interoperability of several open source platforms has been awarded by OpenAIRE, see https://www.4science.it/en/2018/02/23/4science-awarded-by-openaire/
In our proposal, we have included the implementation of the Data Repository Guidelines in Dataverse, more specifically support for the DataCite schema 4.1, to be ready for the new version of the guidelines that is expected soon.
We have just found this thread. I’m really happy to see our assumptions about the benefit of this development confirmed by the community, and I will be happy to help develop a general solution that works for all and hopefully can be included by default in a future Dataverse version.

@pdurbin
Member

pdurbin commented Feb 27, 2018

@abollini that's great news! Can you please also start a new thread about this at https://groups.google.com/forum/#!forum/dataverse-community to spread the word? Thanks!

@pdurbin
Member

pdurbin commented Mar 2, 2018

@abollini thanks for posting https://groups.google.com/d/msg/dataverse-community/OALTzINxkX0/v_WwJ4cvAwAJ ! Also, I mentioned your proposal in the Dataverse Community News yesterday: https://groups.google.com/d/msg/dataverse-community/AlZHT6tQM3U/0RrMUOv1AgAJ

Next it would be great to get a shared understanding of what you think the pull request will look like and what the scope of change will be. To get on the same page literally, it would be nice to have a Google doc or similar for what you have in mind. For now I'm linking to this issue in the "Dev Efforts by the Dataverse Community" spreadsheet at https://docs.google.com/spreadsheets/d/1pl9U0_CtWQ3oz6ZllvSHeyB0EG1M_vZEC_aZ7hREnhE/edit?usp=sharing but please feel free to create new issues as needed if you want to divide the work into smaller chunks. In our experience, smaller chunks move more easily across our kanban board at https://waffle.io/IQSS/dataverse

In short, please let us know if there is anything you need!

@abollini
Contributor

We have created a PR with the result of our development: #4664
We will be happy to receive feedback and improve it as needed.

@pdurbin
Member

pdurbin commented May 14, 2018

@abollini hi! Thanks for the pull request! I just advanced it to Code Review at https://waffle.io/IQSS/dataverse and left you a review.

@juancorr are you interested in giving a review as well?

@juancorr

juancorr commented May 14, 2018 via email

@djbrooke
Contributor

Great! Thanks @abollini and team for the PR, @pdurbin for the feedback, and @juancorr for taking a look!! :) I'll move this to the Inbox column on our Waffle board for now, as it's a large PR and there's already some feedback and community review offers.

@pdurbin
Member

pdurbin commented May 23, 2018

@abollini any news? Are you blocked? Do you need anything? @juancorr and I have been chatting a bit in IRC if you'd like to join us some day. 😄

@pdurbin
Member

pdurbin commented May 29, 2018

In 4b28306 I added "DataCite OpenAIRE" to the list of export formats. @djbrooke and I just spoke about how tests would be nice but they're tricky for external developers to write so I went ahead and moved this issue (and #3697) to QA.
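
For anyone who wants to try the new export format from a script, here is a minimal sketch that builds the native API export URL for one dataset. The exporter name `oai_datacite`, the server address, and the DOI are assumptions for illustration only; check the API Guide of your installation for the exact exporter names it supports.

```python
from urllib.parse import urlencode


def export_url(server, persistent_id, exporter="oai_datacite"):
    """Build a dataset export URL for the Dataverse native API.

    The default exporter name is an assumption; pass the name your
    installation lists for the OpenAIRE-flavored DataCite export.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


if __name__ == "__main__":
    # Hypothetical server and DOI, used only to show the URL shape.
    print(export_url("https://dataverse.example.edu", "doi:10.5072/FK2/EXAMPLE"))
```

The resulting URL can be fetched with any HTTP client. The persistent identifier is percent-encoded so that the colon and slashes in the DOI survive as a single query parameter.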

@kcondon
Contributor

kcondon commented May 30, 2018

I haven't begun testing yet but during a test deployment, found that OpenAire was not appearing in export list and this error is in server log: Could not find key "dataset.exportBtn.itemLabel.dataciteOpenAIRE" in bundle file.
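
An error like this could be caught before deployment with a small check that every required label key exists in the bundle file. The sketch below is only an illustration: the parser handles plain `key=value` lines rather than the full Java properties syntax, and the sample bundle content and the list of required keys are placeholders, apart from the missing key quoted in the log message above.

```python
def parse_properties(text):
    """Parse simple `key=value` lines, skipping blank lines and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def missing_keys(bundle_text, required):
    """Return the required keys that the bundle does not define."""
    entries = parse_properties(bundle_text)
    return [key for key in required if key not in entries]


# Placeholder bundle content; real label values will differ.
SAMPLE_BUNDLE = """\
# Export menu labels
dataset.exportBtn.itemLabel.ddi=DDI
dataset.exportBtn.itemLabel.dublinCore=Dublin Core
"""

REQUIRED = [
    "dataset.exportBtn.itemLabel.ddi",
    "dataset.exportBtn.itemLabel.dataciteOpenAIRE",
]

if __name__ == "__main__":
    print(missing_keys(SAMPLE_BUNDLE, REQUIRED))
```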

@kcondon kcondon self-assigned this May 30, 2018
pdurbin added a commit to 4Science/dataverse that referenced this issue May 30, 2018
Also add as a format in the API Guide.
@kcondon
Contributor

kcondon commented May 3, 2019

@fcadili Now that the personal nameType issues seem to be resolved I focused on testing organizational type names. I've found some issues here (see below) but also wondered what the logic for detecting organizational types was.

  1. Simple one or two word names are not identified as either type, except when contact:
<creators>
<creator>
<creatorName>IBM</creatorName>
</creator>
<creator>
<creatorName>Harvard University</creatorName>
</creator>
</creators>
<contributors>
<contributor contributorType="ContactPerson">
<contributorName nameType="Organizational">Harvard University</contributorName>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Organizational">IBM</contributorName>
</contributor>
<contributor>
<contributorName>IBM</contributorName>
</contributor>
<contributor>
<contributorName>Harvard University</contributorName>
</contributor>
</contributors>
  2. More complex, multi word names always are identified as personal and the extraction of first, last names is unusual in some cases:
<creator>
<creatorName nameType="Personal">The Institute for Quantitative Social Science</creatorName>
<givenName>The</givenName>
<familyName>Institute for Quantitative Social Science</familyName>
</creator>
<creator>
<creatorName nameType="Personal">Council on Aging</creatorName>
<givenName>on</givenName>
<familyName>ncil on Aging</familyName>
</creator>
<creator>
<creatorName nameType="Personal">The Ford Foundation</creatorName>
<givenName>The Ford</givenName>
<familyName>Foundation</familyName>
</creator>
<creator>
<creatorName nameType="Personal">
United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)
</creatorName>
<givenName>Asia the</givenName>
<familyName>
tions Economic and Social Commission for Asia and the Pacific (UNESCAP)
</familyName>
</creator>
<creator>
<creatorName nameType="Personal">Michael J. Fox Foundation for Parkinson's Research</creatorName>
<givenName>Michael Fox</givenName>
<familyName>ox Foundation for Parkinson's Research</familyName>
</creator>

<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">Council on Aging</contributorName>
<givenName>on</givenName>
<familyName>ncil on Aging</familyName>
<affiliation>Dataverse.org</affiliation>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">The Institute for Quantitative Social Science</contributorName>
<givenName>The</givenName>
<familyName>Institute for Quantitative Social Science</familyName>
<affiliation>dataverse@mailinator.com</affiliation>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">The Ford Foundation</contributorName>
<givenName>The Ford</givenName>
<familyName>Foundation</familyName>
<affiliation>dataverse@mailinator.com</affiliation>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">
United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)
</contributorName>
<givenName>Asia the</givenName>
<familyName>
tions Economic and Social Commission for Asia and the Pacific (UNESCAP)
</familyName>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">Michael J. Fox Foundation for Parkinson's Research</contributorName>
<givenName>Michael Fox</givenName>
<familyName>ox Foundation for Parkinson's Research</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">
United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)
</contributorName>
<givenName>Asia the</givenName>
<familyName>
tions Economic and Social Commission for Asia and the Pacific (UNESCAP)
</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">The Institute for Quantitative Social Science</contributorName>
<givenName>The</givenName>
<familyName>Institute for Quantitative Social Science</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">Council on Aging</contributorName>
<givenName>on</givenName>
<familyName>ncil on Aging</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">
United Nations Economic and Social Commission for Asia and the Pacific (UNESCAP)
</contributorName>
<givenName>Asia the</givenName>
<familyName>
tions Economic and Social Commission for Asia and the Pacific (UNESCAP)
</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">The Ford Foundation</contributorName>
<givenName>The Ford</givenName>
<familyName>Foundation</familyName>
</contributor>
<contributor>
<contributorName nameType="Personal">Michael J. Fox Foundation for Parkinson's Research</contributorName>
<givenName>Michael Fox</givenName>
<familyName>ox Foundation for Parkinson's Research</familyName>
</contributor>
</contributors>
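
To make the question about detection logic concrete, here is one possible heuristic, sketched in Python. It is not the logic used in the pull request, which this thread does not describe. The keyword list, the function name `guess_name_type`, and the "Family, Given" rule are all assumptions chosen so that the examples above come out as Organizational, and a real implementation would need a broader test set.

```python
import re

# Hypothetical keyword list; any real list would need curation.
ORG_KEYWORDS = {
    "university", "institute", "foundation", "council",
    "commission", "department", "library", "archives",
}


def guess_name_type(name):
    """Guess a DataCite nameType, or return None when unsure."""
    tokens = re.findall(r"[A-Za-z']+", name.lower())
    # Any organizational keyword wins, so "Michael J. Fox Foundation"
    # is not mistaken for a person because it starts with a given name.
    if any(token in ORG_KEYWORDS for token in tokens):
        return "Organizational"
    # A single comma separating two non-empty parts looks like
    # "Family, Given", the usual form for personal names.
    parts = [part.strip() for part in name.split(",")]
    if len(parts) == 2 and all(parts):
        return "Personal"
    # Otherwise stay silent rather than emit a wrong nameType.
    return None


if __name__ == "__main__":
    for example in [
        "Harvard University",
        "Council on Aging",
        "Michael J. Fox Foundation for Parkinson's Research",
        "IBM",
        "Fox, Michael J.",  # hypothetical personal name
    ]:
        print(example, "->", guess_name_type(example))
```

Returning None for names like IBM means the nameType attribute is omitted rather than guessed wrong. That trades completeness for correctness, and whether that is the right trade-off is a design choice.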

@kcondon kcondon removed their assignment May 3, 2019
@fcadili

fcadili commented May 6, 2019

I released a new version which fixed Organizational nameType. I've checked all @kcondon examples and they work fine with the new version.

@djbrooke
Contributor

djbrooke commented May 6, 2019

Thanks @fcadili - we'll review this and QA it.

@kcondon
Contributor

kcondon commented May 6, 2019

@fcadili Thanks for fixing these. I've retested the org cases and they all work. We did test a few more actual organizations depositing on Dataverse and found a few that did not work. Is this because of the characters, ',' and '-'? Should we look for more examples?

Here are the failing cases:

  1. Some org creators with nameType as personal, some with no nameType:
<creator>
<creatorName nameType="Personal">
Digital Archive of Massachusetts Anti-Slavery and Anti-Segregation Petitions, Massachusetts Archives, Boston MA
</creatorName>
</creator>
<creator>
<creatorName nameType="Personal">
U.S. Department of Commerce, Bureau of the Census, Geography Division
</creatorName>
</creator>
<creator>
<creatorName>Harvard Map Collection, Harvard College Library</creatorName>
<affiliation>Harvard University</affiliation>
</creator>
<creator>
<creatorName>Geographic Data Technology, Inc. (GDT)</creatorName>
</creator>
  2. Some contact fields with nameType as personal:
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">
Digital Archive of Massachusetts Anti-Slavery and Anti-Segregation Petitions, Massachusetts Archives, Boston MA
</contributorName>
</contributor>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">
U.S. Department of Commerce, Bureau of the Census, Geography Division
</contributorName>
</contributor>
  3. Some contributors with nameType as personal, some with no nameType:
<contributor contributorType="DataCollector">
<contributorName nameType="Personal">
Digital Archive of Massachusetts Anti-Slavery and Anti-Segregation Petitions, Massachusetts Archives, Boston MA
</contributorName>
</contributor>
<contributor contributorType="DataCollector">
<contributorName nameType="Personal">
U.S. Department of Commerce, Bureau of the Census, Geography Division
</contributorName>
</contributor>
<contributor contributorType="DataCollector">
<contributorName>Harvard Map Collection, Harvard College Library</contributorName>
</contributor>
</contributors>
<contributor>
<contributorName>Geographic Data Technology, Inc. (GDT)</contributorName>
</contributor>

I think if these are fixed then we will be in good shape. Sorry I did not see these sooner. We'll discuss briefly with @djbrooke and @jggautier tomorrow morning to confirm.
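
Checks like the ones above could be partly automated when looking for more examples. The following sketch parses an exported record and lists every creator or contributor name whose nameType is not Organizational. It uses only the Python standard library; the function names and the `<resource>` wrapper around the sample are assumptions, and the sample names are taken from the failing cases in this comment.

```python
import xml.etree.ElementTree as ET


def name_types(xml_text):
    """Return (name, nameType) pairs for creator and contributor names."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        # Drop an XML namespace prefix such as "{...}creatorName" if present.
        local_tag = element.tag.rsplit("}", 1)[-1]
        if local_tag in ("creatorName", "contributorName"):
            # Collapse the line breaks and indentation around the name.
            name = " ".join((element.text or "").split())
            results.append((name, element.get("nameType")))
    return results


def not_organizational(xml_text):
    """List names that are typed Personal or have no nameType at all."""
    return [name for name, kind in name_types(xml_text) if kind != "Organizational"]


SAMPLE = """
<resource>
<creators>
<creator>
<creatorName nameType="Personal">
U.S. Department of Commerce, Bureau of the Census, Geography Division
</creatorName>
</creator>
<creator>
<creatorName>Harvard Map Collection, Harvard College Library</creatorName>
<affiliation>Harvard University</affiliation>
</creator>
</creators>
</resource>
"""

if __name__ == "__main__":
    for name in not_organizational(SAMPLE):
        print(name)
```

Run against exports of datasets known to be deposited by organizations, any name this prints is a candidate failing case.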

@fcadili

fcadili commented May 7, 2019

I released a new version which fixes Organizational nameType when the name contains commas or dashes. I've checked the latest @kcondon examples and they work fine with the new version.

@kcondon
Contributor

kcondon commented May 7, 2019

@fcadili @jggautier @djbrooke
I've retested the latest test cases and all are working now. I think this shows both org and personal nameType are functioning reasonably well. I'm ok to merge unless we want a more comprehensive test such as looking at all orgs in prod. I think there was a plan to iterate once we gain more experience in prod?

@djbrooke
Contributor

djbrooke commented May 7, 2019

Thanks @kcondon, I'll briefly check in with @jggautier and give the go ahead here in a few.

Thanks @fcadili for the great responsiveness and quick fixes on this!

@fcadili

fcadili commented May 7, 2019

Thank you.

@djbrooke
Contributor

djbrooke commented May 7, 2019

@kcondon merge away!
