standardize license configuration #9262

Draft
wants to merge 9 commits into
base: develop

philippconzett
Contributor

What this PR does / why we need it:
This PR is part of the work to make license information from Dataverse installations harvestable in a way that complies with DataCite recommendations as described in issue #8512 Standardize standard license configuration.

Which issue(s) this PR closes:
Together with other PRs, this PR closes #8512.

Special notes for your reviewer:
See my comment from January 5, 2023 to issue #8512.

Suggestions on how to test this:
In a test environment:

  1. Install the JSON license files.
  2. Create a dataset.
  3. Select a standard license.
  4. Export DataCite metadata.
  5. Verify that the license information is as recommended by DataCite.
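Step 5 can be checked by eye, but it could also be scripted. Below is a minimal Python sketch, assuming the export from step 4 is DataCite XML in which each license appears as a `rights` element carrying the `rightsURI`, `rightsIdentifier`, `rightsIdentifierScheme` and `schemeURI` attributes. The function name `extract_rights` and the `SAMPLE` document are made up for illustration; they are not part of this PR, and the sample values are not taken from a real export.

```python
import xml.etree.ElementTree as ET


def extract_rights(datacite_xml: str) -> list[dict]:
    """Return one dict per <rights> element, ignoring XML namespaces."""
    root = ET.fromstring(datacite_xml)
    rights = []
    for elem in root.iter():
        # Tags look like "{namespace}rights"; compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "rights":
            entry = dict(elem.attrib)
            entry["text"] = (elem.text or "").strip()
            rights.append(entry)
    return rights


# Illustrative sample only (hypothetical values, not a real export).
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/"
            rightsIdentifier="CC-BY-4.0"
            rightsIdentifierScheme="SPDX"
            schemeURI="https://spdx.org/licenses/">Creative Commons Attribution 4.0 International</rights>
  </rightsList>
</resource>"""

for entry in extract_rights(SAMPLE):
    print(entry)
```

A reviewer could then compare each returned dict against the values in the corresponding JSON license file.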

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.

Is there a release notes update needed for this change?:
Yes. See my comment from January 5, 2023 to issue #8512.

Additional documentation:
The longer we wait to implement this change, the more installations will have to do clean-ups. Therefore, I hope this PR can be prioritized.

Member

@pdurbin pdurbin left a comment

Took a quick look. I like this data-driven approach: create the JSON files first, fix the code later. However, the code needs to be fixed, and the existing licenses need to be moved into new database columns that need to be added as part of this PR. I'm not sure what size to give this. Due to the uncertainty, maybe an 80 (56 hours).

"schemeURI": "https://spdx.org/licenses/",
"rightsShortDescription": "Creative Commons Attribution 4.0 International.",
"rightsIconUrl": "https://licensebuttons.net/l/by/4.0/88x31.png",
"rightsActive": "true",
Member

Not surprisingly, edu.harvard.iq.dataverse.api.Licenses.addLicense is failing because the field names have changed. In this stack trace, for example, the code is looking for uri rather than rightsURI: addlicense.txt

This pull request is great for showing a future direction, but the code needs to be adjusted to add new fields to the database, and we'd need to write SQL upgrade scripts to migrate the old licenses into the new fields.
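Until the code and database are updated, one way to exercise the new JSON files against the existing endpoint would be to translate the field names first. This is a rough Python sketch, not the fix described above. Only the `rightsURI` to `uri` pair is confirmed by the stack trace; the other legacy names in `NEW_TO_LEGACY` are assumptions, and `to_legacy_license` is a hypothetical helper.

```python
# Assumed mapping from the field names proposed in this PR to the names the
# existing API expects. Only rightsURI -> uri is confirmed by the stack trace;
# the remaining pairs are guesses made for illustration.
NEW_TO_LEGACY = {
    "rightsName": "name",
    "rightsURI": "uri",
    "rightsShortDescription": "shortDescription",
    "rightsIconUrl": "iconUrl",
    "rightsActive": "active",
}


def to_legacy_license(new_style: dict) -> dict:
    """Translate a new-style license dict; unmapped fields are dropped."""
    legacy = {}
    for key, value in new_style.items():
        if key not in NEW_TO_LEGACY:
            continue  # e.g. schemeURI has no legacy counterpart in this sketch
        if key == "rightsActive" and isinstance(value, str):
            # The new files store the flag as the string "true"/"false".
            value = value.strip().lower() == "true"
        legacy[NEW_TO_LEGACY[key]] = value
    return legacy


print(to_legacy_license({
    "rightsName": "Apache 2.0",
    "schemeURI": "https://spdx.org/licenses/",
    "rightsActive": "true",
}))
```

Dropping `schemeURI` loses exactly the information this PR is meant to expose, which is why new database columns and SQL upgrade scripts are still needed.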

@@ -0,0 +1,10 @@
{
Member

@philippconzett can you please add a release note for this PR? (I'm just adding a comment here because it's the first line under "files changed".)

Member

@pdurbin pdurbin left a comment

We should update the docs as well.

@@ -0,0 +1,10 @@
{
"rightsName": "Apache 2.0",
Member

In the docs, we should add all these new licenses to the list at https://github.com/philippconzett/dataverse/blob/8512-standardize-license-configuration/doc/sphinx-guides/source/installation/config.rst#adding-licenses so they are easily downloadable to installations that would like to use them.

@pdurbin
Member

pdurbin commented Jan 5, 2023

@philippconzett hi! Just checking if you're open to adding additional JSON files (additional licenses) to this PR. Please see some comments from earlier today starting here: IQSS/dataverse.harvard.edu#193 (comment)

Alternatively, we could add them later, of course, in a new PR.

@mreekie mreekie added the Size: Queued PM has called this issue out specifically for sizing label Jan 23, 2023
Changed licenseURI to be the SPDX URI, as requested in issue IQSS#9302.
@pdurbin pdurbin added Size: 80 A percentage of a sprint. 56 hours. and removed Size: Queued PM has called this issue out specifically for sizing labels Feb 28, 2024
@pdurbin pdurbin changed the title 8512 standardize license configuration standardize license configuration Mar 27, 2024
@pdurbin
Member

pdurbin commented Apr 11, 2024

JP and I just wrote some guidance on adding licenses in the future: #10426 (comment)

Please take a look and let us know what you think!

@jmurugan-fzj

jmurugan-fzj commented May 23, 2024

@pdurbin Where can I find the up-to-date list of licenses that can be used when creating a dataset?
More specifically, which values can be provided in the metadata "name" field below when creating a dataset in Dataverse?
image

I tried "CC0 1.0" and "CC BY 4.0", and both succeed, whereas other versions, e.g. "CC BY 3.0", fail with the following server error: Reason: Bad Request, Info: {'status': 'ERROR', 'message': 'Error parsing Json: Invalid or unsupported license: CC BY 3.0'}

At the least, it would be nice to see the list of values currently supported for the "name" field. Is this documented somewhere? I looked in the help pages but could not quickly find a complete list of the supported license names.

I have also noticed that the value of the "uri" field is not considered at all during dataset creation: even if I give a wrong or invalid URI, the dataset is created with the license specified in the "name" field, and the appropriate URI is populated automatically from the name. Is this a bug?

Metadata reference: dataset-create-new-all-default-fields.json

@pdurbin
Member

pdurbin commented May 23, 2024

@jmurugan-fzj hi! Now that #10426 has been merged, there is new guidance coming in the next release (probably 6.3) on what to put for "name", "uri", etc. Here's a screenshot from a preview at https://dataverse-guide--10426.org.readthedocs.build/en/10426/installation/config.html#contributing-to-the-collection-of-standard-licenses-above

Screenshot 2024-05-23 at 1 33 01 PM

Does that help?

@qqmyers
Member

qqmyers commented May 23, 2024

FWIW: /api/licenses tells you which licenses are installed and active (the ones you can select for a new/draft dataset) for a given Dataverse installation.
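For anyone who wants to script that check, here is a minimal Python sketch. It assumes the endpoint returns a JSON envelope with a `data` list whose entries have `name` and `active` fields; that shape is an assumption and should be confirmed against your installation. `fetch_licenses` and `active_license_names` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_licenses(base_url: str) -> dict:
    """GET <base_url>/api/licenses and return the decoded JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/api/licenses") as response:
        return json.load(response)


def active_license_names(payload: dict) -> list[str]:
    """List the names of licenses flagged active.

    Assumes a {"data": [{"name": ..., "active": ...}, ...]} envelope.
    """
    return [lic["name"] for lic in payload.get("data", []) if lic.get("active")]
```

Usage would look like `active_license_names(fetch_licenses("https://demo.example.org"))`, where the host name is a placeholder. The returned names are the values that can be used for the "name" field when creating a dataset on that installation.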

@jp-tosca
Contributor

jp-tosca commented May 23, 2024

Hi @jmurugan-fzj, regarding your 3 points:

  1. This message is a bit unclear, sorry about that. It is not really a parsing error; it happens because the license is not installed.
  2. As @qqmyers mentioned, there is an API to check the licenses on the installation, so that should eliminate these errors.
  3. The URI parameter, as documented in the code:

// If uri is provided, we'll try that first. This is an easier lookup
// method; the uri is always the same. The name may have been customized
// (translated) on this instance, so we may be dealing with such translated
// name, if this is exported json that we are processing. Meaning, unlike
// the uri, we cannot simply check it against the name in the License
// database table.

So the URI should take priority over the name when looking up the license, but it appears to be optional. Please feel free to open an issue if you see different behavior. Let me know if you have any questions or if we can provide any additional help.
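The lookup order described in that code comment can be sketched as follows. This is an illustrative Python model of the precedence only, not the actual Dataverse implementation; `find_license` is a hypothetical function, and it leaves out the handling of translated names that the comment mentions.

```python
from typing import Optional


def find_license(
    licenses: list[dict],
    uri: Optional[str] = None,
    name: Optional[str] = None,
) -> Optional[dict]:
    """Return the first license matching uri, else the first matching name."""
    if uri:
        for lic in licenses:
            if lic["uri"] == uri:
                return lic
    # Fall back to the name when no uri was given or the uri did not match.
    if name:
        for lic in licenses:
            if lic["name"] == name:
                return lic
    return None
```

Under this model, an unknown URI combined with a valid name still resolves through the name, which would be consistent with the behavior reported above.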

@jmurugan-fzj

jmurugan-fzj commented May 24, 2024

/api/licenses

@qqmyers Thanks for the information. I see that more details are available in the native API page.

I have one more question about license support. We have an older version, 4.20, installed at our institute, and it does not seem to support this endpoint.

At present I use "CC0 1.0" when creating datasets, but the license information is populated as "None" in the created metadata, e.g. in this sample dataset metadata:
image
So I assume license management is not supported at all in version 4.20?

Do you know from which version onwards license terms management is properly supported? We would like the license terms to be properly documented along with our datasets. If you could name a particular version, that would be really helpful, and I can ask our administrators to upgrade to that version or later.

@jmurugan-fzj
@pdurbin Thank you very much, this is exactly what I was looking for and very helpful (y)

@jmurugan-fzj
@jp-tosca Thanks again for the detailed response, and sorry for being unclear in my previous question. I was not aware that licenses need to be added before creating datasets. With the help page links provided above, the use of license terms is now clearer to me :)

Regarding the URI field, I don't see this as a critical issue at the moment. Now that I am aware of the /api/licenses endpoint, I can even restrict the possible values for the "name" and "uri" fields, so all good! Thanks again for the quick response and support (y)

@qqmyers
Member

qqmyers commented May 24, 2024

@jmurugan-fzj - support for multiple licenses was added in v5.10 and there's a v5.10.1 that is probably the minimum you'd want. That said, we strongly recommend keeping current.

@jmurugan-fzj
@qqmyers That's very helpful. Let me discuss it with the team here; I would also like to go for the latest version. Thanks again :)

@pdurbin pdurbin added the Type: Feature a feature request label Oct 9, 2024
Labels
Size: 80 A percentage of a sprint. 56 hours. Type: Feature a feature request
Projects
Status: No status
Status: Interesting/To keep an eye on
Status: 🔍 Interest
Development

Successfully merging this pull request may close these issues.

Feature Request/Idea: Standardize standard license configuration
6 participants