Discussion: Storing label metadata for DataCite compatibility #17

Open
bondjimbond opened this issue Aug 11, 2022 · 3 comments

Comments

@bondjimbond

bondjimbond commented Aug 11, 2022

Local Contexts and DataCite Canada have been working together on a strategy to include TK Label metadata in DOI metadata. If TK Labels in Islandora 2 are to support DOIs (and I believe they should), this will require a lot more functionality to be built -- perhaps a submodule or a separate but related module, with a name like tk_labels_datacite_integration.

Background

As described in #7, the labels in Local Contexts Projects are subject to change, which is why the best approach to displaying labels is to fetch them dynamically from the API. They could be updated at any time, so by fetching them from the API on viewing, the labels displayed are always current. Making labels available to DataCite adds a new wrinkle to this.

DataCite Canada and Local Contexts are working on including TK Label metadata in DOI records. The strategy they agreed on requires that repositories using TK labels and minting DOIs include the individual label info in the metadata records they send to DataCite (not just the Project ID, although that should be included as well).

Implications

This module currently only displays labels when an object is viewed. DataCite's approach requires more: we would have to store the label metadata so it can be transferred to DataCite when a DOI is minted, and update that metadata whenever it changes so that the DOI info can also be updated.

Two main problems need to be solved to meet that requirement: (1) the stored metadata may be out of date (and certainly would be out of date at the time of creation, as communities may take time to create and customize their labels in the Local Contexts Hub), and (2) updates made in the Local Contexts Hub need to be discovered in an automated way, so that this metadata can (also automatically) be updated and sent to DataCite.

I see two options for managing this. I think Option 2 is the better approach.

Option 1: Save Label metadata on Repository Item

With this approach, TK Label metadata needs to be stored on the Drupal node alongside all the other fields, not just the Project ID. This could be accomplished with a function that queries the API when the metadata form is saved -- send the Project ID to the Local Contexts API, and write the retrieved Label metadata to the appropriate Drupal fields. When the metadata form is modified and saved again, it could (should) overwrite the existing Label metadata with new information.

If the object has a DOI, a DOI helper module (separate project) would then send the updated metadata to DataCite as normal.
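As a rough illustration of the save-time refresh described above, here is a minimal Python sketch (the real module would be Drupal/PHP). Everything named here is hypothetical: the node is modelled as a plain dict, `fetch_project` stands in for whatever call retrieves a project from the Local Contexts API, and the `tk_labels`, `name`, `label_text` and `language` keys are assumptions about the response shape that would need checking against the API.

```python
from typing import Callable


def extract_label_fields(project: dict) -> list[dict]:
    """Flatten the labels in a project payload into rows for node fields.

    The key names read here are assumptions about the API response.
    """
    rows = []
    for label in project.get("tk_labels", []):
        rows.append({
            "name": label.get("name", ""),
            "text": label.get("label_text", ""),
            "language": label.get("language", ""),
        })
    return rows


def refresh_labels_on_save(node: dict, fetch_project: Callable[[str], dict]) -> dict:
    """Return a copy of the node with its label metadata overwritten.

    Nodes without a Project ID are returned unchanged.
    """
    project_id = node.get("project_id")
    if not project_id:
        return node
    project = fetch_project(project_id)
    updated = dict(node)
    # Overwrite rather than merge, so removed labels do not linger.
    updated["tk_label_metadata"] = extract_label_fields(project)
    return updated
```

Passing the fetcher in as a parameter keeps the overwrite logic separate from the HTTP call, so the same function could also be driven by a cron job instead of a form save.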

Problems:

  1. This would require some deep integration with the Repository Item content type, in terms of field types and IDs. If the module created new fields, that could cause problems for maintaining systems in automated ways. There would have to be a lot of consultation with the community about how best to do this.

  2. We have no way of knowing when a label gets updated in Local Contexts unless we look for it. That would require some kind of regular scan on a cron job to check for label updates and rewrite the node metadata, which would create a fair bit of overhead in the system and possibly require some set-up that a repository administrator may not know how to do.

    • It may be possible to leverage the current functionality in this module and, when an object is viewed, compare its stored metadata to the metadata that is retrieved from the API -- if the metadata is different, that would trigger an update. This may or may not be feasible, but if it were, updates would then still rely on people viewing the object.
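The compare-on-view idea in the last point could look something like the sketch below, whether it runs on view or from cron. It is only an illustration in Python: the label dicts and both function names are hypothetical, and it assumes each label can be serialized to JSON. It hashes a canonical form of the label list so that a change in ordering alone does not count as an update.

```python
import hashlib
import json


def label_fingerprint(labels: list[dict]) -> str:
    """Return a stable SHA-256 hash of a list of label dicts.

    Labels are sorted by their canonical JSON form, so the order in which
    the API happens to return them does not affect the result.
    """
    canonical_labels = sorted(json.dumps(label, sort_keys=True) for label in labels)
    joined = json.dumps(canonical_labels)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def labels_changed(stored: list[dict], fetched: list[dict]) -> bool:
    """True when the fetched labels differ in content from the stored ones."""
    return label_fingerprint(stored) != label_fingerprint(fetched)
```

Storing only the fingerprint next to the Project ID would be enough to decide whether an update needs to be pushed, without keeping a full copy of the label text for comparison.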

Option 2: Leave tk_labels alone, and add TK Labels functionality to a DOI module

Instead of saving the labels as metadata on the Repository Item node, leave this module as a badge-display-only module with only the Project ID field to worry about, and create a new helper module for a DOI minting module (e.g. https://github.com/mjordan/doi_datacite by @mjordan or the datacite submodules in https://github.com/roblib/islandora_rdm by @alxp and Alan Stanley). This new module would hook into the functions that send metadata to DataCite: it would make a request to the Local Contexts API and inject the retrieved label metadata into what gets sent to DataCite.
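To make the injection step concrete, here is a minimal Python sketch, not a description of how either DOI module actually works. It assumes the outgoing record is a DataCite REST-style JSON payload with a `rightsList` under `data.attributes`, and that each label is a dict with hypothetical `name` and `text` keys. How label title and text should map onto DataCite's rights properties is exactly what the recommendations would need to specify; joining them in `rights` is purely illustrative.

```python
import copy
from typing import Optional


def inject_labels(payload: dict, labels: list[dict],
                  rights_uri: Optional[str] = None) -> dict:
    """Return a copy of the payload with one rightsList entry per label.

    The original payload is left untouched so the caller can still fall
    back to it if the label lookup fails.
    """
    result = copy.deepcopy(payload)
    attributes = result.setdefault("data", {}).setdefault("attributes", {})
    rights_list = attributes.setdefault("rightsList", [])
    for label in labels:
        # Illustrative mapping only: label title and text joined in "rights".
        entry = {"rights": f"{label['name']}: {label['text']}"}
        if rights_uri:
            entry["rightsUri"] = rights_uri
        rights_list.append(entry)
    return result
```

Because nothing is written back to the node, the labels in the DOI record are only as current as the last time the record was sent, which is the open problem noted below.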

Problem: We still have no way of tracking when labels in a project get updated; it relies entirely on when metadata for the object is updated more generally. So the question of how to find label updates remains open.

Recommendation

My opinion is that Option 2 would suit all parties best: it requires no special metadata within Islandora, and it lets DataCite have the metadata it needs. Tracking updates to labels remains a problem, but that should be tracked as an issue on the new helper module.

Other considerations

From a repository and metadata management perspective, the ideal would be an integration between DataCite Canada and the Local Contexts Hub via the API, so that DataCite could pull the metadata for the Project directly based on the Project ID. That would eliminate the complexities of keeping the labels up to date: Project IDs are static, but labels change. If the integration were on the DataCite side, the DOI record would be assured of having the most current version of the labels. My understanding is that this is not possible, so the recommendation above is my best idea for the moment.

I welcome other ideas!

@bondjimbond
Author

Potential approaches:

  • Send just the rightsURI and let the user follow a link
  • The option recommended above, with a "last updated" date indicated somewhere

@bondjimbond
Author

Note: Local Contexts says that communities typically add and modify labels a lot during the first 4-6 months, after which activity drops to very little or nothing. But labels could still be changed at any time.

@bondjimbond
Author

Conclusion from meeting on 2022-09-06:

  • DataCite will revise its recommendations, probably adding the option to include a link to the project via the rightsURI field instead of sending full label information -- that would be https://localcontextshub.org/project/[project ID]/
    • That link will have to either be included in the repository item metadata or generated from the Project ID when sent to DataCite
  • DataCite prefers the full label title and text as metadata when possible, but they can display that link and then send people to the project page on the Local Contexts hub.
    • Having the label info as metadata is useful for them because it would allow someone to query DataCite and identify objects with a particular TK Label type (although they're all supposed to be custom labels, so would that be possible?)
  • The DOI should resolve to the repository where it was minted in the first place, so from the Local Contexts perspective this approach is fine: the labels will be displayed in up-to-date form on the main source of truth (the repository).
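Generating the link from the Project ID at send time, rather than storing it on the repository item, could be as small as the following Python sketch. The URL pattern is the one quoted above; the function names and the `rights` display text are hypothetical, and the entry uses the `rightsUri` key spelling of DataCite's JSON format, which is an assumption about how the record is sent.

```python
from urllib.parse import quote

# URL pattern from the meeting notes above.
HUB_PROJECT_URL = "https://localcontextshub.org/project/{project_id}/"


def project_rights_uri(project_id: str) -> str:
    """Build the Local Contexts Hub project URL for a Project ID."""
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("A Project ID is required to build the rights URI")
    # Percent-encode the ID so a stray character cannot break the URL.
    return HUB_PROJECT_URL.format(project_id=quote(project_id, safe=""))


def link_only_rights(project_id: str, title: str = "Local Contexts Project") -> dict:
    """Return a link-only rights entry; the title text is a placeholder."""
    return {"rights": title, "rightsUri": project_rights_uri(project_id)}
```

Since Project IDs are static, a link built this way never goes stale, which is what makes the link-only option attractive compared with sending full label text.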
