Feature Request/Idea: Harvest metadata values that aren't from a list of controlled values #9992

Closed
DS-INRAE opened this issue Oct 10, 2023 · 6 comments · Fixed by #10323
Labels
Feature: Harvesting pm.GREI-d-2.4.1B NIH AIM:4 YR:2 TASK:1B | 2.4.1B | (started yr1) Resolve OAI-PMH harvesting issues Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) Type: Feature a feature request User Role: Depositor Creates datasets, uploads data, etc.
Comments

@DS-INRAE
Member

DS-INRAE commented Oct 10, 2023

Overview of the Feature Request
When harvesting, controlled-vocabulary fields (e.g. Subject) prove very limiting, as the value expected for such a field has to be in the list defined by the TSV.
It would be best if, for harvested datasets, these constraints were lifted.

What kind of user is the feature intended for?
(Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)

What inspired the request?
Errors when harvesting from other data repositories. Example (with the kindOfData field):
<message>Exception processing getRecord(), oaiUrl=https://repo-int.ortolang.fr/api/oai, identifier=oai:ortolang.fr:e5bb1019-df55-41b8-bfb1-657dd9c37e3a, edu.harvard.iq.dataverse.api.imports.ImportException, Failed to import harvested dataset: class edu.harvard.iq.dataverse.util.json.ControlledVocabularyException (Value 'corpus' does not exist in type 'kindOfData')</message>

What existing behavior do you want changed?
Allow harvested datasets to have values that are not in the controlled list from the TSV.

Any brand new behavior do you want to add to Dataverse?
no

Any open or closed issues related to this feature request?
no ?

@cmbz cmbz added the pm.GREI-d-2.4.1B NIH AIM:4 YR:2 TASK:1B | 2.4.1B | (started yr1) Resolve OAI-PMH harvesting issues label Oct 11, 2023
@cmbz cmbz moved this to SPRINT- NEEDS SIZING in IQSS Dataverse Project Dec 18, 2023
@jggautier jggautier changed the title Feature Request/Idea: Harvest metadata values out of controlled values Feature Request/Idea: Harvest metadata values that aren't from a list of controlled values Dec 19, 2023
@cmbz

cmbz commented Dec 19, 2023

2023/12/19: Sized at 33. Some discussion about how best to implement the functionality. Plans to discuss this during a Tech Hours.

@cmbz cmbz added the Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) label Dec 19, 2023
@cmbz cmbz moved this from SPRINT- NEEDS SIZING to SPRINT READY in IQSS Dataverse Project Dec 19, 2023
@jp-tosca jp-tosca moved this from SPRINT READY to This Sprint 🏃‍♀️ 🏃 in IQSS Dataverse Project Jan 31, 2024
@stevenwinship stevenwinship self-assigned this Feb 8, 2024
@stevenwinship stevenwinship moved this from This Sprint 🏃‍♀️ 🏃 to In Progress 💻 in IQSS Dataverse Project Feb 8, 2024
@landreev
Contributor

landreev commented Feb 9, 2024

This is my recollection of the recent discussion of this at a tech hour:

  • This is for harvesting only (i.e., a site should be able to configure this as an option: "accept values for such and such field even when they are not in the controlled vocab, when harvesting");
  • We already have this notion of relaxing our metadata requirements when importing harvested content: in the method ImportServiceBean.doImportHarvestedDataset(), some constraint violations are relaxed compared to "normal" imports;
  • The way our metadata fields work, it should in fact be possible, to simply save an arbitrary field value for a field that has a controlled vocab., so this part should just work;
  • Important to note that harvested metadata fields are only used for indexing in Solr and nothing else.
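
As a rough illustration of the idea in the list above, here is a minimal, self-contained sketch of strict versus lenient vocabulary checking. It is not Dataverse code: the class `LenientVocabularySketch`, the method `accept`, the `lenient` flag, and the vocabulary entries are all hypothetical, and `VocabularyException` is only a stand-in that mimics the message of the `ControlledVocabularyException` shown in the harvest log.

```java
import java.util.Set;

public class LenientVocabularySketch {

    /** Stand-in for the exception in the harvest log; NOT the real Dataverse class. */
    public static class VocabularyException extends RuntimeException {
        public VocabularyException(String value, String fieldType) {
            super("Value '" + value + "' does not exist in type '" + fieldType + "'");
        }
    }

    /**
     * Returns the value to store for a field.
     * Strict mode (a normal import) rejects values outside the vocabulary;
     * lenient mode (the hypothetical harvesting option) accepts them unchanged.
     */
    public static String accept(String fieldType, String value,
                                Set<String> vocabulary, boolean lenient) {
        if (vocabulary.contains(value) || lenient) {
            return value;
        }
        throw new VocabularyException(value, fieldType);
    }

    public static void main(String[] args) {
        // Made-up vocabulary entries, for illustration only.
        Set<String> kindOfData = Set.of("survey", "interview");

        // Lenient: the out-of-vocabulary value is kept as-is.
        System.out.println(accept("kindOfData", "corpus", kindOfData, true));

        // Strict: the same value is rejected, as in the reported error.
        try {
            accept("kindOfData", "corpus", kindOfData, false);
        } catch (VocabularyException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Since harvested fields are only indexed, the lenient branch simply passes the raw string through and leaves the vocabulary itself untouched.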

@stevenwinship
Contributor

RE: > * The way our metadata fields work, it should in fact be possible, to simply save an arbitrary field value for a field that has a controlled vocab., so this part should just work;

I've tried this, but one issue was that it created a new value that gets added to the UI dropdown list. The value wasn't added to the db table, but it could be if needed. If we want to add it, it would require re-ordering the list or adding it to the end.
Another option is to simply remove the value and only save the values contained in the list. This keeps the harvesting Dataverse installation clean, but then we lose the value for the field (possibly leaving it empty).
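
To make the trade-off between these options concrete, here is a small sketch that applies either policy to a list of harvested values. Everything in it is an assumption for illustration: the class `UnknownValuePolicySketch`, the `Policy` enum, and the sample vocabulary are hypothetical and do not reflect how Dataverse stores field values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class UnknownValuePolicySketch {

    /** Hypothetical policies for harvested values that are missing from the controlled list. */
    public enum Policy { KEEP_AS_FREE_TEXT, DROP }

    /**
     * Returns the values that would be stored for one field.
     * The vocabulary set is only read, never extended, so no new dropdown entry appears.
     */
    public static List<String> apply(List<String> harvested, Set<String> vocabulary, Policy policy) {
        List<String> stored = new ArrayList<>();
        for (String value : harvested) {
            if (vocabulary.contains(value) || policy == Policy.KEEP_AS_FREE_TEXT) {
                stored.add(value);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        Set<String> vocabulary = Set.of("survey", "interview"); // made-up entries
        List<String> harvested = List.of("survey", "corpus");

        System.out.println(apply(harvested, vocabulary, Policy.KEEP_AS_FREE_TEXT)); // [survey, corpus]
        System.out.println(apply(harvested, vocabulary, Policy.DROP));              // [survey]
        System.out.println(apply(List.of("corpus"), vocabulary, Policy.DROP));      // []
    }
}
```

The last call shows the downside of dropping: a record whose only value is outside the list ends up with an empty field.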

@qqmyers
Member

qqmyers commented Feb 15, 2024

Did you save the value as a normal/non-controlled one or did you create a new controlled vocabulary value? I'd expect the latter to add a drop-down entry, but I think the hope was that the former would work.

@cmbz cmbz moved this to SPRINT READY in IQSS Dataverse Project Feb 22, 2024
@cmbz

cmbz commented Feb 22, 2024

2024/02/22
Added to project; absence was an accidental oversight.

@cmbz cmbz moved this from SPRINT READY to Done 🧹 in IQSS Dataverse Project Mar 20, 2024
@DS-INRAE DS-INRAE moved this to Done in Recherche Data Gouv Jul 10, 2024
@pdurbin pdurbin added this to the 6.2 milestone Jul 15, 2024
Projects
Status: Done
6 participants