GREI HDV Task: Determine whether/how Dataverse can support hierarchical vocabularies #236
Comments
Sonia and Julian met and discussed additional steps to getting this task done. See updates to "deliverables" above. Julian estimates he can devote time to this issue in mid September |
2024/08/05
|
I think I said the opposite: with the external vocab mechanism, Dataverse just stores a term URI, so no changes are needed in Dataverse itself to support a hierarchical vocabulary. All the changes would be in the JavaScript (to be developed for a given vocabulary/service), where it should be simple to find a widget, mirror what other sites do, etc. to handle hierarchy or graph relations. |
Thanks @qqmyers for clarifying. @siacus and @scolapasta please see Jim's comment for an update to our understanding about hierarchical vocab support. |
2024/11/18: Ask @qqmyers to take a look, recommend next steps (e.g., development work needed), create relevant development issues. |
I'd suggest some ~non-dev work to start:
With the answers above, I think it should be straightforward to scope the JavaScript work needed to support the input and display, and to identify whether there's work related to a new metadata block, whether updates are needed to exporters, etc. |
@jggautier and @sbarbosadataverse do you have suggestions for the first bullet points Jim suggested here: #236 (comment) ? |
Hmm, I'll try to think about it and reply later today |
2024/11/21: Placing On Hold until @sbarbosadataverse and @jggautier figure out which vocabulary they want to investigate. |
May I offer the idea that using a hierarchical vocabulary should help with finding data even without additional visuals? E.g., if I tag a dataset "European politics", I should be able to find it when I search for the broader term "politics" (assuming the use of a vocabulary that includes those terms in a hierarchical relationship). Just chiming in since this issue replaces one of the oldest Dataverse issues, while it appears not to cover all of the old issue's contents. |
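A minimal sketch of the search behaviour described in the comment above, assuming a toy vocabulary held in memory. The `NARROWER` mapping, the `DATASET_TAGS` data, and the function names are hypothetical and only illustrate the idea; this is not how Dataverse or its search index currently works.

```python
from collections import deque

# Hypothetical toy vocabulary: each term maps to its direct narrower terms.
NARROWER = {
    "politics": ["European politics", "Asian politics"],
    "European politics": ["German politics"],
}

# Hypothetical datasets, each tagged with terms from the vocabulary.
DATASET_TAGS = {
    "dataset-1": {"European politics"},
    "dataset-2": {"German politics"},
    "dataset-3": {"economics"},
}


def expand_term(term, narrower):
    """Return the term plus every narrower term reachable from it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:  # guards against cycles in the vocabulary
                seen.add(child)
                queue.append(child)
    return seen


def search(term, narrower, dataset_tags):
    """Find datasets tagged with the term or any of its narrower terms."""
    wanted = expand_term(term, narrower)
    return sorted(ds for ds, tags in dataset_tags.items() if tags & wanted)


print(search("politics", NARROWER, DATASET_TAGS))
```

With this toy data, a search for "politics" also returns the datasets tagged only with "European politics" or "German politics", which is the behaviour the comment asks for. Whether the expansion would happen at index time or at query time is left open here.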
Thanks @bencomp. Could you write more about what additional visuals might mean? |
For a moment I thought this issue was only/mostly about the visual navigation of a hierarchy, but on a second read I see that was my mistake. |
Ah, okay, visuals like how depositors and curators might select terms from a hierarchical vocabulary. Thanks! Yeah we definitely mean to consider all aspects of "support", like what was discussed in the older GitHub issues that this issue replaced. |
Status: January 2025
Julian and I met and discussed some tasks associated with Jim's plan:
In addition to what Jim outlined, and to happen in parallel:
|
Thanks @sbarbosadataverse and @jggautier looks like a great plan to me. Curious about @qqmyers thoughts? |
Not sure what to comment on:
re: 4 - not sure why an implementation in a new block can't be available site-wide, but if the idea is to have this in the citation block Keywords field, it would be required to be on for everyone (so non-medical collections would have to see any medical vocab).
re the second: The way our ext. vocab service currently works is that there can be one script per field. That means that if you want to support one hierarchical vocab and free-text entries, the script has to support that (most of our current ones do), and if you want multiple vocabs, again the script has to support it (currently only our skosmos script does that, and it requires both vocabs to be on the same server). Same for multiple vocabs and free text - that would all be built into a single script.
There is interest in the community in allowing multiple scripts on a given field and even allowing different scripts to be turned on for a given field in different collections. If/when that is designed/implemented, individual scripts could probably stop doing anything to handle free text or multiple vocabs. Which roughly means that starting with a single vocab per field is fine: it's extra work to support multiple vocabs and, until there's a clear design, work towards multiple vocabs in one field could end up being one-off or have to be redone later. (I don't have a good guess as to when a redesign might get going - probably faster if Harvard is also interested due to GREI.) |
|
I'm adding info about MeSH and UMLS in this comment and will continue updating it as I learn more. It includes some assumptions that I can verify or correct as we learn from groups that we think would be interested in or benefit from using these vocabularies to describe what they publish in Dataverse repositories (MORU, HEAL, AfricArXiv) and as we learn from groups who have already used these vocabularies. An early assumption is that depositors, curators and other types of users are using Dataverse's Keyword and Topic Classification fields to enter terms from MeSH and UMLS that describe deposits published in Dataverse repositories, so I'm including questions and assumptions about those two fields.
MeSH
Purpose of MeSH
Size and complexity of MeSH
The MeSH model is also summarized on a page in the UMLS site. See https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MSH
Where are the terms hosted?
How and how often are the terms used in Dataverse repositories?
Here are the 10 installations that have published most of these deposits:
I found those 683 datasets by looking for datasets where:
The Keyword fields are used more than the Topic Classification fields, and sometimes both fields are used in the same deposit.
MeSH terms entered in Keyword fields in a dataset, https://doi.org/10.15139/S3/V8G3QG, published in UNC Dataverse
MeSH terms entered in Topic Classification fields in a dataset, https://doi.org/10.18419/DARUS-4230, published in DaRUS
MeSH terms entered in the Keyword and Topic Classification fields in a dataset, https://hdl.handle.net/20.500.12682/rdp/CN73BH, published in DOMUS Dados
Why do people use MeSH terms and how do people choose terms?
Julian's assumptions:
Who to contact to learn more?
UMLS
Purpose of UMLS
Size and complexity of UMLS
For the full list of the approximately 100 vocabularies in UMLS, see the UMLS Vocabulary Documentation page.
UMLS includes relationships among the vocabularies' terms or concepts. Mappings between terms are described on the UMLS Metathesaurus - Mapping Projects page.
Where are the terms hosted?
Why do people use UMLS?
Whereas hierarchical vocabulary support might include making it easier for users to add terms and improving search by relying on relationships between terms in the same vocabulary, such as parent-child and synonymous relationships, "support" for UMLS might include relying on the relationships between terms in different vocabularies. For example, if I find a dataset that's described using a SNOMED CT vocabulary term, can I find other datasets described using equivalent or narrower terms in the ICD-10-CM vocabulary and other related vocabularies? Whoever included UMLS might also have imagined "UMLS support" as a way to support multiple hierarchical vocabularies instead of only MeSH.
Who to contact to learn more?
Keyword and Topic Classification metadata fields
Why do people decide to use the keyword or topic classification fields?
The keyword and topic classification fields are influenced by properties or elements in the DDI Codebook standard. How do that standard's maintainers define these properties? How do they think the two properties are different? How do they expect others to use them?
Topic Classification definition in Codebook 2.1 and Codebook 2.5:
|
Status: February 2025
|
Overview
Deliverables
Tasks:
Resources