Should WoRMS LSID be the value of dwc:taxonID or dwc:scientificNameID in Occurrence core/extension? #203
Comments
Basically I agree that WoRMS LSIDs are name identifiers and thus belong into DwC unfortunately still is inconsistent in its taxon/name/usage identifier documentation. dwc:taxonID is the primary key for the dwc:Taxon class, just as dwc:occurrenceID is for dwc:Occurrence. As the Taxon term name comes from the very early days of Darwin Core it was retained, although something like NameUsage and nameUsageID would have been more appropriate. When it comes to occurrence datasets though, you do not want to use taxonID or scientificNameID as a primary key of a checklist that uses other terms as foreign keys. All you want is to refer to an external definition for the name OR taxon concept. For both of these GBIF recommends using scientificNameID or taxonConceptID (even though GBIF still does not make much use of those, but that will change at some point). @tucotuco I think the definition of the dwc:Taxon class term really needs to be changed to align with the other taxonomic ID terms. |
Proposals for term changes are always welcome via the term change issue template. |
According to their definitions, there are elements of So let's just standardize on Both DwC and WoRMS might evolve during time, so we might need to update this. |
Thank you very much @mdoering and @bart-v !! I appreciate your comments on this.
Thanks @mdoering !! These are not obvious to me at all! How would you suggest updating the definition of the dwc:Taxon class? On the other hand, is there a reason why scientificNameID is so restrictive (nomenclators only)? I ask because I am thinking of proposing a term change request for scientificNameID to broaden its scope to include identifiers for scientific names that are not from a nomenclator (e.g. WoRMS LSID). Does this sound sensible to you? |
|
I would like to get some input from the TCS Task Group on this subject (ping @nielsklazenga). When we wrote the DwC RDF Guide, it was understood that clarity was needed in defining what exactly a taxon was. The result was the normative content in Section 2.7.4 of the RDF Guide, which basically put off creating a robust definition of a taxon until TCS was revised as a current TDWG standard. The terms organized under the When the cleanup of dwc: and dwctype: classes took place in 2014,
In both of these situations, I think there was an assumption that the TCS revision would clear up the definitions. It is my understanding that the TCS task group (upon whom the task of defining a "taxonomic entity as defined elsewhere" has fallen) is in the final stages of its work. I am assuming that they will provide more clarity about the relationships among "taxa", "taxon name usages", and names. I think it would be best to learn more about their conclusions before proposing changes to Darwin Core terms (which I am assuming will ultimately be adjusted as necessary to be consistent with their model). |
It depends on whether the entries in WoRMS are meant to be taxa or names. For Catalogue of Life entries, I would use The two often get confounded though. Plants of the World Online uses the IPNI identifiers, but there they are |
@nielsklazenga Can you provide a reference that describes the difference between taxon and names within this context? |
@nielsklazenga I would like to see a reference/source, too. That would be most helpful. In addition, I just wish to lend my support and interest to this conversation. As a node manager who is publishing data to multiple aggregators, this issue of the use of the WoRMS LSID is of particular interest to my work and I would certainly appreciate some clarification to guide me and others to whom I provide training. |
For taxon, see dwc:Taxon; for taxon name, see dwc:scientificName and/or dwc:vernacularName. Darwin Core does not have a class for taxon names and dumps properties of both taxa and names in the Taxon class – which is why the Taxon class is not suitable for use with RDF (Darwin Core RDF Guide) – but TCS and the TDWG Ontology both have both TaxonConcept (which is equivalent to the Darwin Core Taxon) and TaxonName classes. To put it as simply as it really is, if a taxon is a We hope to bring TCS 2 to public review around the time of TDWG 2023. In the meantime, the Catalogue of Life Data Package (CoLDP) is really good and, if you use the schema with the Taxon table, completely TCS compliant. In the CoLDP schemas, Taxon is the Going back to Occurrence data, which I sort of missed yesterday you guys were talking about, and assuming that WoRMS is taxonomic data and not Occurrence data, the only appropriate field to provide the WoRMS LSID would actually be |
A few observations (and an alternative take),
|
I agree with everything Greg says. I still say that |
Thank you SO MUCH everyone who commented above!! I felt that there are some misunderstandings. Please allow me to clarify a couple of things:
What happens to a record in WoRMS and its LSID when there is a name change?
WoRMS = World Register of Marine Species (https://www.marinespecies.org/). When there is a name change, WoRMS creates a new record for the new name and the two records point to each other (please see figure above). The same applies to their children (if there are any). Hence, the taxonomic status changes (accepted/unaccepted) but the LSID associated with the name remains the same. In other words, a WoRMS LSID has a 1-1 relationship with each name. @nielsklazenga does this answer your question below?
Inconsistency between definition and application for dwc:Taxon
I hope I illustrate @mdoering's comment correctly in the figure below. I believe this is the reason why people are confused about how taxonID should be used and why we are looking for a dwc:Taxon definition. I thought the purpose of having a data standard is so that everyone can use it in the same way, but I feel that this is not the case. If we data publishers are having difficulties with this, it will be even more challenging for end users who use the data we publish.
@nielsklazenga I am sorry, I do not understand what this sentence means. It is the most appropriate but not appropriate?
I am asking for the right field to use for WoRMS LSID for Occurrence core/extension
Thank you @ghwhitbread ! My bad for not clarifying at the beginning. OBIS is an aggregator like GBIF which harvests Darwin Core Archives published via IPT. GBIF uses the GBIF taxonomic backbone and OBIS uses WoRMS as its taxonomic backbone. I am NOT talking about species checklists, which OBIS does not deal with. I am talking about the Occurrence core/extension of the primary observation data that I publish, in which I would like to use the WoRMS LSID as an external identifier for the scientificName of an Occurrence record. I hope this is clear! Thanks again! |
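To make the use case concrete, here is a minimal sketch of an Occurrence record that carries a WoRMS LSID as the external identifier for its scientificName, in the field OBIS asks for. It assumes the usual WoRMS LSID pattern `urn:lsid:marinespecies.org:taxname:<AphiaID>`. The record values (occurrenceID, scientificName, and the AphiaID 123456) are placeholders for illustration, not real WoRMS entries, and `aphia_id_from_lsid` is a hypothetical helper, not part of any OBIS or WoRMS tool.

```python
import re

# Assumed pattern for a WoRMS LSID: urn:lsid:marinespecies.org:taxname:<AphiaID>
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def aphia_id_from_lsid(lsid):
    """Return the numeric AphiaID embedded in a WoRMS LSID, or None if the
    string does not match the assumed pattern. Hypothetical helper."""
    match = WORMS_LSID.match(lsid.strip())
    return int(match.group(1)) if match else None


# Placeholder Occurrence record; all values are illustrative only.
occurrence = {
    "occurrenceID": "example:occ:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Genus species",
    "scientificNameID": "urn:lsid:marinespecies.org:taxname:123456",
}

print(aphia_id_from_lsid(occurrence["scientificNameID"]))
```

The same LSID string could just as easily be written to taxonID, which is exactly why the choice of field needs to be agreed on rather than inferred from the value.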
Hi @ymgan, the fact that there is a one-to-one relationship between the WoRMS LSID and names does not make the LSID an identifier for a Name, but an identifier for a Nominal Concept (Franz & Peet, 2009), basically a taxon for which you do not know the definition exactly. At any given point in time an entry (with an accepted name) in WoRMS is a Relational Concept, i.e. you can infer the definition from the context (its siblings), but over a longer period of time it is a Nominal Concept, because the context may have changed. This is not so much about what type of object an entry is, but all about how identifiers are managed, which is probably the biggest hurdle to making taxonomy really work. There was a question about this in the Catalogue of Life repo (CatalogueOfLife/general/98) just yesterday. This is an issue for WoRMS (and a lot of other systems out there, including my own) though, not OBIS, or occurrence data. I am not sure if this answers your next question, but I like the way GBIF does it. In GBIF, |
FYI: WoRMS puts much effort into keeping the identifiers stable: we are managing them manually, so we will never create a new identifier for an existing name/concept/taxon in WoRMS. If it would happen anyway, we will keep the oldest identifier, and mark the duplicate as |
Rich Pyle has given two talks on the subject at TDWG meetings.
- 2008: https://youtu.be/vPVm5S0qIcs?si=-RBGxsgGyr4i8qhl 19 mins
- 2022: https://youtu.be/rmTvUUjBxrI?si=rtQFsJ5sqc7dktdK 56 mins
If you'd like to read about it instead, the first formal publication of the
ideas from a biodiversity informatics perspective is probably Berendsohn,
Walter G. 1995. The Concept of Potential Taxa in Databases. Taxon
44(2).
https://www.researchgate.net/publication/247816280_The_Concept_of_Potential_Taxa_in_Databases
…On Tue, Aug 1, 2023 at 11:09 AM Ben Norton ***@***.***> wrote:
@nielsklazenga <https://github.com/nielsklazenga> Can you provide a
reference that describes the difference between taxon and names within this
context?
|
I would argue the opposite. How can a WoRMS LSID refer to a Taxon if it links to a synonym? How can a taxon identifier change its content from an accepted name to a synonym? If the taxonomic concept changes, but the identifier does not, it is a very bad taxon identifier. On the other hand if the identifier perfectly stays with the name over time, no matter where in the hierarchy it is placed or if it is accepted, then it is a pretty good name identifier and belongs into The vast majority of our taxonomic systems work with name based identifiers even though this is not clearly stated. This is true for GBIF, ITIS TSNs, Catalogue of Life, WoRMS, WFO, FaunaEuropaea and many more. Avibase, Dyntaxa and iNaturalist are exceptions. |
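The argument above can be sketched with a toy model of the name-change behaviour described earlier in this thread (a new record is created for the new name, and the old record keeps its LSID but becomes unaccepted). Everything here is hypothetical: the `NameRecord` class, the `lsid:A`/`lsid:B` identifiers and the names are placeholders, and only the pointer from the unaccepted record to the accepted one is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """Toy register entry; field names are illustrative, not a WoRMS schema."""
    lsid: str
    scientific_name: str
    status: str                          # "accepted" or "unaccepted"
    accepted_lsid: Optional[str] = None  # set only on unaccepted records


# Snapshot of the register before the name change.
before = {
    "lsid:A": NameRecord("lsid:A", "Oldgenus species", "accepted"),
}

# Snapshot after the change: a new record is added, the old one is kept.
after = {
    "lsid:A": NameRecord("lsid:A", "Oldgenus species", "unaccepted", "lsid:B"),
    "lsid:B": NameRecord("lsid:B", "Newgenus species", "accepted"),
}


def name_is_stable(lsid, old, new):
    """True if the identifier still points at the same name string."""
    return old[lsid].scientific_name == new[lsid].scientific_name


def resolve_accepted(lsid, register):
    """Follow the pointer from an unaccepted record to its accepted record."""
    record = register[lsid]
    return record.accepted_lsid or record.lsid


print(name_is_stable("lsid:A", before, after))
print(resolve_accepted("lsid:A", before), resolve_accepted("lsid:A", after))
```

In this toy model the identifier-to-name mapping never changes between snapshots, while the accepted record reached from the same identifier does, which is the behaviour one would expect from a name identifier rather than a taxon concept identifier.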
+1 for scientificNameID (again) … and whether or not it is supposed to be taxon or name (or both), OBIS still uses WoRMS as the source of nomenclatural details for a name. |
Proving that if a big enough user community is using a term some way, then that is how it should be used? I have to say this entire thread is nothing but confusing to me and I think it is because we have poor community-wide agreement on these terms and their meaning. However, if I were publishing anything to OBIS, I would put the WoRMS LSID in dwc:scientificNameID because if I don't, I might be surprised to find that I am not publishing to OBIS. |
Identifier | http://rs.tdwg.org/dwc/terms/scientificNameID |
Hi,
I have been on this quest for a while now because our project team is tasked to align OBIS quality checks with the Core Tests and Assertions from TDWG BDQ TG2. I talked to folks from GBIF Norway, GBIF Helpdesk, OBIS Secretariat, WoRMS, TDWG BDQ TG2 and a couple of GBIF/OBIS nodes specifically about this question, but the answers I got are all different. It is very frustrating to me when there is no consensus between the opinions. Hence I am opening this issue, summarizing what I understood and it would be great if we could find a consensus and solution together.
edit: This issue is talking specifically about the usage of WoRMS LSID in Occurrence
As some of the discussions take place through Slack, emails or in-person, I could not link all of the information as GitHub issue here. Please correct me if I am wrong in any sense.
Definitions
dwc:taxonID
dwc:Taxon
dwc:scientificNameID
Why OBIS recommends using dwc:scientificNameID field for WoRMS LSID?
From the response I received from WoRMS helpdesk (it was a thread migrated from OBIS Slack), there are a couple of reasons:
Can WoRMS LSID be used for dwc:taxonID?
Opinion from WoRMS
The response I received from WoRMS helpdesk (via email) is that taxonID is an identifier for a taxon concept and not a name, and that WoRMS does not have such a concept. A remark was also made that the marine community links observations to names, not to concepts.
This made me wonder if there is a confusion between dwc:taxonID and dwc:taxonConceptID?
Implementation concern from GBIF
GBIF Helpdesk once responded that it may not be a good idea to have WoRMS LSIDs as taxonIDs because they are not stable. TaxonIDs in the GBIF context should be identical between versions of the dataset, and they could potentially change if they come from unstable LSIDs.
The stability concern, I believe, refers to the fact that WoRMS does not have stable identifiers for taxon concepts.
Please see more in the comments:
Why WoRMS LSID should be used for dwc:taxonID?
WoRMS is not an authoritative source of information on nomenclatural acts
This is perhaps the biggest argument I received for why WoRMS LSID should not be used for the dwc:scientificNameID field. @chicoreus mentioned in the comment that the definition for dwc:scientificNameID explicitly points at authoritative sources of information on nomenclatural acts, i.e. nomenclators. Since WoRMS is not an authoritative source of information on nomenclatural acts, it is not appropriate to use dwc:scientificNameID for WoRMS LSID. @mdoering also mentioned the concern in this comment.
dwc:taxonID is an identifier without a particular meaning to the instance of the Taxon class
Following @chicoreus comment which aligns well with the Darwin Core definition for dwc:taxonID:
My perspectives as a data manager for both GBIF and OBIS node
Difference in interpretation leads to difficulty in collaboration
It is VERY difficult for me as a data manager for both a GBIF and an OBIS node when there are differences in interpretation of whether WoRMS LSID should be populated under dwc:taxonID or dwc:scientificNameID. One example is a conversation I had when I attended a workshop organized by Nansen Legacy and GBIF Norway. GBIF Norway thinks that WoRMS LSID should be populated under dwc:taxonID, but OBIS and WoRMS insisted that it should be populated under dwc:scientificNameID for the reasons stated above. Furthermore, dwc:scientificNameID is a mandatory field for OBIS. I appreciate that @pieterprovoost was being pragmatic and mentioned that he will look for a solution, such as parsing dwc:taxonID in OBIS data processing. The data could be interpreted better if there is a consensus here.
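As a rough illustration of what "parsing dwc:taxonID" could look like on the aggregator side, here is a minimal sketch that prefers dwc:scientificNameID and falls back to dwc:taxonID when only the latter holds a WoRMS LSID. This is not OBIS's actual processing code; the function name `resolve_name_identifier`, the LSID pattern and the sample values are assumptions made for the example.

```python
import re

# Assumed pattern for a WoRMS LSID: urn:lsid:marinespecies.org:taxname:<AphiaID>
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:\d+$")


def resolve_name_identifier(record):
    """Return (lsid, source_term) for the first field holding a WoRMS LSID.

    scientificNameID is checked first, then taxonID. Returns (None, None)
    if neither field contains a string matching the assumed pattern.
    Hypothetical helper, not an OBIS or GBIF API.
    """
    for term in ("scientificNameID", "taxonID"):
        value = (record.get(term) or "").strip()
        if WORMS_LSID.match(value):
            return value, term
    return None, None


# Placeholder records: one published the "OBIS way", one the "GBIF Norway way".
obis_style = {"scientificNameID": "urn:lsid:marinespecies.org:taxname:123456"}
gbif_style = {"taxonID": "urn:lsid:marinespecies.org:taxname:123456"}

print(resolve_name_identifier(obis_style))
print(resolve_name_identifier(gbif_style))
```

Returning the source term alongside the identifier lets a quality check flag records where the fallback was used, instead of silently accepting either placement.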
Implications for the future
The new unified data model
I really hope we can find a consensus now rather than having this carry over to the new data model (see screenshot below). Screenshot taken today, 2023-07-20.
I am aware that the model is still in an immature state. Based on my email conversation with WoRMS, the same issue seems to persist: to WoRMS, it makes sense to add observedScientificNameID to the ReportedAbundance table
My questions
Can we reach a consensus on whether WoRMS LSID should be used for dwc:taxonID or dwc:scientificNameID?
Right now the standard seems to suggest that dwc:taxonID should be used for WoRMS LSID, but the implementation side seems to suggest otherwise. So what exactly should a data manager like me do? This is so frustrating!
Is there anything unclear about the usage of dwc:taxonID, dwc:taxonConceptID or dwc:scientificNameID that should be improved in Darwin Core documentation?
If so, what is it? What leads to different interpretations between different people/organizations? If we could identify that, a term change request should perhaps be submitted.
Thank you
Thank you to everyone who talked to me and helped me understand this in any way! I hope I summarized the issue well. I am definitely not the most tactful person, so apologies if I stepped on your ego. Please correct me if I said anything wrong!