Taxon, Taxon Concept and Taxon Name Usage: definitions and relationships #1
Comments
I come at the definition of Taxonomic Concept from a different direction, which is from population perspectives, i.e. one or more population(s) of taxonomically related individuals. As opposed to approaching this from the taxonomic names direction. I see taxonomic names as labels to taxonomic concepts and while the names will change from author to author and taxonomy to taxonomy, the underlying concept must be unchanging through time (though, of course, individuals within population(s) are born, reproduce and die). 'A rose by any other name ...' Denis Lepage et al. in https://dx.doi.org/10.3897/zookeys.420.7089 describe concepts and the issues though I don't immediately see an actual definition in his paper. I do believe that our first task is to coin a definition that should include the immutability and disassociation of a concept with specific scientific names, author and publication of each concept. I can 'mostly agree with this definition |
One thought that will certainly cause discussion and controversy is that, from my perspective, a taxonomic concept doesn't actually require a name. At the most basic, it requires some form of reference (even unpublished) and a description of the associated population(s). There are cases that eBird must manage every year where an unpublished species is still a valid taxonomic concept: it doesn't have a name, author or citation yet, but it is being observed and recorded in the natural world and therefore must have a taxonomic concept ID for it to be useful in the eBird world. |
I actually don't find that controversial at all. When a tree falls in the woods and nobody is there to hear it, it still creates sound waves. Likewise, when groups of organisms are known to exist in nature but have not been formally assigned a Linnean-style scientific name, they still exist as taxon concepts. As for the actual definition of "taxon concept", I would go with something simpler: "A circumscribed set of organisms asserted to represent a taxon". It's not circular, because the word "concept" is what we're trying to define here. A slightly more elaborated version might qualify "organisms" as "inclusive of individuals living, recently dead, and yet to be born". Technically, Code-governed Linnean-style names are labels attached to name-bearing type specimens. To use a Linnean-style name as a label for a concept, it's necessary to include a "sensu" or "SEC" qualifier, which by convention is some form of Reference citation (typically author + year). I agree that names shouldn't be part of the definition of a taxon/concept, and I wouldn't include this convention for labeling concepts as part of the definition of a concept either. Instead, I would keep the definition of "taxon"/"concept" as the simple version above, then strongly support a standard human-friendly concept labeling format along the lines of: [LinneanStyleName] [NomenclaturalAuthority] SEC [ReferenceAssertingOrDefiningConcept]. As for identifiers, applied to names and concepts, my own views are well-documented: we need a system of persistent, shared identifiers for taxon-name usage instances, and then apply those identifiers in context as proxy anchor-points for both taxon names and taxon concepts. But I imagine that would be a different thread... Aloha, |
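To make the labeling format proposed above concrete, here is a minimal Python sketch. The function name `concept_label` and the example values ("Aus bus", "Smith 1950", "Jones 1980") are placeholders invented for illustration; nothing here is part of TCS or any existing standard.

```python
def concept_label(name: str, authority: str, sec_reference: str) -> str:
    """Build a label of the form: Name Authority SEC Reference."""
    parts = [name.strip(), authority.strip(), "SEC", sec_reference.strip()]
    # Drop empty parts (e.g. a missing authority) so no double spaces appear.
    return " ".join(part for part in parts if part)


# Hypothetical example values, not real taxa or references.
label = concept_label("Aus bus", "Smith 1950", "Jones 1980")
print(label)
```

The same name and authority combined with a different SEC reference yields a different label, which is the point of the convention: the label identifies the name as used in a particular Reference, not the name alone.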
I would also go for a short definition with the focus on defining a set of organisms. "A circumscribed set of organisms asserted to represent a taxon" is pretty good. It makes me wonder what a taxon exactly is though. Does that need another definition? For example is a classification essential for a taxon and does a change in the classification change the concept? I would think it does not, but I know some think differently.
Markus
|
Thanks, @mdoering . Yeah... I was a little queasy about that as well. My hope is that "Taxon" is reasonably well understood, given that it is the basis of an entire field of study (i.e., taxonomy). But you're right -- while my proposed definition may not be circular per se, it does somewhat dodge and obfuscate a clean definition by leaning too heavily on an equally abstract and vague term. I suppose the definition could simply be "A circumscribed set of organisms", but there are other reasons for circumscribing sets of organisms that are non-taxonomic (e.g., "marine organisms", "organisms in Hawaii", etc.). That's why I felt the definition needed the additional refinement of "asserted to represent a taxon". I think as Jeff and others have said, the "asserted" part is key, because any taxon concept really inherits its meaning from an assertion put forth by taxonomists (or non-taxonomists). My sense of "taxon" is that the word implies a set of organisms that more or less share an evolutionary history. I wanted to avoid such specifics, however, to sidestep the whole monophyletic/holophyletic/paraphyletic issue which, while interesting in its own right, is outside the scope of what we're trying to achieve here. So... my feeling is that the definition I proposed stands as it is even without a clear/agreed definition of what a "taxon" is. Different people may agree or disagree on what is implied by "taxon", but what matters to the definition is that someone asserted a set of organisms to represent a taxon -- by whatever notion of "taxon" that someone had in mind. Linnaeus predated Darwin by a century and was himself a creationist; but I think it's fair to say that he asserted circumscribed sets of organisms to represent taxa. 
In his mind, taxa were created by God, which is not consistent with the view of modern evolutionary biologists; yet despite this fundamental gap in the essence of a "taxon", both Linnaeus and modern evolutionary biologists still assert circumscribed sets of organisms to represent taxa in ways that are fundamentally comparable, and fall within the scope of what I think we're circling around for defining what we mean by "taxon concept" in this context. Sorry for the ramblings.... Aloha, P.S -- Sorry - I accidentally clicked the wrong button.... |
I like Rich's definition. We need to work out how Taxon, Taxon Concept, Name Usage and Instance relate to each other (I'll create a new issue for that tomorrow; it's in the discussion document that Greg and I wrote), but I would say that the Taxon is the actual group of organisms that is out there (or we think is out there), while the Taxon Concept is the abstraction, or what is in our heads. |
Thanks @nielsklazenga -- I agree with your distinction between "taxon" as being the actual set of organisms, and "concept" as being our abstract human interpretation of it. In that context, I would probably apply my proposed definition to "taxon", and parse out the other terms as follows: Taxon: "A circumscribed set of organisms, inclusive of individuals living, recently dead, and yet to be born, asserted to represent a natural cohesive biological unit" [This may need some elaboration on "natural cohesive biological unit", but again the key is that in order to exist, it must be asserted to be such.] Taxon Concept: "A set of physical, genealogical, phylogenetic or other biological properties or characters of organisms used to define the abstract boundaries of a taxon circumscription that collectively distinguish it from other taxa." [What I'm trying to suggest here is that the "concept" is derived from the actual properties used to describe the abstract boundaries of taxon circumscriptions, which is the way that taxonomists determine whether any particular organism/individual is or is not an instance of an asserted Taxon.] For my own understanding of "Taxon Name Usage" and associated terms (e.g., "Reference", "Name-String", "Appearance", etc.), see: Taxonomic name usage files. I'm not a big fan of defining the term "Instance" by itself within this context, because that word is so broad and vague that we shouldn't try to co-opt it to have a more specific meaning. |
Awesome. @deepreef, in terms of the relationship between Taxon Concept and Taxon Name Usage, would you agree that Taxon Name Usage can be an operationalisation of Taxon Concept? |
I guess my answer to that depends on what you mean by "operationalisation". The way I have characterized it in the past is that a "Taxon Name Usage" (TNU) encompasses all of the text, numbers, figures, data, etc. associated with the implied taxon concept asserted within a Reference. An identifier assigned to that TNU includes all of that associated information collectively as the "thing" that is identified. Thus, I guess I would say that the TNU identifier implies the full set of information used in asserting a Taxon Concept. In this sense, I think it's fine and appropriate to regard the TNU as the "operationalisation" of the Taxon Concept, in the sense that it encompasses all of the documented information used in the Reference to define the boundaries of that Taxon Concept. One of the caveats, however, is that I think that a TNU can be used to operationalise more than just the Taxon Concept. For example, a subset of TNUs are Protonyms (i.e., those that create new scientific names, or "nomenclatural novelties"). In some contexts, the TNU (=Protonym) can also simultaneously be the operationalisation of the "taxon name" entity (important for nomenclators, but devoid of any connection to taxon concepts other than the name-bearing type specimen), as well as the operationalisation of the implied taxon concept associated with that name within that Reference (no different from any non-Protonym TNU). I personally don't see a problem with that, because the distinction of whether or not a particular TNU identifier implies (or serves as proxy for) the nomenclatural bits of the TNU or the taxon concept bits of the TNU depends on the context in which the identifier is cited. The identifier identifies the TNU (i.e., the collective set of text, numbers, figures, data, etc. associated with the implied taxon concept asserted within a Reference); but the TNU serves as a very useful proxy for both nomenclatural actions, and taxon concept definitions. 
Man, this stuff is hard enough to think about, let alone write about! And for those who argue that these sorts of discussions are too deep into the weeds to be useful in this context; I would counter that the reason we've been unable to solve these issues after decades of discussing and debating them is because we have thus far failed, as a community, to dive this deep into the weeds previously. |
You are very good at writing about it though. I agree with all that. At a later stage we can probably come up with a list of types of Taxon Name Usages and how they relate to Taxon Names and Taxon Concepts. I agree that it is important to have these discussions, as I think that, once we've nailed down the core concepts, the rest will become more straightforward. |
If we use Taxon as being "A circumscribed set of organisms, inclusive of individuals living, recently dead, and yet to be born, asserted to represent a natural cohesive biological unit", then a taxon_identifier would be an identifier that is persistent and always means the same 'circumscribed set of organisms' regardless of what taxonomic name, taxon authority, or taxonomic level is applied. Isn't a taxonomic ID already used with, and generally closely tied to, a name, as opposed to a 'set of organisms'? Maybe I have a basic misunderstanding that can be corrected. |
Yes, that is my understanding conceptually. However, for practical purposes, I'm not sure how one would ever know that two circumscribed sets of organisms asserted by two different authorities (accordingTo), with the same or different names, and the same or different taxonomic levels, represent the same taxon concept (at least with enough confidence to utilize the same taxon_identifier). An example we wrestled with in the early days of discussing this: suppose you have Smith 1950 asserting a taxon concept, with various information delimiting the boundaries of that concept (e.g., characters, junior synonyms, geographic distributions, etc.). Then Jones 1980 uses the same name, same synonymy, but adds some additional characters (not mentioned by Smith), and perhaps adds a geographic range extension. Can we confidently assume that both are the same taxon concept, and therefore both can utilize or reference the same taxon_identifier? That would require expert knowledge of the group to assert, and even then what would be required for Smith herself and Jones himself to mutually agree that they are referring to the same implied circumscribed set of organisms? This is why I never felt there was much practical value in creating taxon_identifiers that are independent of the underlying TNU(s) that assert the taxon concept(s). It's also why TCS went with the notion of "TaxonRelationshipAssertions". That is to say, while we may be able to confidently document that Brown 2000 asserted that taxonConcept sensu Smith 1950 is congruent with taxonConcept sensu Jones 1980, we cannot "know" they actually are congruent with enough confidence that we can share the same identifiers for both concepts. 
This is why I think anchoring everything to TNUs (rather than taxon_identifiers of some sort) is more practical, and instead of asserting concept congruence via shared taxon_identifiers, we assert some sort of set-theory relationship between the concepts represented by two TNUs (e.g., as congruent, or includes, or overlaps or whatever). Sure, there may be some cases where we can universally accept congruence in taxon concept from separate TNUs with enough confidence that we could anchor both to the same taxon_identifier; but I wager such cases would represent the vast (VAST) minority, and in that context does it really make sense to define and maintain and utilize yet ANOTHER class of identifiers (in a domain that is already overflowing with subtly different classes of identifiers)? On the other hand, if we lower the "bar" for what we accept as "congruent" concepts (e.g., sets of distinct name-bearing type specimens -- aka heterotypic/subjective synonymies), then we're in a much better place to aggregate sets of TNUs into congruent taxon concepts more objectively, in which case a dedicated class of taxon_identifier might well be useful. Sorry for the extended ramblings... |
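As a rough illustration of anchoring to TNUs and recording relationships as attributed assertions, here is a minimal Python sketch. It is not the TCS schema: the class names, field names and the `tnu:` identifiers are hypothetical, and the relationship vocabulary is limited to the three terms mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Relationship(Enum):
    # Only the three relationships named in the comment above.
    CONGRUENT = "is congruent with"
    INCLUDES = "includes"
    OVERLAPS = "overlaps"


@dataclass(frozen=True)
class RelationshipAssertion:
    subject_tnu: str            # hypothetical TNU identifier
    relationship: Relationship
    object_tnu: str             # hypothetical TNU identifier
    according_to: str           # the Reference making the assertion

    def describe(self) -> str:
        return (f"{self.according_to} asserted that {self.subject_tnu} "
                f"{self.relationship.value} {self.object_tnu}")


# The Smith / Jones / Brown example, with made-up identifiers.
assertion = RelationshipAssertion(
    subject_tnu="tnu:smith-1950",
    relationship=Relationship.CONGRUENT,
    object_tnu="tnu:jones-1980",
    according_to="Brown 2000",
)
print(assertion.describe())
```

The design choice this is meant to show: the two TNUs keep their own identifiers, and congruence is stored as a separate, attributed statement rather than being implied by a shared taxon_identifier.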
Thanks for raising this.
For a long time I have wondered whether we should distinguish between a NameUsage and a TaxonConcept.
In most cases when we talk about concepts we refer to a specific, published usage of a name - NAME sec. REFERENCE. What exactly the concept is, is not expressed at all, and it is going to be hard to find properties that describe it. Is it worthwhile to distinguish between the attempt to list defined (and unique?) concepts and simply referring to a name used in some publication? If it is only about the latter I much prefer the term NameUsage, which does not pretend to be more than just that.
Markus
|
In the world of birds, this happens very frequently, i.e. where different authorities and even different versions within an authority have different name usages that apply to the exact same taxon concept and we can be very certain that they do in fact refer to the same concept.
It often seems that when a species is described, the concept exists (as discussed earlier) but the description of the concept does not always exist. A later authority will come along and describe the concept in more detail (maybe adding a geographic range), but I would argue that doesn't change the concept, only clarifies it.
I do recognize that we are fortunate in the bird world because concepts do not change very often and are fairly well known/agreed upon, though there have certainly been some surprises which require a new concept (even when names don't change).
I manage eBird and several online taxonomic monographs, and having a taxonomic concept identifier that is static through time (as long as it refers to the same set of organisms) is very important, as we manage 500 million observations and 4000+ species pages. Each observation or species page is keyed in the database to a taxon concept ID. When a name changes, I can simply apply a new name to that concept, as opposed to changing the impacted concepts.
Jeff
…--
Jeff Gerbracht
Lead Application Developer
Neotropical Birds, eBird, Birds of North America
Cornell Lab of Ornithology
607-254-2117
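A minimal Python sketch of the keying pattern described above, where records point at a stable concept ID and names are just labels that can change. This is only an illustration: the class, the field names and the IDs are hypothetical and say nothing about how eBird's database is actually structured.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    concept_id: str                                   # stable key, never changes
    names: List[str] = field(default_factory=list)    # name history, newest last

    @property
    def current_name(self) -> Optional[str]:
        # A concept may exist before it has any published name.
        return self.names[-1] if self.names else None

    def rename(self, new_name: str) -> None:
        self.names.append(new_name)


# Hypothetical data: one as-yet-unnamed concept and one observation keyed to it.
concepts = {"c-0001": Concept("c-0001")}
observations = [{"obs_id": 1, "concept_id": "c-0001"}]

concepts["c-0001"].rename("Aus bus")   # the species is formally described
concepts["c-0001"].rename("Cus bus")   # a later authority moves it to another genus

# The observation record was never touched; only the label changed.
print(observations[0]["concept_id"], concepts["c-0001"].current_name)
```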
(Sent by email in reply to Richard L. Pyle's comment of Friday, August 31, 2018, on tdwg/tnc #1.)
|
In reply to @mdoering: "Is it worthwhile to distinguish between the attempt to list defined (and unique?) concepts and simply referring to a name used in some publication? If it is only about the latter I much prefer the term NameUsage, which does not pretend to be more than just that." Short reply: I agree with your second sentence! Longer reply: The way I look at it, NameUsage instances come in many flavors -- ranging from a mere mention of a name within a Reference to full-blown treatments with full synonymies, robust material examined and character descriptions, phylogenetic analyses, geographic distributions, etc., etc. The degree to which one can divine the boundaries of a taxon concept circumscription will likewise vary tremendously. There may be some value in drawing a line between NameUsage instances that include a full heterotypic synonymy, and those that do not. The former can be used to algorithmically compare NameUsage instances as to the sets of type specimens they include and determine them to be congruent, includes, included in, etc. (per our discussions in Dave Remsen's house a couple years ago). While circumscription boundaries drawn using collective sets of type specimens (i.e., complete asserted heterotypic synonymies) are not as granular as those marked by character states and/or enumerated specimens & populations, they are FAR more practical in terms of determining (approximate) concept congruity. As such, we can do all the reasoning we need using only NameUsage instances, without the need to separately mint identifiers for Taxon Concepts as entities that exist independently of the individual usage instances. 
I agree with @jgerbracht that the concept exists (at least in the abstract) independently of the extent to which it is described or fleshed out within the documented Name-Usage instance; but the problem as I mentioned before is that, beyond comparing heterotypic synonymies, expert knowledge is necessary to assert the congruency (or not) in concept circumscription between any two given name-usage instances. In cases where that expert knowledge is available, I think it's better to capture something along the lines of a TaxonRelationshipAssertion (sensu TCS) to map the relationship between two Name-Usage instances, rather than mint some sort of identifier for the abstract concept itself, then link both name usages to it. Also, the precision and granularity of what the concept boundaries are will vary, and as such the decision to regard them as the "same" concept or "different" concepts will change. In some cases range extensions do not represent a change in concept, but in other cases they do. Take for example that Species A SEC Ref1 is described from specimens in Hawaii. Then someone finds a population in the Marshall Islands (range extension; Species A SEC Ref2). Because it's a range extension, there is no change in concept. However, later genetic data and other evidence convince someone else that they're actually different species, so we have Species A SEC Ref3 from Hawaii, and Species B SEC Ref3 from the Marshall Islands. Now... what is the relationship between Species A SEC Ref1 and Species A SEC Ref3? If the author of Ref1 (who was unaware of the Marshalls population) was a splitter, her concept might be sensu stricto and hence the same as Ref3. Or she might have been a lumper, in which case her concept would be sensu lato and congruent with Ref2. I think it's much better to anchor our "concepts" as 1:1 with individual name-usage instances, then add a separate layer for assertions about how those concepts relate to each other in terms of congruency/etc. 
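The algorithmic comparison of type-specimen sets mentioned above can be sketched in a few lines of Python. This is an illustration under assumptions, not an agreed method: each NameUsage is reduced to the set of name-bearing types covered by its asserted heterotypic synonymy, the type identifiers are made up, and the labels "overlaps" and "disjoint" are added here to cover the cases the comment leaves as "etc.".

```python
from typing import FrozenSet


def compare_type_sets(a: FrozenSet[str], b: FrozenSet[str]) -> str:
    """Relate two usages by the sets of name-bearing types they include."""
    if a == b:
        return "congruent"
    if a > b:            # a is a proper superset of b
        return "includes"
    if a < b:            # a is a proper subset of b
        return "included in"
    if a & b:            # some, but not all, types are shared
        return "overlaps"
    return "disjoint"


# Hypothetical usages, each listing the types in its synonymy.
usage_1 = frozenset({"type:A"})
usage_2 = frozenset({"type:A", "type:B"})
usage_3 = frozenset({"type:B", "type:C"})

print(compare_type_sets(usage_2, usage_1))   # usage_2 is a superset of usage_1
print(compare_type_sets(usage_2, usage_3))   # they share only type:B
```

As the comment notes, this only gives approximate congruence: two usages with identical type sets can still differ in where the authors draw the boundaries between those types.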
@jgerbracht : one possible solution for what you describe is to establish a system analogous to type specimens but for Name-Usage instances that define taxon concepts. Instead of minting a new taxon_identifier to represent the concept (independent of the individual name usages that collectively define it) and linking all relevant TNUs to that separate identifier, you could (eBird could) have a system where they pick one TNU among several that relate to the same Concept, then brand that the "type TNU" for the concept, and link the other TNUs to it. This "Type TNU" effectively serves the same role as a taxon_identifier would, but without needing to deal with a new class of identifiers. Think about it this way: even if we do mint taxon_identifiers to represent abstract concepts independent of the name usages, then you still need some reference point for that concept instance. Suppose there are four TNUs linked to the same concept instance, but then later someone realizes that a mistake was made, and that two of the TNUs refer to a slightly different concept than the other two. What happens to the concept instance? Does it disappear and two new ones are minted? Or does the original concept instance stay with two of the TNUs, and another concept instance is minted to represent the other two? What if three go with one concept, and one with the other? What if 49 go with one concept and 1 goes with the other? If we mint two new ones and "retire" the original concept, what happens to all the external data linked to that "retired" concept? If we maintain the original concept with one subset of TNUs and mint only one new one for the other, then there will need to be some mechanism for deciding which set of TNUs the original concept remains with (e.g., a "type usage" instance, analogous to a type specimen). Again, I apologize for the long post here; but there's a reason we've never quite sorted all this stuff out before. 
The good news is that this conversation seems genuinely fresh to me, and I honestly think we're making good progress! |
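The "type TNU" idea and the split scenario above can be sketched as follows. This is a hypothetical Python illustration, not an existing system: the class, the `split` function and the `tnu:` identifiers are invented, and the rule that the original concept always stays with its type TNU is the assumption being demonstrated.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class TypedConcept:
    type_tnu: str     # the TNU chosen to anchor this concept
    tnus: Set[str]    # all TNUs currently linked to it, including type_tnu


def split(concept: TypedConcept, breakaway: Set[str], new_type_tnu: str) -> TypedConcept:
    """Move `breakaway` TNUs into a new concept anchored by `new_type_tnu`."""
    if concept.type_tnu in breakaway:
        raise ValueError("the type TNU must stay with the original concept")
    if not breakaway <= concept.tnus:
        raise ValueError("breakaway TNUs must currently belong to the concept")
    if new_type_tnu not in breakaway:
        raise ValueError("the new type TNU must be one of the breakaway TNUs")
    concept.tnus -= breakaway
    return TypedConcept(type_tnu=new_type_tnu, tnus=set(breakaway))


# Four TNUs linked to one concept; later two turn out to be a different concept.
original = TypedConcept("tnu:1", {"tnu:1", "tnu:2", "tnu:3", "tnu:4"})
new_concept = split(original, {"tnu:3", "tnu:4"}, new_type_tnu="tnu:3")

print(sorted(original.tnus), sorted(new_concept.tnus))
```

With this rule the 2/2, 3/1 and 49/1 cases are all handled the same way: whichever subset contains the type TNU keeps the original anchor, so external data linked to that anchor has a defined fate regardless of how many TNUs move.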
I'm a bit behind on this thread due to traveling at the end of the TDWG meeting. But I had several items that I wanted to add for the record.
It's possible that this node could also connect names to things like sets of specimens or organism occurrences rather than to a reference if that is an acceptable alternative way to define the taxon. |
Many thanks, @baskaufs ! Your post reminded me of our very animated discussions of "dwc:Organism", which in the end was, in my opinion, an extremely useful exercise. Evidently it was also successful, in that unlike this never-ending discussion about taxon (which parallels the never-ending debates about "What is a species?"), the "Organism" discussion seemed to come to a stable close (or perhaps no one cares enough about it to debate it anymore?). In any case, I really like (and agree with) your point that "we ended up defining the class in a way so that it "did" what we wanted it to do, rather than defining it to "be" what we thought it should be." To be honest, I think that applies to the definitions of all of our terms (not just dwc:Organism). We like to think we're modelling nature as it is; but that's not what we're doing. We're modelling how to track information about nature in a way that makes it easier for us to answer the diverse set of questions we want to ask about it. In that context, and having participated in the "taxon definition" discussions since the 1990's (the discussions began earlier than that), I actually feel that this discussion is making some novel progress, which I think is a good sign that we may be able to achieve some consensus in moving forward. Your post above made me realize why I think we're getting somewhere: in the past, the debate always got bogged down in "what IS a taxon?" (~= "What IS a species?"). However, I think you captured a key point that I hadn't been able to put my finger on before, which is that we shouldn't spin our wheels endlessly trying to define what IS a taxon, and instead focus on how we want to define a taxon entity such that it fulfills our desires to answer the diverse set of questions we want to ask about nature. We seem to have mostly stabilized on what a TNU is (and how it's used). 
The outstanding question is whether there ought to be a separate entity (with a separate pool of identifiers) to represent a "Taxon Concept". The role such an entity/identifier would play is as an aggregator of TNUs that all represent the same circumscribed set of organisms. Similarly to "Organism", the "Concept" entity would not have many (any?) properties of its own, but rather would serve the function of linking clusters of TNUs together for the convenience of using one identifier to represent a collection of many TNUs. In principle, I understand the value & simplicity of having such a defined entity (and corresponding identifier). In practice, though, I fear that it will end up as a hodgepodge of fuzzily-defined (to varying degrees) instances whereby different people will aggregate different sets of TNUs differently into concepts. The only way I can see it working effectively is via an additional "join" entity similar in many ways to Identifications for assertions about which TNUs map to which concepts (and that will start to get messy). The problem is that I'm not sure how effective that will be in helping us to answer the diverse set of questions we want to ask about nature. Instead, I'd like to see us pin down the definition of TNU (and its various flavors, including Protonyms, Treatments, etc.), then flesh out a few million instances of them with their core properties (especially heterotypic synonym mapping), then allow the need for a "Concept" entity to emerge (or not) from that. Again, sorry for the long diatribe... |
Cool! After spending time looking at how other standards organizations work, I'm increasingly convinced that the effective way to work is to define the use cases first, then develop the standards while testing the proposed features of the standard against those use cases. That's basically what you've proposed - define what we would like for TNUs and taxon concepts to do, then try to build the system to make them work. Keep the features that work, discard the ones that don't. THEN write the standard describing how the features were successfully implemented. |
OK, then maybe one way to establish use cases is to enumerate some questions we would like to ask about organisms in nature, specifically related to taxa and their names (starting with the pedantic ones and moving on to more general ones): Nomenclature Taxonomy Classification Biodiversity OK, I got tired of writing these questions, but there are obviously many of these kinds of questions we would like to be able to answer. In my mind, use cases involve sets of these questions to allow us to traverse from a given set of inputs to a given set of outputs. For example, a use case might be: Another might be: To fulfill these use cases, we'd need to be able to answer several of the questions above. I don't know if this is the right strategy to identify how best to proceed on this discussion and its desired outcome, but it seems to me that enumerating questions of this sort both builds the foundations for addressing Use Cases (or, perhaps, enumerating the Use Cases allows us to figure out what questions we need to answer to fulfill them), and allows us to be more specific about what entities we need to define, and what properties for each entity we need to capture. Hoping that was at least somewhat helpful.... |
Hi all. I'd like to be part of this, at some level. I'd also like to suggest that doing taxonomic concepts well is in an important sense a shift in value system, or value assignment. Technical definitions may be somewhat secondary, and agreeing on them is not necessarily critical to my mind. The value shift is this though: a commitment to taxonomic concepts is a commitment to support the process of systematic research/products, with particular emphasis on making the provisional, evolving, and frequently locally and temporally conflicting aspects of systematic inference and product use explicit, and indeed prioritizing software design and functions to showcase the provisional, evolving, and conflicting aspects of systematic inference making and usage. To the extent that this group can make such a commitment, I'd be excited to contribute. |
@baskaufs, thank you for your summary of the TCS discussion thread - really fantastic. Very helpful to be able to see that history. I strongly agree that much valuable insight is often lost in the transience of internet forum and email discussions. |
I'd like to again point to this publication https://doi.org/10.1186/s13326-017-0174-5 which is on top of the thread. Please consider reading it in full. This is an ontology (proposal, if you will) that is also pilot-implemented here: http://openbiodiv.net/. It was part of a Ph.D. thesis, sponsored also by a biodiversity data publishing house, whose aims are well aligned with those of the TNC. It has a lengthy section "Domain Description" in which the issue of representing taxonomic concepts is tackled. I am not saying that there are no other important efforts, but if I had to point to a single most indicated descendent of the 2005 TCS, this just is it. I believe that if we take this paper and approach as a pragmatic foundation and begin to understand what services it can provide and which it cannot, we have a strategy to advance effectively. |
Many thanks for re-linking this publication, @nfranz! I thought I had clicked on your original link, but evidently not as this is the first I'm seeing the full publication. Although I do have some minor philosophical quibbles (e.g., I still fail to understand how a taxon concept can justifiably be called a "hypothesis", rather than an asserted opinion -- I don't agree with the arguments put forth about falsifiability), once I got past those I found the article to be very useful in framing the problem we're up against with this discussion. It's definitely worth carefully reading by anyone interested in this sort of stuff. I do have a couple of technical questions that are most likely due to my ignorance of OpenData, SPAR Ontologies, etc., but I'm going to take a risk and ask them anyway. Perhaps you can help clarify these. The article states that "Taxonomic Article is a subclass of FaBiO’s Journal Article". However, several other subclasses of FaBiO's Expression class (e.g., books, chapters, etc.) also contain taxonomic treatments. Is this a problem for implementation, or are we only interested in treatments that appear in articles, or...? The article states "In OpenBiodiv-O, a taxonomic name usage is the mentioning of a taxonomic name in the text, optionally followed by a taxonomic status." If a name is mentioned several times within a single treatment, does that represent more than one TNU sensu OpenBiodiv-O? Or are they collectively contained within a single TNU (e.g., represented by the NomenclatureHeading)? The reason I ask is that there is a subtle but important distinction between a TNU (which encompasses the entire treatment in cases where the TNU is the NomenclatureHeading), and what James Ytow referred to as "Appearances" (individual mentions of name-strings, often with abbreviated genus), which may appear many times within the context of a single TNU. I ask because, in the paragraph that follows ('For example, “Heser stoevi Deltschev 2016, sp. n.” is a taxonomic name usage.'), it seems that the TNU is the raw text string, not the Treatment as a whole, in which case the definition of TNU as asserted in the context of OpenBiodiv-O is a significant departure from how it has been defined elsewhere. An important aspect of TNUs is that there is generally a 1:1 correspondence between a Treatment and the TNU representing the NomenclatureHeading for the Treatment. However, as implied by Figure 1 of the article, a treatment often contains other TNUs (e.g. within the NomenclatureCitationList). Thus, while every Treatment has exactly one corresponding TNU, not all TNUs are treatments. I very much like the way that "TaxonomicConceptLabel" (TCL) is defined. However, I'm not entirely sure I understand the need for establishing OperationalTaxonomicUnit as a super class of TaxonomicConcept. In my mind, Taxonomic Concepts represent a circumscription of organisms, regardless of whether that circumscription happens to include a specimen (or more than one specimen, when heterotypic synonymy is involved) designated as a name-bearing type for a Linnean-style taxonomic name (i.e., regardless of whether the concept has a formal scientific name to label it with). Can you provide examples of instances of OperationalTaxonomicUnit that would not be regarded as instances of TaxonomicConcept? I.e., what other subclasses of OperationalTaxonomicUnit are there, and what function do they serve? Regarding the two patterns, replacement name and related name, is the former a subset of the latter? Or are these mutually exclusive? It seems that replacement name implies congruence of concept/circumscription, whereas related name could apply to all five RCC-5 relations (or only the other four, excluding congruence), or...? Sorry for the long post -- just trying to make sure I understand the contents of and assertions in the paper correctly. |
I may live to regret this, but can I suggest another way of tackling this topic? I'm going to try and be disciplined and avoid a WTF rant, and instead sketch out a way I think we can create something simple, and which might lead to some tools that people might find useful. I'm a fan of keeping things simple, reusing things, and trying to take into account what is going on elsewhere. For example, the http://schema.org vocabulary is gaining momentum, and covers a lot of things we care about (publications, people, places, etc.). I make extensive use of it in my latest toy https://ozymandias-demo.herokuapp.com. Interestingly, there is a community project to extend http://schema.org to include more life-science specific entities, BioSchemas (a number of people on this list will be aware of this already). So it seems to me there's a case to be made for avoiding domain-specific vocabularies as much as possible, and trying to make our stuff as interoperable with the wider world as we can.
Taxa
I regard taxa as nodes in a tree. What a taxon "is" is defined by its place in that tree (although identifiers don't change if the composition changes, that way lies madness). A taxon in NCBI is ultimately all the organisms that yielded the sequences in the subtree rooted at that node. A taxon in GBIF is ultimately all the occurrences in the subtree rooted at that node. There's a proposal by @frmichel for taxa in BioSchemas (https://github.com/BioSchemas/specifications/tree/master/Taxon). This seems pretty straightforward and uses terms that will be familiar. If we use this for taxa (i.e., nodes in a classification) then we have a simple vocabulary that anyone can use, from people working in genomics with the NCBI taxonomy, to people building little taxon-specific web sites and who want to increase their visibility to Google by including structured markup (the primary driver behind schema.org). Lots of people care about taxa, let's give them a simple way to talk about them. 
Names, usages, etc.
It seems to me that the core idea here is the pair ('a name string', 'a bibliographic locator'). The bibliographic locator can be at the level of a "work" (e.g., an article or book), in which case an identifier like a DOI is the obvious candidate. If we want metadata, then schema.org has terms to cover pretty much any aspect of an article or other publication. If we want more granularity, then the W3C Web Annotation Data Model covers pretty much everything, see https://www.w3.org/TR/2017/REC-annotation-model-20170223/#selectors. So we can refer to a whole work, individual pages, XPath fragments in an XML document from, say, Pensoft, regions on a scanned page, etc. A further advantage of this is that tools such as hypothes.is use these selectors to locate annotations, and many academic publishers are adopting hypothes.is as their annotation tool. So, nomenclators are essentially lists of annotations (think of IPNI where each record is basically a name and a page location). Treating "usages" as annotations makes it easy to integrate projects such as BHL - indexing all the pages for names, record their locations as annotations, flag those annotations that have some special significance (e.g., the first publication of a name). Imagine developing a tool that overlays BHL (or any literature database) and says "here are the names on this page, and by the way this is where this species name was published". Some people care about names, many more people care about searching for information anchored to a name, use one to drive the other, and use a model that can handle both automatic text indexing as well as manual annotation. Name usages are basically annotations. The LSIDs in databases such as IPNI, Index Fungorum, ZooBank, and ION are identifiers for annotations (not "names" as such). 
It seems to me that name usages in the National Species Lists (NSL) are essentially annotations (with rather a lot of administrative cruft attached).
Taxonomic concepts
This seems to be the third rail of this discussion. I'd argue that few people care about this topic, despite the acres of space devoted to it. The reason for that is that most people use whatever taxonomic classification is available to navigate the data they care about (e.g., the NCBI taxonomy if you work with sequences), and a taxonomic classification is essentially also a taxonomic concept (arguably they are the only concepts that are actually defined in any operational way). So, as a user, most people don't care. The proof of this is that science gets done without taxonomic concepts (we can argue about whether that's a good thing or not). The one version of taxonomic concept that seems tractable is the "accordingTo" idea, in other words if I'm writing a paper I can say "when I use this name I mean this". This could be something as simple as saying "subgenus Stegomyia NCBI:53541" for NCBI's view of mosquito taxonomy. If I want to refer to a different concept of what Stegomyia is (and this is a very touchy subject in mosquito taxonomy) I could cite another work, in other words (Stegomyia, DOI:xxxxx). So, a taxonomic concept is a set of one or more (name, bibliographic locator) pairs. Hence, we just need a way to represent a set (or ordered list if we think of it as a list of synonyms), and schema.org has ways to represent those. So, in its simplest form, the NCBI taxonomic concept of Stegomyia is (Stegomyia, NCBI:53541) (i.e., itself). I think this is the model also used by the Australian Faunal Directory where the authority for each taxon in the AFD classification is, of course, the AFD. We could expand the concept by listing all the synonyms, to make it more useful. 
If I understand the NSL model correctly, they link each node in their classification to a (name, reference) pair that corresponds to the concept in the tree. People who care about taxonomic concepts (e.g., doing taxonomy, building classifications and trying to make sense of the literature) can describe these concepts as sets of (name, reference) pairs, which seems to me to be pretty much what taxonomists actually do.
Summary
I don't claim much originality here, and may well have completely misunderstood the discussion. But it seems to me there's a chance to adopt a simple, workable approach that builds on existing projects that have traction (e.g., schema.org, the W3C annotation model, bioschemas?) and hence get to the point where we, you know, build stuff that people want and need. |
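To make the "(name, bibliographic locator)" idea from the comment above a bit more tangible, here is a minimal Python sketch. The class and function names (`NameUsage`, `make_concept`) are made up for illustration and come from no existing vocabulary; the Stegomyia identifiers are the ones quoted in the comment, including the placeholder DOI.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NameUsage:
    """A (name string, bibliographic locator) pair. Hypothetical class name."""
    name: str
    locator: str  # e.g. a DOI or a database identifier


def make_concept(*usages: NameUsage) -> FrozenSet[NameUsage]:
    """Model a taxonomic concept as an unordered set of name usages."""
    return frozenset(usages)


# NCBI's own view of Stegomyia, citing itself as the authority.
ncbi_stegomyia = make_concept(NameUsage("Stegomyia", "NCBI:53541"))

# A different concept carrying the same name, citing another work.
# "DOI:xxxxx" is a placeholder, exactly as in the comment above.
other_stegomyia = make_concept(NameUsage("Stegomyia", "DOI:xxxxx"))

# The name strings match, but the concepts are distinct objects.
same_name = {u.name for u in ncbi_stegomyia} == {u.name for u in other_stegomyia}
same_concept = ncbi_stegomyia == other_stegomyia
```

Using a frozen set is only one possible choice; an ordered list would fit better if the concept is meant to be read as a synonym list.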
Thanks, Rod – this is very good stuff. I’m on a ship with extremely limited internet access, so a more detailed reply will need to come later (if at all – lots of stuff keeping me busy when I get home).
Very briefly:
- Taxa as Nodes on a tree: I think this is fine, and is one of a number of ways the word “taxa”/”taxon” has been used, and it’s certainly a “thing” many people care about. I have no problem fixing the word “taxa”/”taxon” to nodes on a tree, rather than something else. But I’m not sure that works for how this word is/has been used in the sense of Darwin Core.
- Yes, usages are pairings of a name and a reference. Identifying the reference with DOIs is great, as long as someone does them for all the historical references that do not already have DOIs and for new publications that don’t already have DOIs. But the “reference” part of the pairing has always been easy. The “name” part is the hard part. The simple approach is just to use the literal string of characters to represent the “name” part. That’s the approach that most people have taken for most of the history of trying to track this stuff. That’s the approach that created the current mess. To quote you, “that way lies madness”. So the hard part is capturing a name “entity” (or as I have always called it, a “name object”). Also, usages don’t really map well to individual pages. They map to Treatments, which typically span several pages. But that’s not really the problem. That said, I think you captured it perfectly: “Some people care about names, many more people care about searching for information anchored to a name, use one to drive the other, and use a model that can handle both automatic text indexing as well as manual annotation.” We just need to figure out what we mean by “name”.
- I think “usages as annotations” is a legitimate way to frame it (ultimately everything can be thought of as an annotation, depending on what you’re most interested in). ZooBank identifiers are explicitly NOT identifiers for names – they are identifiers for nomenclatural acts (which are a subset of usages). I can’t speak for IPNI, IF, ION, etc., but I think life would be a lot simpler if we did NOT treat these as identifiers for “names”. And in that context, treating them as identifiers for annotations makes sense.
- If I understand you correctly on the Taxon Concepts stuff, then we are in complete agreement. And once you adopt the position that the most practical way to handle concepts is the “accordingTo” approach, you (should) realize that taxon concept is best represented by a usage instance (or set of usage instances).
OK, it turns out we’re the last boat to launch this morning, so I had some extra time to write the above. Therefore, this *is* the more detailed reply.
Aloha,
Rich
|
Conceptually, I agree with most of what Richard and Rod describe, taxa are nodes on a tree, though what happens when the tree branches are completely rearranged and/or there are multiple trees made up of the same branches but in a different arrangement (as currently is the case with birds). These are the scenarios that I think the Taxonomic Concept or Name accordingTo really helps to organize accurately, especially for any data aggregator, be it GBIF, EOL, Wikipedia or a researcher bringing together data on the same Taxonomic Concept from different domains. A clarification on "What a taxon "is" is defined by its place in that tree (although identifiers don't change if the composition changes, that way lies madness)." The reason I think an ID is needed to identify each Taxonomic Concept as opposed to a Name accordingTo, is that with the ID, users of these data don't need to go through the mapping exercise of their Name with Names from other providers. All instances of Name accordingTo would have the same Taxonomic Concept ID, so that the ID can be used to aggregate data. If there is one thing I've learned, the harder it is to aggregate data, the less likely it is to be aggregated by the users. I'm really thinking of this from the end user perspective, if we don't make that part simple, it won't be used. |
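A rough Python sketch of the aggregation argument in the comment above: if every "Name accordingTo" instance resolves to a shared Taxonomic Concept ID, then grouping records becomes a simple lookup. The names, checklist labels, concept IDs and the `CONCEPT_IDS` table are all invented for illustration and are not real eBird data.

```python
from collections import defaultdict

# Hypothetical lookup: (name, accordingTo) -> shared concept ID.
CONCEPT_IDS = {
    ("Name A", "Checklist 2017"): "concept-1",
    ("Name B", "Checklist 2018"): "concept-1",  # same concept, different name
    ("Name C", "Checklist 2018"): "concept-2",
}


def aggregate_by_concept(records):
    """Group records by concept ID; unmapped records are grouped under None."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["name"], rec["accordingTo"])
        groups[CONCEPT_IDS.get(key)].append(rec)
    return dict(groups)


records = [
    {"id": 1, "name": "Name A", "accordingTo": "Checklist 2017"},
    {"id": 2, "name": "Name B", "accordingTo": "Checklist 2018"},
    {"id": 3, "name": "Name C", "accordingTo": "Checklist 2018"},
]
grouped = aggregate_by_concept(records)
```

Here records 1 and 2 end up in the same group although their name strings differ, which is the point being made about end users not having to do the name mapping themselves.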
Thanks @jgerbracht -- I agree completely. The fundamental problem (and the reason we've never really solved this issue before) is that there are some extremely complex and subtle/nuanced relationships between organisms, names, and taxonomic relationships/classifications, and these complex issues have been further confounded by confused and inconsistent terms to describe some fundamental things. As for Taxonomic Concepts and TNUs, I think the best way to characterize this goes back to Walter Berendsohn's notion of a "Potential Taxon" -- which in our terminology would be a "Potential Taxonomic Concept". A TNU represents the cloud of information and properties for how a particular Reference treated a particular Protonym (=Name-as-object). A reasonably well-defined subset of TNUs represent "Potential Taxonomic Concepts". One of the key questions we need to figure out, with respect to the second paragraph of your post above, is whether it makes sense to collapse a set of TNUs representing confidently congruent Taxonomic Concept circumscriptions into a single "Taxonomic Concept Instance" with its own identifier and properties. I definitely think it's worth exploring, but it might make sense to first clearly define TNUs and the relationships among them, and then figure out what a secondary layer that aggregates congruent TNUs into a single defined object instance would look like. In this sense, it's important that TNUs are defined in such a way that they can be easily aggregated in this fashion, if it ends up making sense to do so. |
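One possible way to picture "collapsing congruent TNUs into a single Taxonomic Concept Instance" is as computing connected groups over pairwise congruence assertions. The sketch below uses a small union-find and assumes that congruence is treated as transitive, which is an assumption of this example and not something settled in the thread; the TNU identifiers are invented.

```python
def group_congruent(tnus, congruent_pairs):
    """Partition TNU identifiers into groups linked by congruence assertions."""
    parent = {t: t for t in tnus}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in congruent_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups

    groups = {}
    for t in tnus:
        groups.setdefault(find(t), set()).add(t)
    # Sort groups for a stable, reproducible order.
    return sorted(groups.values(), key=lambda g: sorted(g))


tnus = ["tnu1", "tnu2", "tnu3", "tnu4"]
pairs = [("tnu1", "tnu2"), ("tnu2", "tnu3")]  # hypothetical congruence assertions
concept_instances = group_congruent(tnus, pairs)
```

Each resulting group could then be given its own identifier, while the underlying TNUs and their pairwise relationships stay available as the primary data.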
Reading through these threads I keep trying to figure out what problems we are trying to solve? I confess that I struggle with abstractions that don’t readily translate into something that I could imagine using and/or building. I also find it helpful to have actual examples to focus on. Looking at eBird as an example (and @jgerbracht can correct me if I’ve misunderstood) there seem to be several problems to tackle:
It seems to me that 1 is straightforward, we simply define a way to represent a tree. Many biodiversity informatics projects use trees (classifications) to help users navigate through data. Note that the tree could be explicitly defined (e.g., as a tree structure in a file) or implicitly (say, as a checklist in a paper). 2 is also straightforward if we have identifiers for classifications, and optionally some way of locating a node in a tree, again, either explicitly in a tree structure, or on a page in a published checklist. (I could see an obvious role for GBIF here in that you could publish a checklist on GBIF and use the resulting DOI to identify that taxonomy.) So I think what would be useful here is a convention for explicitly citing a given taxonomy (formalising “sec”). There is scope for exploring the best way to identify nodes in a tree (e.g., do we simply cite a node name and tree version, or do we have identifiers like eBird's that remain unchanged between trees if the node is the “same”). 3 is either trivial or difficult, depending on how you approach it. Given that the vast majority of references to taxa will be by name, we either accept the ambiguity and treat this as effectively a search (find me every taxonomy with that name) or endeavour to work out what particular classifications a publication at a certain date may apply to (e.g., what versions of bird taxonomy were in use at that time?). 4 is perhaps the most interesting topic, and we have seen at least two ways to think about this: either do pairwise mappings between nodes in the two trees, or compute edit operations between the two trees. Given that we are having the discussion on GitHub it may come as no surprise that I view 4 as essentially versioning. If the 2017 tree was in GitHub, we could imagine editing it as each new paper on avian taxonomy comes out, then freezing the tree and releasing a new version in 2018. 
The “diff” between tree 2017 and tree 2018 defines the differences between the two trees. So, I see three “products” that would be useful:
For me a really interesting test case would be to take, say, the August 2017 eBird classification, take all the taxonomic work between 2017 and 2018 (listed on the eBird site), represent those works in terms of 2 and 4 above, that is, they reference the 2017 classification, and they describe the changes made (e.g., subspecies x is now a full species in a different genus if you think in terms of edit operations, or the equivalent set relationships if you think in terms of mapping), and see if we can then compute the August 2018 tree using just that information. This would mean we could have a way to describe taxonomic information that was computable and could be used to generate new classifications. If taxonomic information was described in that way then it would seem that the goals of aggregators and taxonomists could be aligned: the aggregator’s task is easier because the data is well described in nice, computable, citable chunks, which means the taxonomist’s work gets quickly incorporated into the aggregation in a way that gives them credit and visibility. |
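The "edit operations" view of the test case above could be sketched roughly as follows in Python. The tree is a child-to-parent mapping keyed by stable node identifiers; the identifiers (`gen_a`, `sp1`, `ssp1`, and so on) and the three operation names are invented for this illustration and are not taken from eBird or any standard.

```python
def apply_edits(tree, edits):
    """Return a new child -> parent mapping after applying edit operations."""
    new_tree = dict(tree)  # leave the input tree untouched
    for op, *args in edits:
        if op == "add":        # ("add", node, parent)
            node, parent = args
            new_tree[node] = parent
        elif op == "move":     # ("move", node, new_parent)
            node, parent = args
            if node not in new_tree:
                raise KeyError(f"cannot move unknown node {node!r}")
            new_tree[node] = parent
        elif op == "remove":   # ("remove", node)
            (node,) = args
            del new_tree[node]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return new_tree


def diff_trees(old, new):
    """Edit operations that turn `old` into `new`: adds, then moves, then removes."""
    adds = [("add", n, p) for n, p in new.items() if n not in old]
    moves = [("move", n, p) for n, p in new.items() if n in old and old[n] != p]
    removes = [("remove", n) for n in old if n not in new]
    return adds + moves + removes


# Hypothetical 2017 tree: the root has parent None.
tree_2017 = {
    "aves": None,
    "gen_a": "aves",
    "gen_b": "aves",
    "sp1": "gen_a",
    "ssp1": "sp1",
}

# "Subspecies is now a full species in a different genus", as one move.
edits = [("move", "ssp1", "gen_b")]
tree_2018 = apply_edits(tree_2017, edits)
```

With stable identifiers the diff between the two versions recovers the same move. Without them (names only), the diff would see a removal plus an addition, which is the labelling problem raised later in the thread.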
Going to point to this as an example of doing 4: https://doi.org/10.1093/sysbio/syw023. |
@nfranz Thanks! Maybe we should assemble a set of relevant examples, such as the primate study you linked to, the eBird classifications, etc., and use those as test cases? For example, given the two MSW primate classifications an obvious question is how we can represent MSW2, MSW3, and the relationships between them using a simple vocabulary. Related to that goal, can we then link names and literature to those, so we could imagine giving someone a set of files and saying "here is the history of primate classification linked to all the relevant publications, enjoy!". |
+1 for assembling use cases |
+2 @deepreef It's probably best not to do this in the issues at all. I have created a folder 'use-cases'. Put them in there in any form you like. We can make them consistent (filetype- and design-wise) later. |
+3 :-) |
This is a response to @frmichel's comments on the pull request. @frmichel noted problems with the Darwin Core dwciri: terms and with Darwin-SW. Just to clarify about those two things: the DwC RDF Guide (which minted the dwciri: terms) recognized that there were problems with the taxon/taxon concept/TNU in Darwin Core, but did not consider "fixing" them to be within its scope. It simply provided guidance on how to use the existing DwC terms (or their dwciri: analogs) but did not generally suggest how to clarify their meaning or add any new terms that were missing. It assumed that some future group (like this one) would fix that problem. Darwin-SW was not a TDWG effort, so it has no official standing in TDWG. It suggested a fix for the missing object properties needed to connect the Darwin Core classes, but also basically dodged the issue of clarifying taxon/taxon concept/TNU. So really, neither of those two efforts should be looked at as a solution. As far as updating the TDWG Ontologies (TaxonConcept and TaxonName) is concerned, I think it would probably be better to just focus our efforts on incorporating the good parts of them into what we build here. Although those two ontologies don't have any official standing within TDWG either, they do reflect one attempt to translate an actual TDWG Standard (TCS 1.0) into the Linked Data/Semantic Web world, and should therefore have some weight in the discussion - particularly since some members of this group have experience trying to implement them. That's really useful information. |
@rdmpage Re I would add a 5th one. How do we track a particular taxonomic concept through time/taxonomies. |
In a sense it seems to be solved for eBird by the use of stable identifiers between classifications (e.g., radshe1), although it's not clear what rules are used to carry those identifiers across trees. But yes, the success of comparing trees to compute changes does depend on how well labelled the trees are.
Can you give an example? I'm not sure that there are things which can't be computed, I suspect it's more a question of whether the changes made (and/or the reasons) are represented with enough precision to be easily converted into something a computer can handle. Taxonomy is a pretty simple affair in many ways, we have sets, we have notions of relationships among those sets, and we have collections of labels to be assigned to those sets. I think it's eminently computable. |
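As a small illustration of the "sets and relationships among those sets" point above, here is a Python sketch that classifies the RCC-5 relation between two circumscriptions, each given as a set of member identifiers (specimens, populations, child taxa, or whatever is being circumscribed). The function name and the English relation labels are choices made for this example only.

```python
def rcc5(a, b):
    """Classify the RCC-5 relation between two non-empty sets of members."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("this sketch only handles non-empty sets")
    if a == b:
        return "congruent"
    if a < b:                 # a is strictly contained in b
        return "proper part"
    if a > b:                 # a strictly contains b
        return "inverse proper part"
    if a & b:                 # some shared members, neither contains the other
        return "overlapping"
    return "disjoint"
```

For example, `rcc5({"s1", "s2"}, {"s1", "s2", "s3"})` gives `"proper part"`. The hard part in practice is not this comparison but knowing the membership of each set with enough precision, which is the caveat made in the comment above.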
The tracking of taxonomic changes I'm referring to is the tracking of concepts and in cases where concepts are added or removed, the taxonomist is the one who knows the path from taxonomy 2017 to 2018 and to retroactively calculate those paths using only the starting and ending taxonomies is currently problematic at best. I agree completely with your statement that it's "more a question of whether the changes made (and/or the reasons) are represented with enough precision to be easily converted into something a computer can handle", and is something we can and should strive to help the taxonomic communities where we can (though that's certainly a very different but interesting topic for another day). I was referring to the status of taxonomies today, which do not provide those necessary details (Clement's comes close). |
Hi @jgerbracht. Yes, this is why - as I suggested here #1 (comment) - it will be hard to come to an agreement about the scope of TCS2 without resolving at least these two issues upfront:
In summary, I think the way to resolve discussions about scope is to first agree on any normative aspirations of TCS2, i.e., whether we are putting this out partly also to help make future systematics practice better, somewhat regardless of the field's legacy. We have sufficient use cases to indicate that "better" is feasible. But we must acknowledge that it remains rare today. [Having done many hundreds of RCC-5 alignments myself, I believe that this is more limited by current incentive structures than by the nature of the data. But that is not so relevant for us now.] Then we need to decide how much of that "making it better" must be allowed by TCS2, versus how much of that must be enforced by it (as opposed to being enforced by TCS2-utilizing implementations). |
@nfranz It's not clear to me who TCS2 is for, or at least, there seem to be multiple possible audiences, and I'm not sure taxonomists are likely to be either the biggest or the most important. Indeed, playing devil's advocate, I'm not entirely convinced there is even a need for TCS2, given that taxonomists, biodiversity informatics projects, and genomics databases (e.g., NCBI) seem pretty happy to pump out taxonomies and lists of names without any vocabularies at all! In other words, it's not clear that people are banging on TDWG's door saying "we can't do our science without TCS2". One can certainly make a case that things could be done better if we had a better way of representing taxonomic information, but what we have at the moment seems to work OK for most purposes. So I wonder if it would be helpful to have some notion of who the users are, both of TCS2, and of products that use TCS2. At the moment much of the focus seems to be on database builders who:
Now, there is certainly a case that working taxonomists could make their work more accessible to machines by marking up their work, and providing easy means to do that would be a great TCS2 use case, although the vast majority of taxonomic work is not published in journals that support any kind of mark up. Likewise, being able to provide TCS2-enabled things that taxonomists would find useful would be great (e.g., for any taxon give a summary of the current and past taxonomies, a complete bibliography - linked to digital versions where possible, a list of relevant specimens, especially types, essentially a "project in a box"). So I think in part any expectation of what a standard can achieve depends on who you think it is for. I don't think taxonomists care at all about 99.9% of what TDWG does, they will care about anything which makes their life easier, and which helps increase the visibility of their work. I think the people who care about TCS2 will be mostly limited to those dealing with large chunks of data, either publishing it, aggregating it, or both. |
thanks @rdmpage, fully agree. And I can give you at least a very concrete request from the CoL+ project which seeks a new standard to share nomenclatural and taxonomic data in CSV files. DwC-A has various issues, TCS XML is actually quite alright but hard to work with, the TDWG ontology is even harder yet. I would love to see something compatible with datapackages which could replace your custom dwc archives and free us from the "star" restriction |
Thanks, @rdmpage. When Jessie Kennedy led the TCS1 effort, the scope of users was inclusive; see: And the primary underlying motivation for TCS1 was the systemic inability of name-based systems to be taxonomically precise enough: https://www.napier.ac.uk/~/media/worktribe/output-255552/scientific-names-are-ambiguous-as-identifiers-for-biological-taxa-their-context-and.pdf Also echoed here: https://www.researchgate.net/publication/6886479_A_Standard_Data_Model_Representation_for_Taxonomic_Information I vote for preserving that still very much valuable problem diagnosis legacy of the 2005 TDWG-ratified TCS1. The primary purpose was and still is to do name/relationship management as well as possible, and do better where possible with TCS2-facilitated syntax. In that context, I think the right long-term strategy is to be more engaging towards the systematic expert community. Jessie Kennedy's history with TDWG and TCS1 possibly began with this, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.50.2436&rep=rep1&type=pdf, which is in line with the trajectory of supporting expert systematic workflows. TCS2 can be viewed as an opportunity to bring TDWG and the systematic research community closer together. |
I vote a resounding "no", in the context of TCS2.
It's not too hard to make the standard accommodate the future/ideal (when the information is available), without drowning out an effective basic mechanism for capturing what we can capture from less-than-ideal legacy sources. The enforced components should be kept to a minimum.
Hmm... what we have at the moment doesn't allow anyone to filter GBIF data on all taxa identified as X or identified as something regarded by authority Y as being a synonym of taxon X (to take an extremely over-simplified example use case that I think "most" users would like to be able to do). I think the role of TCS2 should be to allow us to capture taxonomic metadata associated with biological datasets in a way that enables automated data-enrichment through various online services (e.g., CoL+). In short, what we should be aiming for is allowing a non-taxonomist user-base to get the answers they want/need without having taxonomic expertise themselves. The status quo definitely does NOT allow this (it only makes people think they have it because they are only using text strings to represent scientific names, and are blissfully unaware of how anemic the results sets are because of it). |
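The over-simplified use case above ("identified as X, or as something authority Y regards as a synonym of X") might look roughly like this in Python. The `SYNONYMY` table, the authority labels and the `identifiedAs` field are all hypothetical and do not correspond to the GBIF API or to any real checklist.

```python
# Hypothetical synonymy tables: authority -> accepted name -> set of synonyms.
SYNONYMY = {
    "Authority Y": {"Taxon X": {"Old name 1", "Old name 2"}},
    "Authority Z": {"Taxon X": {"Old name 1"}},
}


def names_for(taxon, authority):
    """The accepted name plus every synonym the given authority assigns to it."""
    return {taxon} | SYNONYMY.get(authority, {}).get(taxon, set())


def filter_records(records, taxon, authority):
    """Keep records identified as the taxon or as one of its synonyms."""
    wanted = names_for(taxon, authority)
    return [r for r in records if r["identifiedAs"] in wanted]


records = [
    {"id": 1, "identifiedAs": "Taxon X"},
    {"id": 2, "identifiedAs": "Old name 2"},
    {"id": 3, "identifiedAs": "Something else"},
]
hits_y = filter_records(records, "Taxon X", "Authority Y")
hits_z = filter_records(records, "Taxon X", "Authority Z")
```

The same query returns different result sets depending on which authority is followed, and a plain text-string match on "Taxon X" would return only record 1 in both cases.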
@deepreef Well said @nfranz "channel or even syntactically enforce an evolution in systematic practice" I agree that TCS2 should force systematic practice, though if TCS2 is done well, the standards and eventually, the tools will be available to enable the evolution of systematic practice in regards to how one thinks about and manages taxon concepts. |
@deepreef writes: "In short, what we should be aiming for is allowing a non-taxonomist user-base to get the answers they want/need without having taxonomic expertise themselves." But, is that not very often the happy secondary effect or by-product of this more primary cause? Expert systematists have been enabled (TCS2 design), empowered (decentralization => implementation design), and incentivized (accreditation => implementation design) to transfer our knowledge via TCS2 syntax into aggregating environments. In a world where most scientists operate within a merit-based framework, how can a non-expert user base benefit lastingly if the expert contributor base does not benefit first or foremost? |
Did he die because his brain went hypoxic? Or because his lungs were full of water (causing his brain to go hypoxic)? Or because he went unconscious underwater (causing his lungs to fill with water)? Or because he had a seizure (causing him to go unconscious)? Or because he was breathing too much oxygen under pressure (causing him to have a seizure)? Or because his rebreather provided too much oxygen (causing him to breathe too much oxygen)? Or because he set up the rebreather incorrectly (causing it to provide too much oxygen)? Or because it was a bad rebreather design (making it too easy for him to set it up incorrectly)? Why did he die? Sorry for that weird/morbid analogy, but it sounds like we're making the same point at slightly different levels. My statement about what we should be aiming for isn't the "secondary effect" (happy or otherwise), it's what I see as the terminal goal (within the scope of TCS2). There are many things that need to happen in order to achieve that terminal goal. Certainly among them are steps that enable, empower, and incentivize scientists to play their role in extracting and synthesizing information from raw data (occurrence records, literature information, etc.) and transforming it in a way (TCS2) that serves a function to non-scientists (or scientists lacking specific expertise). The point has been made many times over many years that if all we achieve with TCS is the goal of allowing taxonomists easier access to data to help them achieve their taxonomic goals, then we have failed. We certainly do need to do that, but in a way that facilitates something useful to a much broader audience. |
I realize the issue has been closed but I would like to nevertheless answer the questions @deepreef raised on Sep. 8. I apologize for the late reply but other commitments prevented me from writing a detailed response. I am copying Lyubo's new PhD student Maria (mdimitrova095 at gmail.com) as well, as she is slowly transitioning to maintaining the pioneering biodiversity knowledge graph OpenBiodiv.
If a taxonomic concept is an unfalsifiable opinion, it must logically follow that taxonomic circumscription does not follow the scientific process. If you want the taxonomic process to claim to describe the real world in a Popperian fashion, then the opinion must be checkable against some form of experiment. In the case of taxonomic concepts, a single taxonomic concept can be checked as to whether or not it follows some species concept.
Please feel free to get back to me by email or Skype whenever you wish. I am more than willing to discuss this should this explanation fall short.
Neither. While Taxonomic Article is a subclass of Journal Article, a Treatment is a subclass of Discourse Element. From the guide:
The above code is in OWL. Without going into too much detail it is the standard way Peroni and Shotton deal with discourse elements such as special sections in the article (e.g. Introduction, Methods, Discussion, etc.).
Yes. Each text area is a single TNU with a unique identifier. This is modelled after the Mention class of the base ontology PROTON Extensions module.
No.
Possibly. However, in the broader Natural Language Processing (NLP) community, this is how "mentions" of particular entities are modeled. E.g. if I have text about Germany, I will have in it
In a system where there is a bijective mapping between Treatment and TNU, one of these two classes is extraneous. This is not the case in OpenBiodiv-O, as it tries to provide only one way to express any given statement.
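To make the redundancy argument concrete, here is a minimal Python sketch (not OpenBiodiv code; the identifiers and the `is_bijective` helper are hypothetical) that checks whether a set of Treatment-to-TNU links is one-to-one in both directions. Only in that case would one of the two classes be a mere renaming of the other.

```python
from collections import Counter


def is_bijective(links, treatments, tnus):
    """Return True if `links` pairs every treatment with exactly one TNU and vice versa."""
    left = Counter(t for t, _ in links)   # how many TNUs each treatment is linked to
    right = Counter(n for _, n in links)  # how many treatments each TNU is linked to
    return (
        set(left) <= set(treatments)
        and set(right) <= set(tnus)
        and all(left[t] == 1 for t in treatments)
        and all(right[n] == 1 for n in tnus)
    )


# Hypothetical identifiers, for illustration only.
treatments = {"treatment:1", "treatment:2"}
tnus = {"tnu:a", "tnu:b", "tnu:c"}

one_to_one = {("treatment:1", "tnu:a"), ("treatment:2", "tnu:b")}
one_to_many = {
    ("treatment:1", "tnu:a"),
    ("treatment:1", "tnu:b"),
    ("treatment:2", "tnu:c"),
}

print(is_bijective(one_to_one, treatments, {"tnu:a", "tnu:b"}))
print(is_bijective(one_to_many, treatments, tnus))
```

If a treatment can contain several name mentions, as in the second data set, the mapping is not bijective and both classes carry distinct information.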
True. Treatments are specialized discourse elements. Treatments are expressions of the more abstract class concept. Think of it like this: a treatment is the "writing down" of the idea that the concept represents. In order to fully appreciate this, please refer to page 6 of the FRBR model.
Thanks. This is @taxonbytes's idea.
This is a point of modeling, and different ways to do this are possible without sacrificing expressivity. My idea was, however, to make taxonomic concepts the biodiversity-grouping concepts that are formed by taxonomists and that can be identified with taxonomic concept labels (Aus bus sec. X). Clearly, one may form a biodiversity-grouping concept in a non-traditional way: a BOLD BIN would be an example of that. Such a "taxonomic concept" will not have, at least initially, a taxonomic concept label. However, the BOLD BIN is clearly a falsifiable hypothesis about a unit of biodiversity. As a different example, may I bring up my current work on an entirely new system of grouping organisms on the basis of integrative information and Deep Neural Networks. The biodiversity operational units that BOLD or my system form will be biodiversity-grouping concepts as well. In order to distinguish such circumscriptions from the more traditional Linnean one, I have restricted taxonomic concept to denote the set of biodiversity-grouping concepts that can be formed with traditional means, and relaxed operational taxonomic unit to denote the set of all concepts about units of biodiversity. Note, I could have used the clunky biodiversity-grouping concept that I am using in this paragraph, but I decided to defer to Sokal and use the established term OTU, which has already been used for numerical circumscriptions and will not suffer by this extension.
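The restriction described above can be sketched as a two-level class hierarchy. This is only an illustration in Python, not the OWL used in OpenBiodiv-O; the class names, field names, and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperationalTaxonomicUnit:
    """Any concept about a unit of biodiversity (the relaxed, broader class)."""
    identifier: str


@dataclass
class TaxonomicConcept(OperationalTaxonomicUnit):
    """An OTU formed by traditional means; it can carry a taxonomic concept label."""
    label: str = ""


# A non-traditional grouping (e.g. something like a BOLD BIN) is an OTU without a label.
bin_ = OperationalTaxonomicUnit("otu:bin-example")

# A traditionally circumscribed concept is also an OTU, identified by "name sec. source".
concept = TaxonomicConcept("otu:concept-example", label="Aus bus sec. X")

print(isinstance(concept, OperationalTaxonomicUnit))
print(isinstance(bin_, TaxonomicConcept))
```

Every taxonomic concept is an OTU in this sketch, but not every OTU is a taxonomic concept, which mirrors the "restricted" versus "relaxed" wording in the paragraph.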
Replacement name and related name are properties, i.e. binary relations:
It is a little hard for me to parse "replacement name is a subset of related name." Neither of these two objects is a set: they are binary relations. What is true, though, is a) related name is a reflexive property. I.e. if
No. One implies the other (not in the ontology but in the extension), but not the inverse.
Both of these relations are weak and underdetermined, as they describe relationships between names that are unsuitable proxies for taxonomic concepts. They may imply something about the taxonomic concept alignments, but mostly they only imply nomenclatural statements. @taxonbytes has done some logic (Franz, Nico M., Chao Zhang, and Joohyung Lee. "A logic approach to modelling nomenclatural change." Cladistics 34.3 (2018): 336-357.) to model how one can be deduced from the other.
Sorry as well for the long reply. This stuff is very hard to describe formally, but there is no way around it if you want to make a computer system that reasons about it. |
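To illustrate what it means for these properties to be binary relations between names rather than sets of names, here is a small Python sketch. It represents each relation as a set of ordered pairs and assumes, purely for illustration, that every replacement-name pair is also a related-name pair. The name identifiers and the `implies` helper are hypothetical and are not taken from OpenBiodiv-O.

```python
# Each relation is a set of ordered (subject, object) pairs over name identifiers.
# All identifiers below are hypothetical.
replacement_name = {("name:old", "name:new")}
related_name = {
    ("name:old", "name:new"),
    ("name:new", "name:old"),
    ("name:new", "name:other"),
}


def implies(narrow, broad):
    """True if every pair in `narrow` is also in `broad` (a sub-property check)."""
    return narrow <= broad


print(implies(replacement_name, related_name))
print(implies(related_name, replacement_name))
```

Under that assumption the implication holds in one direction only, and both relations still say nothing about how the underlying taxonomic concepts align.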
Thanks, @vsenderov! I will reply via email to the CC list. If anyone following the GitHub thread is interested in this discussion, please let me know and I'll forward my reply to you. |
This aged well 😉✅ |
You write:
A taxonomic concept is a taxonomic name instance establishing or circumscribing a taxonomic entity - often linking synonymic inclusions and adding annotations, description…
I think it's cleaner to say that the taxonomic concept is a theory of a certain taxonomic identity. And then "taxonomic concept label" (name sec. source) is the "name" for that theory.
More or less like here: https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0174-5
...
Best, Nico