`set_coverage()`: Express common names in `commonName` in `taxonomicCoverage` #344
Good catch, @peterdesmet -- I agree this is a serialization error. The common name should be in the

```xml
<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Class</taxonRankName>
    <taxonRankValue>Aves</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>Species</taxonRankName>
      <taxonRankValue>Anas platyrhynchos</taxonRankValue>
      <taxonomicClassification>
        <taxonRankName>Common</taxonRankName>
        <taxonRankValue>mallard</taxonRankValue>
      </taxonomicClassification>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>
```
I see where the problem lies -- the
So the code is behaving as planned, and there is no capability to add a `commonName` using the convenience method. That said, you could use the convenience method to create the structure, then add the `commonName` to the list manually, like so:

```r
library(EML)
df <- data.frame(Class = "Aves", Species = "Anas platyrhynchos")
coverage <- set_coverage(sci_names = df)
coverage$taxonomicCoverage$taxonomicClassification[[1]]$taxonomicClassification$commonName <- 'Mallard'
str(coverage$taxonomicCoverage)
#> List of 1
#>  $ taxonomicClassification:List of 1
#>   ..$ :List of 3
#>   .. ..$ taxonRankName          : chr "Class"
#>   .. ..$ taxonRankValue         : chr "Aves"
#>   .. ..$ taxonomicClassification:List of 3
#>   .. .. ..$ taxonRankName : chr "Species"
#>   .. .. ..$ taxonRankValue: chr "Anas platyrhynchos"
#>   .. .. ..$ commonName    : chr "Mallard"
```

Created on 2022-06-03 by the reprex package (v2.0.1)

This produces the following XML output, which matches the IPT format:

```xml
<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Class</taxonRankName>
    <taxonRankValue>Aves</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>Species</taxonRankName>
      <taxonRankValue>Anas platyrhynchos</taxonRankValue>
      <commonName>Mallard</commonName>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>
```
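The manual edit above hard-codes the path `[[1]]$taxonomicClassification`, so it only fits a two-level hierarchy. Below is a minimal sketch of a hypothetical helper, `add_common_name()` (not an EML package function), that walks down to the lowest rank before setting `commonName`. It assumes each level has at most one nested `taxonomicClassification`, as in the `str()` output above, and it rebuilds that structure in base R so the sketch runs without EML installed.

```r
# Hypothetical helper, not part of the EML package.
# Assumes each classification node has at most one nested child.
add_common_name <- function(node, common_name) {
  child <- node[["taxonomicClassification"]]
  if (is.null(child)) {
    # No nested rank: this is the lowest rank, so attach the name here.
    node[["commonName"]] <- common_name
  } else {
    # Recurse into the single nested classification.
    node[["taxonomicClassification"]] <- add_common_name(child, common_name)
  }
  node
}

# Base-R copy of the structure set_coverage() returned above (before the
# manual edit), so this sketch runs without the EML package.
coverage <- list(taxonomicCoverage = list(
  taxonomicClassification = list(list(
    taxonRankName = "Class",
    taxonRankValue = "Aves",
    taxonomicClassification = list(
      taxonRankName = "Species",
      taxonRankValue = "Anas platyrhynchos"
    )
  ))
))

# The top level is an unnamed list of branches, so apply the helper to each.
coverage$taxonomicCoverage$taxonomicClassification <- lapply(
  coverage$taxonomicCoverage$taxonomicClassification,
  add_common_name,
  common_name = "Mallard"
)
str(coverage$taxonomicCoverage)
```

If a branch ever held several nested children (a list of classifications rather than a single one), the recursion would need an extra `lapply()` at that level.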
Thanks for answering! I guess there are two solutions then:
Yeah, I was contemplating both of those as well. (1) seems like a good idea, (2) seems like a bit of a hack, especially as
there are a handful of functions in the
👍 Yup, I think it's deliberate that some functions support a simplified subset of the schema (complex structures can always be built with the list constructors and should still be covered by the validator). Even with good documentation, too much complexity can be off-putting to new users. Sometimes I think we can nudge people toward better options, too. For instance, very few programmatic parsers of text-delimited tabular data can handle multiple missing-value codes, even though this case does occur in real-world data. It's important that EML can support such cases, but it might be useful to nudge folks away from some options. For taxonomy, I think it would be useful to nudge people toward taxonomic identifiers. As you know, these provide mappings to naming authorities, common names in different languages, etc.
For what it's worth, the dataframe I wanted to provide was one like:
Which I reduced to the convenience format:
But as originally described, the common name ended up as a child (a nested rank) rather than in `commonName`. I would be happy with a convenience method that supported one common name (and one taxonID) for the lowest rank in the row. In fact, since I don't use the hierarchy (the IPT does not support it), I would be fine with just `ID`, `scientificName`, `rank`, and `vernacularName`.
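Because the IPT ignores the hierarchy anyway, a flat layout with one row per taxon maps naturally onto sibling `taxonomicClassification` elements. A minimal sketch, assuming columns named `scientificName`, `rank` and `vernacularName` (the ID column is left out here because how to encode the naming authority is a separate question). `flat_to_coverage()` is a hypothetical name, not an EML package function, and it returns a plain list shaped like the `coverage` object above rather than calling `set_coverage()`.

```r
# Hypothetical converter, not part of the EML package.
# One flat row per taxon -> one taxonomicClassification per row, with the
# vernacular name written to commonName instead of a nested "Common" rank.
flat_to_coverage <- function(taxa) {
  classifications <- lapply(seq_len(nrow(taxa)), function(i) {
    node <- list(
      taxonRankName  = taxa$rank[i],
      taxonRankValue = taxa$scientificName[i]
    )
    # Only emit commonName when a vernacular name was supplied.
    vn <- taxa$vernacularName[i]
    if (!is.na(vn) && nzchar(vn)) {
      node[["commonName"]] <- vn
    }
    node
  })
  list(taxonomicCoverage = list(taxonomicClassification = classifications))
}

taxa <- data.frame(
  scientificName = c("Anas platyrhynchos", "Vulpes vulpes"),
  rank           = c("Species", "Species"),
  vernacularName = c("Mallard", "red fox"),
  stringsAsFactors = FALSE
)
cov <- flat_to_coverage(taxa)
str(cov$taxonomicCoverage)
```

Each taxon becomes its own top-level classification, so there is no nesting for `commonName` to get lost in.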
@peterdesmet Thanks! Nice example. I think it would be better, though, for the user to specify only the taxonID, from which we derive the rest. Note that I think we would need to use recognized prefixes on the taxonID, or some other indication of which authority we're referring to; e.g. that looks like a stable (i.e. post-2019) Catalogue of Life ID. Note that according to COL, the English common name appears to be "red fox". When the user gives both a taxon ID and other data, such as common name(s) that may not match those associated with the taxon ID, it's not obvious which one the R package should add to the metadata, e.g.:
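One way to read "recognized prefixes" is a `prefix:identifier` string that is checked against a fixed list of authorities before anything is looked up. A minimal sketch under that assumption: `parse_taxon_id()` and the prefixes in `known_prefixes` are hypothetical placeholders, not an existing EML convention, and `col:ABC12` is a made-up identifier.

```r
# Hypothetical list of accepted authority prefixes (placeholders only).
known_prefixes <- c("col", "gbif", "itis", "ncbi")

# Split a "prefix:identifier" string into its authority and local ID,
# failing loudly when the authority cannot be determined.
parse_taxon_id <- function(id) {
  pos <- as.integer(regexpr(":", id, fixed = TRUE))  # first colon, -1 if none
  if (pos < 1) stop("taxon ID has no authority prefix: ", id)
  prefix <- tolower(substr(id, 1, pos - 1))
  local_id <- substr(id, pos + 1, nchar(id))
  if (!(prefix %in% known_prefixes)) stop("unrecognized authority prefix: ", prefix)
  if (!nzchar(local_id)) stop("empty identifier after prefix: ", id)
  list(authority = prefix, id = local_id)
}

parse_taxon_id("col:ABC12")
```

Failing on an unknown or missing prefix, rather than guessing, keeps the package from silently picking an authority when the input is ambiguous.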
It would be nice to offer that as an option, but I think users should still be able to provide a list of taxa and be done with it.
I agree with that sentiment, @peterdesmet -- while the taxon databases are good for looking up official scientific names, vernacular names are often locally idiosyncratic, and people may want to include them in the metadata to establish that local usage.
Taxonomic coverage can be provided as a data frame where each column is a rank. `Common` is also considered a rank (https://docs.ropensci.org/EML/reference/set_coverage.html#note):

Created on 2022-06-03 by the reprex package (v2.0.1)

GBIF IPT handles common names in a separate property:

I'm not sure if that is according to the EML specification, but if so, would it be possible for `set_coverage()` to treat `Common` the same way?
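The requested behaviour (treat a `Common` column as `commonName` rather than as another rank) can be sketched in base R. `row_to_classification()` is a hypothetical function, not part of EML. It assumes the remaining columns are ordered from highest to lowest rank, skips empty ranks, and returns the same nested-list shape as the manually edited `coverage` object in the workaround earlier in this thread.

```r
# Hypothetical sketch, not an EML package function.
# Every column except "Common" becomes a nested taxonomicClassification,
# in column order; "Common" is attached as commonName on the lowest rank.
row_to_classification <- function(row) {
  ranks <- setdiff(names(row), "Common")
  values <- unname(vapply(row[ranks], as.character, character(1)))
  keep <- !is.na(values) & nzchar(values)  # skip empty ranks
  ranks <- ranks[keep]
  values <- values[keep]

  node <- NULL
  # Build the chain from the lowest rank upwards.
  for (j in rev(seq_along(ranks))) {
    level <- list(taxonRankName = ranks[j], taxonRankValue = values[j])
    if (is.null(node)) {
      # Lowest rank: this is where the common name belongs.
      if ("Common" %in% names(row) && !is.na(row[["Common"]])) {
        level[["commonName"]] <- as.character(row[["Common"]])
      }
    } else {
      level[["taxonomicClassification"]] <- node
    }
    node <- level
  }
  node
}

df <- data.frame(Class = "Aves", Species = "Anas platyrhynchos",
                 Common = "mallard", stringsAsFactors = FALSE)
cov_common <- list(taxonomicCoverage = list(
  taxonomicClassification = lapply(seq_len(nrow(df)), function(i) {
    row_to_classification(df[i, , drop = FALSE])
  })
))
str(cov_common$taxonomicCoverage)
```

Keeping `Common` out of the rank loop is the key step: the lowest remaining rank is the node that receives `commonName`, which matches the IPT-style XML shown in the thread.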