Export identification status (validated, dubious, predicted) to DWCA #764
Comments
Also tagging @rubenpp7
Update: to indicate the status of the identification in the DarwinCore field identificationVerificationStatus, EurOBIS will not use "Dubious according to human", only "Predicted by machine" and "Verified by human".
Indeed, as of today, what is not |
Code browsing:
|
Doc browsing:
|
…out while trying last version of mypy.
…us inside the same taxon.
…de the same taxon and sample.
An example with a mix of Predicted and Validated occurrences. The corresponding eMoF records distinguish the 2 different occurrences inside the same sample.
We decided to mention only the latest validator of each object, who has authority over the validation. This field is therefore used to "know who to blame" 😉 Since one occurrence corresponds to one or more objects in EcoTaxa, this should be the concatenated list of all validators (separated by | ).
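As a rough illustration of the concatenation described above, here is a minimal Python sketch. The function name `concatenate_validators` is hypothetical, and both the " | " separator spacing and the removal of duplicate names are assumptions, not something decided in this thread.

```python
from typing import Iterable


def concatenate_validators(validators: Iterable[str], sep: str = " | ") -> str:
    """Join the latest validators of all objects behind one occurrence.

    Assumption: duplicate names are dropped and first-seen order is kept.
    Empty names (objects without a validator) are skipped.
    """
    # dict.fromkeys deduplicates while preserving insertion order
    unique = dict.fromkeys(name for name in validators if name)
    return sep.join(unique)


# Three objects in the same occurrence, two validated by the same person
print(concatenate_validators(["Alice Martin", "Bob Durand", "Alice Martin"]))
```

Whether duplicates should be collapsed or the order should mean anything is a design choice left open here.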
When validated, this should be a paper/book. For us it would be the future EcoTaxoGuide. Storing this for each object seems like a waste of bits. When predicted, the best practices document mentions that it should be a reference to the model. We don't store those and even if we did, they would not guarantee reproducibility. => We do not use this field for the moment.
Giving the links to all images is not realistic. Giving the link to the project is (i) not guaranteed to work forever, (ii) redundant with the link back to EcoTaxa at the level of the whole dataset. => We do not use this field for the moment.
Currently, we export only validated objects in DWCA (@grololo06, can you confirm?)
A proposal is underway (by @PatriciaCabrera) to use the DarwinCore field identificationVerificationStatus to indicate the status: "Verified by human", "Dubious according to human", "Predicted by machine". This maps directly to the statuses in EcoTaxa. 🥳
But an occurrence in the occurrence.txt file of a DWCA (i.e. a line) can only have one identificationVerificationStatus; this means that, to use this field, the abundances/concentrations/biovolumes would need to be summed by sample + taxon + status. For a taxon that has objects of all three statuses, there would then be three lines in occurrence.txt and three lines in emof.txt, each with the concentration corresponding to the objects with the given status. It would then be the responsibility of the user of the data to decide whether to sum all three (and risk mistakes), keep only the validated (and risk underestimating concentration), etc.
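To make the proposed grouping concrete, here is a minimal Python sketch of summing concentrations by sample + taxon + status. It is not the EcoTaxa export code: `ObjectRecord`, `aggregate_by_status` and the one-letter status codes in `STATUS_TO_DWC` are assumptions made for illustration, and the numbers are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

# Assumed one-letter status codes mapped to the proposed DarwinCore values
STATUS_TO_DWC = {
    "V": "Verified by human",
    "D": "Dubious according to human",
    "P": "Predicted by machine",
}


class ObjectRecord(NamedTuple):
    """Hypothetical per-object record, for illustration only."""
    sample_id: str
    taxon: str
    status: str           # "V", "D" or "P"
    concentration: float  # contribution of this object to the total


def aggregate_by_status(
    objects: Iterable[ObjectRecord],
) -> Dict[Tuple[str, str, str], float]:
    """Sum concentrations per (sample, taxon, identificationVerificationStatus).

    Each key of the result would become one line in the occurrence file
    and one matching concentration line in the eMoF file.
    """
    totals: Dict[Tuple[str, str, str], float] = defaultdict(float)
    for obj in objects:
        key = (obj.sample_id, obj.taxon, STATUS_TO_DWC[obj.status])
        totals[key] += obj.concentration
    return dict(totals)


objects = [
    ObjectRecord("sample_1", "Copepoda", "V", 1.5),
    ObjectRecord("sample_1", "Copepoda", "V", 2.5),
    ObjectRecord("sample_1", "Copepoda", "P", 0.5),
]
for key, total in aggregate_by_status(objects).items():
    print(key, total)
```

With this input, the single taxon in the sample yields two occurrences, one per status, which is the situation the data user would then have to decide how to combine.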