Help needed in Arctos: Catalog record type for records that exist only as sequence data #8293

Open
campmlc opened this issue Nov 11, 2024 · 9 comments
Labels
Aggregator issues e.g., GBIF, iDigBio, etc Function-Relationship Priority-Normal (Not urgent) Normal because this needs to get done but not immediately.

Comments

@campmlc

campmlc commented Nov 11, 2024

Tell us what you are trying to do

With the increasing use of whole genome sequencing, we are now able to extract the identities not only of the sequenced individual but also of its parasites (or host), endosymbionts, pathogens, etc. Arctos can handle this in two ways. We can add multiple identifications to a preserved specimen record to reflect the identification of the "host" as well as all parasites, pathogens, endosymbionts, etc. associated with it. Alternatively, and preferably, we can create related records for the other entities discovered through whole genome sequencing and link them to the original specimen record via "parasite of" or "collected with" relationships. If these are published through a rigorous process that excludes contamination, we can reasonably assume that the sequence represents a related taxon and therefore warrants a record of some kind.
My question is: if we create a new record with the related identification based on genomic evidence, what is the catalog record type in Arctos, or the "basis of record" for GBIF? PreservedSpecimen? MachineObservation? HumanObservation?

What are relevant pages in Arctos

Provide a link to or a description of the page where you need help.

@campmlc campmlc added Priority-Normal (Not urgent) Normal because this needs to get done but not immediately. Function-Relationship Aggregator issues e.g., GBIF, iDigBio, etc labels Nov 11, 2024
@mkoo mkoo added this to the Needs Discussion milestone Nov 14, 2024
@mkoo
Member

mkoo commented Nov 14, 2024

Great questions. This would be a good topic for a WG meeting too.

Using relationships between records makes sense and keeps things discoverable. As for the record based on whole genome sequencing, I'd be inclined to adopt what I've seen for eDNA samples, i.e., MaterialSample (not MachineObservation, to distinguish it from camera traps). But maybe a new term is needed? (Just first-glance thoughts...)

@campmlc
Author

campmlc commented Nov 14, 2024

Happy to have this be an AWG topic.

@dustymc
Contributor

dustymc commented Nov 14, 2024

I don't think AWG can help - this isn't our vocabulary, and it's either clear or it's not. (It's not to me...)

https://dwc.tdwg.org/list/#dwc_MaterialSample is, I think, something else.

https://dwc.tdwg.org/terms/#materialsample / https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type#materialentity

(Did something change, or was something entered improperly? Why materialsample <--> materialentity ???)

A material entity that represents an entity of interest in whole or in part.

seems reasonable, and so does

https://dwc.tdwg.org/terms/#preservedspecimen / https://arctos.database.museum/info/ctDocumentation.cfm?table=ctcataloged_item_type#preservedspecimen

A specimen that has been preserved.

@tucotuco can you steer us towards an answer?

@campmlc
Author

campmlc commented Nov 14, 2024

In this case, and in others for which I have only sequence data, there is no preserved specimen. I can use Material Entity, but sequence data does not constitute "material". It is merely an observation/deduction, based on the available matching sequence data in GenBank or ENA at this moment in time given current methods.

@Jegelewicz
Member

It is merely an observation/deduction, based on the available matching sequence data in GenBank or ENA at this moment in time given current methods.

Isn't that the answer? HumanObservation?

An output of a human observation process. Human observations are unvouchered and are expected to have NO parts.

@dustymc
Contributor

dustymc commented Nov 14, 2024

there is no preserved specimen

Oh - yea, I agree with @Jegelewicz and don't see any ambiguity in that situation.

@campmlc
Author

campmlc commented Nov 14, 2024

But it could be machine observation, because this ID relies on a computer algorithm to suggest the match.

@dustymc
Contributor

dustymc commented Nov 14, 2024

could be machine observation

An example containing all of the information would be most useful; it's very difficult to be helpful in the dark.

If there's machine-produced indirect evidence (e.g., a media record) of an Occurrence, then machine observation would be correct.

@jrpletch

I can give some context here. I've been working with WGS data from tapeworms and fleas, trying to extract mitochondrial genomes for phylogenetic analysis. On a whim, I used the host (a vole) as a seed sequence in NOVOPlasty, which then managed to extract a whole mitochondrial genome that came back as a vole on GenBank. For my data, I am interested in pulling out any other species that may happen to be present in the sample (such as bacteria, viruses, other parasites, and host DNA). For the fleas, it would be particularly interesting to see whether this can give evidence of feeding on hosts other than the one each flea was collected from. So my question is: if I were to upload host (or other non-target) sequences to GenBank, how would it be best to link those to Arctos?
