Incertae_sedis VS NA Taxonomic Assignment #2004

Open
mya-darsan opened this issue Aug 22, 2024 · 1 comment

Comments

@mya-darsan

Hello!

I used UNITE to assign taxonomy.
When looking at the assignments some received:

  • OTU_6 k__Fungi NA NA NA NA NA NA
  • OTU_4 k__Fungi p__Fungi_phy_Incertae_sedis c__Fungi_cls_Incertae_sedis o__Fungi_ord_Incertae_sedis f__Fungi_fam_Incertae_sedis g__Fungi_gen_Incertae_sedis NA

I am wondering what the difference is between an OTU being assigned NA and Incertae_sedis?

Does NA mean the sequence was not found in the database at all (and is therefore not a fungus, even though it was assigned to kingdom Fungi), while Incertae_sedis means it is a fungus whose relationship is unknown?

@benjjneb
Owner

NA is assigned when the taxonomic classification method finds different assignments at a given taxonomic level from subsets of the sequence than it did from the full sequence. The specifics of how this is done are described in the original paper on the naive Bayesian classifier: https://doi.org/10.1128/AEM.00062-07
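To make the subset comparison concrete, here is a minimal Python sketch of the idea, not the classifier's actual code. The function name `mask_low_support`, the `min_support` cutoff of 0.5, and the example lineages are all assumptions made for illustration; the real method works on k-mer subsets and is described in the paper linked above.

```python
from typing import List, Optional, Sequence


def mask_low_support(
    full: Sequence[str],
    subsets: Sequence[Sequence[str]],
    min_support: float = 0.5,  # assumed cutoff, for illustration only
) -> List[Optional[str]]:
    """Keep a rank only if enough subset-based assignments agree with
    the full-sequence assignment down to that rank; otherwise use None (NA)."""
    if not subsets:
        raise ValueError("at least one subset assignment is required")

    result: List[Optional[str]] = []
    supported = True
    for depth in range(len(full)):
        if supported:
            # A subset agrees at this rank only if its lineage matches
            # the full-sequence lineage from kingdom down to this rank.
            agree = sum(
                1
                for s in subsets
                if list(s[: depth + 1]) == list(full[: depth + 1])
            )
            supported = agree / len(subsets) >= min_support
        # Once one rank loses support, every rank below it is NA as well.
        result.append(full[depth] if supported else None)
    return result


# Hypothetical example: the subsets agree on kingdom but scatter at phylum.
full = ["k__Fungi", "p__Ascomycota", "c__Sordariomycetes"]
subsets = [
    ["k__Fungi", "p__Ascomycota", "c__Sordariomycetes"],
    ["k__Fungi", "p__Basidiomycota", "c__Agaricomycetes"],
    ["k__Fungi", "p__Mucoromycota", "c__Mucoromycetes"],
    ["k__Fungi", "p__Chytridiomycota", "c__Chytridiomycetes"],
]
print(mask_low_support(full, subsets))  # ['k__Fungi', None, None]
```

Under these assumptions the result has the same shape as the OTU_6 row in the question: kingdom is kept, and everything below it becomes NA.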

An Incertae_sedis assignment means that the taxonomic classification method matched reference sequences labeled Incertae_sedis at that taxonomic level, both from the full-length sequence and from most of the subsets of the sequence. So, in some sense this "classification" is confident. However, Incertae_sedis itself means that the taxonomic placement at that level is uncertain. Thus, I would generally interpret this the same as an NA: we don't know the classification.

In short: NA comes from uncertainty in comparing the query sequence to the reference database, while Incertae_sedis comes from uncertainty in the taxonomic assignments of the reference database entries themselves.
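If you want to treat both cases the same way downstream, one option is to recode the Incertae_sedis labels as missing values. The following is a minimal Python sketch of that recoding; the function name `incertae_sedis_to_na` is hypothetical, and it assumes each row is a list of rank strings like the two rows quoted in the question, with NA stored as the literal string "NA" or as `None`.

```python
from typing import List, Optional


def incertae_sedis_to_na(lineage: List[Optional[str]]) -> List[Optional[str]]:
    """Replace 'NA' strings and any rank containing 'Incertae_sedis' with None."""
    return [
        None
        if rank is None or rank == "NA" or "Incertae_sedis" in rank
        else rank
        for rank in lineage
    ]


# The two rows from the question above.
otu_6 = ["k__Fungi", "NA", "NA", "NA", "NA", "NA", "NA"]
otu_4 = [
    "k__Fungi",
    "p__Fungi_phy_Incertae_sedis",
    "c__Fungi_cls_Incertae_sedis",
    "o__Fungi_ord_Incertae_sedis",
    "f__Fungi_fam_Incertae_sedis",
    "g__Fungi_gen_Incertae_sedis",
    "NA",
]

# After recoding, both OTUs carry the same information: kingdom Fungi only.
print(incertae_sedis_to_na(otu_6) == incertae_sedis_to_na(otu_4))  # True
```

Whether to collapse the two is a judgment call: keeping the original labels preserves the distinction between the two sources of uncertainty described above.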
