Errors in submitting annotations to NCBI #330
Comments
Hi @kevinmyers, I will add a couple of sanitizing rules and steps soon so that we will be able to handle as many of them as possible. Thanks again. I'll keep you updated.
No problem. Bakta is the best annotation tool I've used for annotating our metagenomics samples. I love it and am happy to do whatever I can to help improve it.
OK, I have added a few additional checks and product improvements fixing the following:
However, for the following I need an example or better the exact feature entry, e.g. from the
Thanks @oschwengers! I'm attaching one of the discrepancy reports for the tmRNA problem. Here are the associated lines in the GFF file:
Hmm, very odd/interesting. There are indeed entries in UniRef solely annotated with Again, thanks a lot for reporting! These changes are now public in the
I submitted Bakta annotations to NCBI this week, and over half had some fatal errors. They weren't hard to fix, but I wanted to let you know in case there's something that can be done with a future update to avoid them. I am using Bakta version 1.9.1 installed using conda and ran it with the --compliant flag.
FATAL: SUSPECT_PRODUCT_NAMES: 1 feature equals 'tmRNA'. Is this a tmRNA or is it a protein?
(Looking at the product it appears to be a hypothetical protein, so I changed it to that)
FATAL: SUSPECT_PRODUCT_NAMES: 1 feature starts with '-'
(Product name: putative-PNPOx domain-containing protein)
FATAL: SUSPECT_PRODUCT_NAMES: 2 features start with '''
(Product name: 'chromo' domain containing protein)
(Product name: 'Cold-shock' DNA-binding domain)
FATAL: 1 feature contains 'remnant'
(Product name: Remnant of transposase, IS3 family)
FATAL: SUSPECT_PRODUCT_NAMES: 2 features contain '#'
(Product name: ATPase/5###-3### helicase helicase subunit RecD of the DNA repair enzyme RecBCD (exonuclease V))
(Product name: 3###-5### helicase subunit RecB of the DNA repair enzyme RecBCD (exonuclease V))
(Product name: putative DNA-binding protein with ###double-wing### structural motif, MmcQ/YjbR family)
(Product name: Anthranilate synthase, amidotransferase component Para-aminobenzoate synthase, amidotransferase component # TrpAbPabAb)
(Product name: Chorismate mutase I # AroHI)
FATAL: RRNA_NAME_CONFLICTS: 3 rRNA product names are not standard. Correct the names to the standard format, eg "16S ribosomal RNA"
(Product name: (partial) 23S ribosomal RNA)
(Product name: (5' truncated) 16S ribosomal RNA)
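For anyone who wants to screen product names before submitting, the errors listed above can be approximated with a short script. This is only a rough sketch based on the messages quoted in this issue: it is not NCBI's actual rule set and not how Bakta handles these cases. The function name `find_product_issues`, the `feature_type` argument, and the rRNA pattern are hypothetical. It also assumes that the "starts with '-'" check is applied after a leading "putative" has been dropped, since the reported product does not literally begin with a hyphen.

```python
import re
from typing import List

# Assumption: a "standard" rRNA name looks like "16S ribosomal RNA"
# (the only format example given in the error message).
RRNA_STANDARD = re.compile(r"^\d+(?:\.\d+)?S ribosomal RNA$")


def find_product_issues(product: str, feature_type: str = "CDS") -> List[str]:
    """Return labels for the problems reported in this issue (hypothetical helper)."""
    issues = []
    if feature_type == "CDS":
        # A protein feature whose product is just "tmRNA"
        if product.strip().lower() == "tmrna":
            issues.append("equals 'tmRNA'")
        # Assumption: the check sees the name without a leading "putative"
        core = re.sub(r"^putative\s*", "", product, flags=re.IGNORECASE)
        if core.startswith("-"):
            issues.append("starts with '-'")
        if product.startswith("'"):
            issues.append("starts with '''")
        if "remnant" in product.lower():
            issues.append("contains 'remnant'")
        if "#" in product:
            issues.append("contains '#'")
    elif feature_type == "rRNA":
        # Prefixes such as "(partial)" make the name non-standard
        if not RRNA_STANDARD.match(product):
            issues.append("non-standard rRNA name")
    return issues


if __name__ == "__main__":
    examples = [
        ("tmRNA", "CDS"),
        ("putative-PNPOx domain-containing protein", "CDS"),
        ("'chromo' domain containing protein", "CDS"),
        ("Remnant of transposase, IS3 family", "CDS"),
        ("Chorismate mutase I # AroHI", "CDS"),
        ("(partial) 23S ribosomal RNA", "rRNA"),
    ]
    for product, feature_type in examples:
        print(product, "->", find_product_issues(product, feature_type))
```

Running it over the product column of the annotation output would flag the same entries listed above, so they can be fixed by hand before submission.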