refseq_virus/NCBI taxonomy behind current VMR.39 by more than a year #86
Comments
Thank you for pointing this out, and great to hear that Metabuli was working well.
Do you have access to AWS S3? This directory contains all required files, including the dmp files and the nucleotide FASTA (>2 GB).
Great! I'm downloading the directory.
Thank you for the detailed explanation!
As a first step, take the "species" column from the VMR and try to map its names to the NCBI names.dmp file. It will be interesting to see how many species from the VMR cannot be mapped to the current NCBI taxonomy. I'll do it myself later today as well. I want to recreate the Metabuli reference database with accession2speciestaxid.
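The mapping step described above could be sketched roughly as follows. This is only an illustration, not the script used in this thread: it assumes the VMR has been exported to a TSV with a column named "Species" (the column name is an assumption), and it relies on the standard names.dmp layout, where fields are separated by tab, pipe, tab. The function names are hypothetical.

```python
import csv
import io


def load_ncbi_names(lines):
    """Build {name_txt: tax_id} from names.dmp lines.

    names.dmp fields are: tax_id, name_txt, unique name, name class,
    separated by "\t|\t" and terminated by "\t|".
    """
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        # Keep the first tax_id seen for a name (all name classes are accepted).
        names.setdefault(fields[1], int(fields[0]))
    return names


def read_vmr_species(tsv_text, column="Species"):
    """Return the non-empty values of the species column of a VMR TSV export.

    The column name "Species" is an assumption; adjust it to the actual export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row.get(column) or "").strip()
            for row in reader
            if (row.get(column) or "").strip()]


def split_by_mapping(species, names):
    """Split VMR species into those found in names.dmp and those missing."""
    mapped = {s: names[s] for s in species if s in names}
    unmapped = sorted(set(species) - mapped.keys())
    return mapped, unmapped
```

With real files, the lines of names.dmp and the text of the TSV export would be read from disk and passed to these functions; the length of the unmapped list is the number of VMR species without an exact name match in the NCBI taxonomy.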
The NCBI taxonomy is outdated at the species level as well. Alphacrustrhavirus wenling
s3://serratus-public/igortu/metabuli.VMR.39/vmr.gc - genetic codes for GenBank accessions
s3://serratus-public/igortu/metabuli.VMR.39/ReadTSV.py - you can modify the script as you need:
Viral RefSeq was last synchronized with the VMR in May 2023.
Thank you so much for sharing your DB.
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2 is present in VMR.39 as MN908947:NC_045512 - NCBI Taxonomy still reports it.
After you rebuild the Metabuli viral DB, I would suggest you try it on the ICTV challenge:
VMR.39.2 has just been released,
Hi Igor! I finally built a virus DB using VMR.39.2. Please use this version.
Very good! I think the ICTV Computational Challenge (ictv.global) can be taken as a good first test: just compare the Metabuli output with NCBI taxonomy vs. ICTV VMR.39.2 and put the differences somewhere on GitHub.
The only question I have for now: how do you deal with bacterial genomes (accessions for prophages inside complete chromosomes) which are present in the VMR?
included the full chromosome
excluded completely
only the prophage region included?
On Oct 15, 2024, at 4:00 AM, Jaebeom Kim wrote:
Hi Igor !! I finally built a virus DB using VMR.39.2
Could you try this database and give feedback?
It's available here.
https://hulk.mmseqs.com/jaebeom/vmr39.2/
(Some viruses without a GenBank accession were missed.)
This is a good example of how ridiculous nomenclature can be. Now all sarbecoviruses are called this, while demolishing the previous species level would definitely cause more confusion than understanding.
"Virus Name" is not really a taxonomic rank. It is an assembly name or sequence name; it can be non-unique and is linked to an assembly accession (NCBI), proteome_id (UniRef), or isolate_id (ICTV). After the introduction of the VMR, the taxonomy tree should contain only officially recognized ICTV names; everything else needs to be organized in a different way, outside of the taxonomy. As one possible solution, see the Serratus.io project, which processes the whole SRA in real time (sponsor: Amazon cloud).
I used full sequences.
You need to take the location provided in the VMR, or exclude them completely.
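The three options for prophage entries (full chromosome, exclusion, or only the region given in the VMR) could be handled along these lines. This is a minimal sketch under stated assumptions: it assumes the VMR accession field writes a region as `ACCESSION (start.end)` with 1-based inclusive coordinates, which should be checked against the actual VMR file. The function names and the `policy` parameter are hypothetical.

```python
import re

# Assumed entry format: "ACCESSION" or "ACCESSION (start.end)".
_ENTRY = re.compile(r"^\s*([A-Za-z0-9_.]+)\s*(?:\(\s*(\d+)\.(\d+)\s*\))?\s*$")


def parse_accession(entry):
    """Return (accession, region), where region is (start, end) or None."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognized accession entry: {entry!r}")
    accession, start, end = match.groups()
    if start is None:
        return accession, None
    return accession, (int(start), int(end))


def select_sequence(sequence, region, policy="region"):
    """Apply one of the three policies discussed in this thread.

    "region":  keep only the prophage region (1-based, inclusive coordinates)
    "exclude": drop entries that carry a region (returns None)
    "full":    keep the full sequence, ignoring the region
    """
    if policy not in ("region", "exclude", "full"):
        raise ValueError(f"unknown policy: {policy!r}")
    if region is None or policy == "full":
        return sequence
    if policy == "exclude":
        return None
    start, end = region
    return sequence[start - 1:end]
```

For example, a made-up entry `XX000001 (3.5)` applied to the sequence `ACGTACGT` with the "region" policy keeps only bases 3 to 5.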
I would suggest rebuilding the Metabuli viral reference database using the current official viral taxonomy provided at https://ictv.global/vmr
Technically, it is not difficult.
Your tool processed it very smoothly.
If you are interested, I can provide a URL where the new database can be downloaded.
You will be surprised how many virus names/lineages cannot be synchronized between these two major resources: thousands...