Building genbank/refseq databases from assembly_summary.txt #7
Comments
Much belated update that I think finally resolves this - The moreover, I'm 99.9% sure that wort uses the same information as in closing!
related to sourmash-bio/sourmash#970
Each subset of RefSeq and GenBank has an assembly_summary.txt file. This is from fungi: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/fungi/assembly_summary.txt
All RefSeq subsets: https://ftp.ncbi.nlm.nih.gov/genomes/refseq/
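As a rough sketch of how such a file could be read: the file is tab-separated, and its column names sit on a comment line starting with `#` just above the data. The function name `read_assembly_summary` and the shortened sample below are illustrative assumptions, not part of any existing tool; real files have many more columns.

```python
import csv
import io


def read_assembly_summary(text):
    """Parse the content of an assembly_summary.txt file into a list of dicts."""
    header = None
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            if not data_lines:
                # Keep the last comment line seen before the data:
                # that is the line holding the column names.
                header = line.lstrip("#").strip().split("\t")
        elif line.strip():
            data_lines.append(line)
    if header is None:
        raise ValueError("no '#'-prefixed header line found")
    reader = csv.DictReader(
        io.StringIO("\n".join(data_lines)),
        fieldnames=header,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # fields are plain text, never quoted
    )
    return list(reader)


# Shortened, illustrative sample (assumption: only four of the real columns).
sample = (
    "# shortened sample, see the NCBI README for the full column list\n"
    "# assembly_accession\torganism_name\tinfraspecific_name\tasm_name\n"
    "GCF_001477545.1\tPneumocystis carinii B80\tstrain=B80\tPneu_cari_B80_V3\n"
)
rows = read_assembly_summary(sample)
print(rows[0]["assembly_accession"], rows[0]["asm_name"])
```

Parsing from a string keeps the sketch independent of how the file is fetched; the downloaded file content can be passed in unchanged.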
Benefits of using assembly_summary.txt:
- A name can be built from assembly_accession, organism_name, infraspecific_name and asm_name. For example, for GCF_001477545.1, the name could be GCF_001477545.1 Pneumocystis carinii B80 strain=B80, Pneu_cari_B80_V3
- The taxid field can be used to generate TaxInfo and save it in the Zipped SBT during indexing. Because we control both the name (instead of using --name-from-first) and how it is saved in the TaxInfo, scripts for converting results like gather_to_opal.py can be simplified.
- gather or search)

More info: https://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_Downloading_Genomic_Data.pdf
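A minimal sketch of the naming idea, assuming each row of assembly_summary.txt is available as a dict keyed by column name. The helpers `build_name` and `get_taxid` are hypothetical, and the exact layout of the name (accession, organism, infraspecific name, then a comma and the assembly name) is inferred from the single example above. How the taxid would be stored as TaxInfo in the SBT is not shown here.

```python
def build_name(row):
    """Build a signature name from one assembly_summary.txt row (a dict)."""
    parts = [row["assembly_accession"], row["organism_name"]]
    # infraspecific_name (e.g. "strain=B80") may be empty for some assemblies.
    infra = (row.get("infraspecific_name") or "").strip()
    if infra:
        parts.append(infra)
    return " ".join(parts) + ", " + row["asm_name"]


def get_taxid(row):
    """Return the NCBI taxid of the row as an integer."""
    return int(row["taxid"])


# Values taken from the GCF_001477545.1 example above.
row = {
    "assembly_accession": "GCF_001477545.1",
    "organism_name": "Pneumocystis carinii B80",
    "infraspecific_name": "strain=B80",
    "asm_name": "Pneu_cari_B80_V3",
}
print(build_name(row))
```

Building the name explicitly, rather than taking it from the first sequence record, keeps the accession at the start of every name, so downstream scripts can recover it with a simple split.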