Software, architecture, and data index design for the 2018/2019 Virus Discovery Project
Here we present a compromise pipeline for extracting virological information from publicly available metagenomic datasets, with the goal of providing a usable index to the virological research community.
ref_viruses_rep_genomes_v5: subset of refseq_genomes, selected by “latest_refseq[Prop] AND viruses[Organism]”
ref_viroids_rep_genomes_v5: subset of refseq_genomes, selected by “latest_refseq[Prop] AND viroids[Organism]”
NCBI_VIV_protein_sequences_v5: subset of nr, selected by a GI list supplied by Sergey Resenchuk
NCBI_VIV_nucleotide_sequences_v5: subset of nt, selected by a GI list supplied by Sergey Resenchuk
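
The four database definitions above can be written down as a small data structure. The following is a minimal sketch, not the project's actual tooling: the class `SubsetDb`, the function `alias_command`, and the GI list file name are hypothetical, and treating refseq_genomes as a nucleotide database is an assumption. It only builds (does not run) a BLAST+ `blastdb_aliastool` command line that would restrict a parent database to a GI list, which is one plausible way to realise the two GI-list subsets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubsetDb:
    """One derived database: a named subset of a parent BLAST database."""
    name: str         # name of the derived database
    parent: str       # database it is a subset of
    dbtype: str       # "nucl" or "prot" (assumed for refseq_genomes)
    selected_by: str  # "entrez_query" or "gi_list"
    selection: str    # the query text, or a description of the GI list


# Definitions transcribed from the list above.
DATABASES = [
    SubsetDb("ref_viruses_rep_genomes_v5", "refseq_genomes", "nucl",
             "entrez_query", "latest_refseq[Prop] AND viruses[Organism]"),
    SubsetDb("ref_viroids_rep_genomes_v5", "refseq_genomes", "nucl",
             "entrez_query", "latest_refseq[Prop] AND viroids[Organism]"),
    SubsetDb("NCBI_VIV_protein_sequences_v5", "nr", "prot",
             "gi_list", "GI list supplied by Sergey Resenchuk"),
    SubsetDb("NCBI_VIV_nucleotide_sequences_v5", "nt", "nucl",
             "gi_list", "GI list supplied by Sergey Resenchuk"),
]


def alias_command(db: SubsetDb, gi_list_path: str) -> List[str]:
    """Build (but do not execute) a blastdb_aliastool invocation that
    restricts db.parent to the identifiers listed in gi_list_path."""
    return [
        "blastdb_aliastool",
        "-db", db.parent,
        "-dbtype", db.dbtype,
        "-gilist", gi_list_path,
        "-out", db.name,
        "-title", db.name,
    ]


if __name__ == "__main__":
    for db in DATABASES:
        if db.selected_by == "gi_list":
            # "<name>.gi" is a placeholder file name, not a real path.
            print(" ".join(alias_command(db, db.name + ".gi")))
```

For the two Entrez-query subsets, the query would first have to be resolved to an identifier list before the same kind of command could be applied; that step is left out of the sketch.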