Serratus Annotation

Rayan Chikhi edited this page Jul 27, 2020 · 9 revisions

Data on S3

GenBank genomes

All darth annotations are in:

s3://serratus-public/seq/cov5/annotations.nt_otus.id99/

For PFAM alignments, download only the *.darth.alignments.fasta files and disregard the *.pfam.alignments.fasta and *.transeq.alignments.fasta files. Each .darth.alignments.fasta file corresponds to the transeq/alignments.fasta file inside the Darth folder (which handles multi-contig assemblies) when that file exists, and otherwise to pfam/alignments.fasta (which was made for single-contig assemblies).
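As a minimal sketch of the selection rule above, assuming you already have a listing of file names from the bucket (obtained with whichever S3 client you prefer), the following Python function keeps only the files worth downloading. The function name and the example file names are hypothetical and are not taken from the bucket.

```python
# Suffix of the only alignment files that should be downloaded.
DARTH_SUFFIX = ".darth.alignments.fasta"


def select_darth_alignments(keys):
    """Return only the keys that end in .darth.alignments.fasta.

    The *.pfam.alignments.fasta and *.transeq.alignments.fasta files
    are dropped, as the note above recommends.
    """
    return [key for key in keys if key.endswith(DARTH_SUFFIX)]


# Hypothetical file names, for illustration only.
example_keys = [
    "EXAMPLE1.darth.alignments.fasta",
    "EXAMPLE1.pfam.alignments.fasta",
    "EXAMPLE1.transeq.alignments.fasta",
]

for key in select_darth_alignments(example_keys):
    print(key)
```

A plain suffix test is used rather than a glob so that matching stays case-sensitive on every platform.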

Date of full run: 26-07-2020

Assemblies

All darth/serratax/serraplace annotations for the master table assemblies are in:

s3://serratus-public/assemblies/annotations/

The same remark as for the GenBank genomes applies: for alignments, take only the *.darth.alignments.fasta files.

Date of full run: 27-07-2020

Note: some annotations are also present in s3://serratus-public/assemblies/other/[accession].coronaspades/. Please disregard those, as they are only for the CheckV-filtered assemblies, not for the BGC-filtered assemblies present in the master table.
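If you are working from a listing of object keys for the whole bucket, the note above can be applied with a small filter. This is only a sketch: the function name and the example keys are hypothetical, and the pattern assumes that keys are written relative to the bucket root and that the accession contains no slash.

```python
import re

# Matches keys under assemblies/other/[accession].coronaspades/,
# the location the note above says to disregard.
CORONASPADES_DIR = re.compile(r"^assemblies/other/[^/]+\.coronaspades/")


def drop_coronaspades(keys):
    """Return the keys that are not inside a .coronaspades folder."""
    return [key for key in keys if not CORONASPADES_DIR.match(key)]


# Hypothetical keys, for illustration only.
example_bucket_keys = [
    "assemblies/annotations/EXAMPLE1.darth.alignments.fasta",
    "assemblies/other/EXAMPLE1.coronaspades/EXAMPLE1.darth.alignments.fasta",
]

for key in drop_coronaspades(example_bucket_keys):
    print(key)
```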

Resources to Evaluate

Gene-Calling

(with emphasis on methods able to handle frameshifts/indels/ribosome slippage)

Lightweight Annotation Pipelines

Genome submission requirements & documentation

NCBI Submission instructions for viruses: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/docs/submit/#submit_other_viruses

From Steven Hallam:

Here are some specific viral annotation resources to consider:

https://www.viprbrc.org/brc/home.spg?decorator=vipr [a nexus for virus pathogen resources]

*** https://www.viprbrc.org/brc/home.spg?decorator=corona_ncov [specific information in ViPR on SARS-CoV-2; contains more than 1,607 genomes with GenBank accession numbers] ***

https://www.viprbrc.org/brc/vigorAnnotator.spg?method=ShowCleanInputPage&decorator=corona [VIGOR4 viral genome ORF prediction developed by JCVI]

https://www.ncbi.nlm.nih.gov/genome/viruses/ [a useful resource for viral genomics, including links to databases]

http://www.virusite.org/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6381048/ [this paper is very relevant to viral ORF prediction]

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2606-y [lightweight viral annotation pipeline]

https://msphere.asm.org/content/3/2/e00069-18 [a viral annotation database curated from NCBI with manual refinement]