ref_rdrp1
lukepereira edited this page Mar 29, 2023 · 3 revisions
Reference sequences for the translated-nucleotide search of viral RNA-dependent RNA polymerase (RdRP), also referred to as "RdRP search" in the main Serratus manuscript.
rdrp1.fa
- Sequence and Index Directory: s3://lovelywater2/seq/rdrp1/
- Make notebook entry
The database was compiled from the following sources:
- The wolf18 collection is a curated snapshot (ca. 2018) of RdRP from GenBank. link
- The wolf20 collection consists of RdRPs assembled from marine metagenomes. link
- All viral GenBank protein sequences (release version 241) were aligned with diamond --ultra-sensitive against the combined wolf18 and wolf20 sequences (E-value < 1e-6). These searches produced local alignments containing truncated RdRP, so each RdRP-containing GenBank sequence was then re-aligned to the wolf18 and wolf20 collections to "trim" it to the wolf RdRP boundaries.
- The same procedure was also applied to all viral GenBank nucleotide records to capture additional RdRP not annotated as such by GenBank. Because a region of the HCV capsid protein shares similarity with HCV RdRP, sequences annotated as HCV capsid were removed. Eight novel coronavirus RdRP sequences identified in a pilot experiment were added manually. The combined RdRP sequences from the above collections were clustered (uclust) at 90% amino acid identity, and the resulting representative sequences (centroids, N = 14,653) were used as the rdrp1 search query.
- Deltavirus antigen protein sequences were added manually from NC_001653, M21012, X60193, L22063, AF018077, AJ584848, AJ584847, AJ584844, AJ584849, MT649207, MT649208, MT649206, NC_040845, NC_040729, MN031240, MN031239, MK962760, MK962759, and from eight additional homologs we identified in a pilot experiment.
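The E-value filter and the "trim to wolf RdRP boundaries" step can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the Serratus pipeline code: it assumes DIAMOND was run with its default tabular output (`--outfmt 6`, columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps hits with E-value < 1e-6, and trims each GenBank protein to the query span of its best-scoring (highest bitscore) hit. The helpers `parse_hits` and `trim_to_best_hit` and the toy records are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """Selected fields from one row of DIAMOND default tabular output (--outfmt 6)."""
    qseqid: str
    sseqid: str
    pident: float
    qstart: int  # 1-based, inclusive
    qend: int    # 1-based, inclusive
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str], max_evalue: float = 1e-6) -> List[Hit]:
    """Parse outfmt-6 rows and keep only hits with E-value < max_evalue."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        hit = Hit(
            qseqid=f[0],
            sseqid=f[1],
            pident=float(f[2]),
            qstart=int(f[6]),
            qend=int(f[7]),
            evalue=float(f[10]),
            bitscore=float(f[11]),
        )
        if hit.evalue < max_evalue:
            hits.append(hit)
    return hits


def trim_to_best_hit(seqs: Dict[str, str], hits: List[Hit]) -> Dict[str, str]:
    """Slice each query protein to the span covered by its highest-bitscore hit."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.qseqid not in best or h.bitscore > best[h.qseqid].bitscore:
            best[h.qseqid] = h
    # Convert 1-based inclusive coordinates to a Python slice.
    return {q: seqs[q][h.qstart - 1 : h.qend] for q, h in best.items() if q in seqs}


if __name__ == "__main__":
    # Hypothetical hits of GenBank proteins (gb1, gb2) against wolf RdRPs.
    rows = [
        "gb1\twolfA\t45.0\t8\t4\t0\t3\t10\t1\t8\t1e-20\t55.0",
        "gb1\twolfB\t40.0\t6\t3\t0\t4\t9\t1\t6\t1e-10\t30.0",
        "gb2\twolfA\t30.0\t5\t3\t0\t1\t5\t1\t5\t0.01\t12.0",  # fails E-value cutoff
    ]
    seqs = {"gb1": "MKVLAGDETRSSY", "gb2": "MSTNPKPQ"}
    print(trim_to_best_hit(seqs, parse_hits(rows)))  # {'gb1': 'VLAGDETR'}
```

Picking only the best hit per query is a simplification; a protein carrying more than one RdRP-like region, or hits that need merging across HSPs, would need extra handling.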
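The 90% identity clustering that produced the rdrp1 centroids can be pictured with a simplified greedy centroid clustering in the spirit of uclust. This sketch is a stand-in under stated assumptions, not uclust itself: uclust uses k-mer indexing and its own alignment and identity definitions. Here identity is computed from a plain Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1) as identical aligned columns divided by the length of the shorter sequence. Sequences are visited longest first, and each one either joins the first existing centroid it matches at >= 90% identity or becomes a new centroid. The function names `global_identity` and `greedy_cluster` are hypothetical.

```python
from typing import Dict, List


def global_identity(a: str, b: str) -> float:
    """Identical aligned columns / len(shorter sequence), from a simple
    Needleman-Wunsch alignment (match +1, mismatch -1, gap -1)."""
    n, m = len(a), len(b)
    if min(n, m) == 0:
        return 0.0
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else -1)
            score[i][j] = max(diag, score[i - 1][j] - 1, score[i][j - 1] - 1)
    # Trace back one optimal path, counting identical aligned pairs.
    i, j, matches = n, m, 0
    while i > 0 and j > 0:
        s = 1 if a[i - 1] == b[j - 1] else -1
        if score[i][j] == score[i - 1][j - 1] + s:
            if s == 1:
                matches += 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - 1:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return matches / min(n, m)


def greedy_cluster(seqs: Dict[str, str], min_id: float = 0.90) -> Dict[str, List[str]]:
    """Return {centroid_id: [member_ids]}; longer sequences become centroids first."""
    clusters: Dict[str, List[str]] = {}
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)
    for sid in order:
        for cid in clusters:
            if global_identity(seqs[sid], seqs[cid]) >= min_id:
                clusters[cid].append(sid)
                break
        else:
            # No centroid is similar enough: this sequence starts a new cluster.
            clusters[sid] = [sid]
    return clusters


if __name__ == "__main__":
    toy = {
        "r1": "MKTAYIAKQRQISFVKSHFS",
        "r2": "MKTAYLAKQRQISFVKSHFS",  # one substitution relative to r1 (95% identity)
        "r3": "G" * 20,
    }
    print(greedy_cluster(toy))  # {'r1': ['r1', 'r2'], 'r3': ['r3']}
```

The quadratic alignment makes this practical only for toy inputs; at the scale of the rdrp1 collection, a k-mer accelerated tool such as uclust is what makes the clustering feasible.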