VADR doesn't annotate second segment of segmented CoV genome #52
Comments
@taltman can you please provide the
@nawrockie, here is the
The relevant section:
Also, as a reminder, the Docker container for testing out the environment that I'm using is
Thanks for any insights you might be able to share!
I guess I could do the following:
It's not too hard, but I was hoping there might be a more elegant way to do it. Please let me know if you have any suggestions. Thank you!
Thanks for providing the command that you used.
After recognizing that the concatenated sequence is homologous ...
That region doesn't match well to the model, but the alignment has been ...
Unfortunately, if you want to force vadr to annotate these seqs ...
I see that you are trying to do remote homology detection/annotation ...
That said, the best way to maximize the remote homology detection ...
However, the power of this nucleotide profile for remote homology ...
Thanks for your reply. Yes, we are using VADR to annotate CoV genomes that are very distant from the known CoV genomes in GenBank. I agree with the approach you described (we in fact used protein-space searches to find these distant CoV genomes). But due to a lack of time and funding, I was trying to leverage an existing pipeline, and VADR performed the best. I'd be happy to "pay back" the VADR project by doing any sort of model building with the newer genomes, to help the 'corona' model include these distant members. Just let me know how I can help. Is this model building described in the Wiki?
I forgot to mention: the concatenation + genomic interval arithmetic method worked!
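For readers wondering what the "genomic interval arithmetic" step might involve, here is a minimal sketch. It is not taken from this thread: the function names `segment_spans` and `map_to_segment` are hypothetical, and it assumes 1-based inclusive coordinates on a sequence built by joining the segments with a fixed-length run of Ns (16 in this issue). A feature reported on the concatenated sequence is mapped back to the segment it falls in, and anything that touches the spacer or crosses a segment boundary is rejected.

```python
def segment_spans(lengths, spacer_len=16):
    """Return the 1-based inclusive (start, end) span that each segment
    occupies within the concatenated sequence."""
    spans = []
    pos = 1
    for n in lengths:
        spans.append((pos, pos + n - 1))
        pos += n + spacer_len  # skip over the segment and the N spacer
    return spans


def map_to_segment(start, end, lengths, spacer_len=16):
    """Map a 1-based inclusive interval on the concatenated sequence back to
    (segment_index, segment_start, segment_end).

    Works for either orientation (start > end for minus-strand features).
    Returns None if the interval overlaps the spacer or spans two segments.
    """
    for i, (s, e) in enumerate(segment_spans(lengths, spacer_len)):
        if s <= start <= e and s <= end <= e:
            return i, start - s + 1, end - s + 1
    return None


if __name__ == "__main__":
    # Two hypothetical segments of 100 nt and 50 nt joined by 16 Ns:
    # the second segment occupies positions 117..166 of the concatenation.
    print(map_to_segment(120, 130, [100, 50]))
    print(map_to_segment(90, 120, [100, 50]))
```

Features that return `None` are the ones to inspect by hand, since an alignment running through the N spacer does not correspond to a single segment.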
The Serratus Project expanded the set of known CoV/nidovirus genomes, including segmented ones. An example of a segmented nidovirus similar to the ones that we found is the Pacific salmon nidovirus (MK611985.1). Please see Figure 3 of our preprint for more context:
https://www.biorxiv.org/content/10.1101/2020.08.07.241729v2
When I try to annotate the AmexNV genome, with two segments in the input FASTA file, VADR 1.3 annotates the first segment, and then reports the following for the second one:
Yet, when I concatenate the two contigs with a run of 16 Ns, I get additional annotations (see below). Is there a way for VADR to recognize the multiple segments and annotate them individually? (See below for the input files used.)
Additional annotations:
Original FASTA file with two segments:
SRR6788790.epsy.fa.txt
Modified FASTA with the two segments concatenated:
AmexNV-one-contig-test.fa.txt
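The concatenation itself can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions, not the exact procedure used to produce the attached file: `parse_fasta` and `concatenate_segments` are hypothetical helper names, the output sequence name is a placeholder, and the input is assumed to be ordinary multi-record FASTA text.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_segments(records, spacer_len=16, name="concatenated"):
    """Join all segment sequences into one FASTA record, separating
    consecutive segments with a run of `spacer_len` Ns."""
    spacer = "N" * spacer_len
    sequence = spacer.join(seq for _, seq in records)
    return f">{name}\n{sequence}\n"


if __name__ == "__main__":
    # Toy two-segment input; real use would read the FASTA file's contents.
    example = ">seg1\nACGT\nAC\n>seg2\nGGTT\n"
    print(concatenate_segments(parse_fasta(example)), end="")
```

Keeping the original segment lengths around is useful afterwards, because they are what lets annotation coordinates on the joined sequence be translated back to each segment.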