As a mixcr result, too long CDR3 sequences were generated #332
Comments
Please post raw alignments from this clone, like this:
What MiXCR version do you use?
Sorry, I carefully checked the mixcr-2.1.5 result. Most of the too-long CDR3 sequences do not contain stop codons or are in non-coding regions. After I removed these clones, some long sequences were still retained. The following files show one clone with a too-long CDR3 sequence, where the MiXCR result differs from the IMGT alignment result:
Thank you for your help!
Hi! Sorry for the late response. Both sequences you provided seem to be artifacts. In the first one, which you mentioned in the first message, the wrong J gene was chosen by the splicing machinery. We've seen such sequences a lot in RNA-Seq data. As you can see in the picture below, after a successful rearrangement many "acceptor" splice sites near the J genes still remain in the gene sequence (marked by stars), and splicing, as one would expect, can't perfectly distinguish between these sites and selects the wrong one from time to time:
MiXCR uses the top J gene (by score) as a basis to locate the CDR3 boundary, in this case
Here is also the BLAST result for this sequence:
The second case seems to be just a faulty rearrangement. I searched the sequence in BLAST, and here is the result:
So, summing up:
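For readers who want to screen their own clone tables for cases like the ones discussed here, the following is a minimal sketch of a post-hoc check on CDR3 nucleotide sequences. It is not part of MiXCR; the function name `cdr3_flags` and the 90 nt length cutoff are assumptions chosen for illustration, and the reading frame is assumed to start at the first base of the CDR3.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cdr3_flags(nt_seq, max_len=90):
    """Return a list of reasons why a CDR3 nucleotide sequence looks suspicious.

    max_len is a hypothetical cutoff, not a MiXCR default; tune it per chain.
    """
    seq = nt_seq.upper()
    flags = []
    if len(seq) > max_len:
        flags.append("too_long")
    if len(seq) % 3 != 0:
        flags.append("out_of_frame")
    # Read complete codons only, assuming the frame starts at the first base.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        flags.append("stop_codon")
    return flags
```

An empty list means none of the three checks fired. The CDR3 quoted in this issue is far longer than 90 nt, so it would at least be flagged as `too_long`. A flag only marks a clone for manual inspection (for example with BLAST, as above); it does not by itself show that the clone is an artifact.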
I used paired-end 150 bp sequencing, but I now find that too-long CDR3 sequences were generated in the result. The spliced sequence is "CGCTCAGGCTGGAGTCGGCTGCTCCCTCCCAGACATCTGTGTACTTCTGTGCCAGCAGTTACGACGGACAAAGAACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCGGTAAGCGGGGGCTCCCGCTGAAGCCCCGGAACTGGGGAGGGGGCGCCCCGGGACGCCGGGGGCGTCGCAGGGCCAGTTTCTGTGCCGCGTCTCGGGGCTGTGAGCCAAAAACATTCAGTACTTCGGCGCCGGGACCCGGCTCTCAGTGC", but the corresponding CDR3 sequence in the result is "TGTGCCAGCAGTTACGACGGACAAAGAACAGATACGCAGTATTTTGGCCCAGGCACCCGGCTGACAGTGCTCGGTAAGCGGGGGCTCCCGCTGAAGCCCCGGAACTGGGGAGGGGGCGCCCCGGGACGCCGGGGGCGTCGCAGGGCCAGTTTCTGTGCCGCGTCTCGGGGCTGTGAGCCAAAAACATTCAGTACTTC". However, a different CDR3 sequence was obtained after IMGT alignment. After BLAST, this sequence was found to be a true gene. I wonder whether this CDR3 sequence should be retained. Thank you!