N bases #1799

bshim181 · 2024-09-23T13:26:19Z

I have noticed that in the report files, if there is an N base in the gene feature sequences, it is translated to amino acid X.
I was wondering whether there is a way to handle those N bases. Is there a way to replace them based on the reference?
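
For context, here is a minimal Python sketch of why an N in a nucleotide sequence shows up as X after translation. It is not MiXCR's code: the `translate` helper and the example sequences are made up for illustration, and this toy version marks any codon containing an N as X.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate complete codons; any codon not in the table (e.g. with N) becomes X."""
    usable = len(nt) - len(nt) % 3  # ignore a trailing incomplete codon
    codons = (nt[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("TGTGCCAGC"))  # CAS
print(translate("TGTGNCAGC"))  # CXS: the middle codon GNC is ambiguous
```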

mizraelson · 2024-09-24T00:28:24Z

Hi, what command do you run to analyze the data?

bshim181 · 2024-09-24T14:08:02Z

I used the analyze preset rnaseq-full-length with MiXCR version 4.3.2, I believe. Is it possible that updating to a newer version of MiXCR would solve the issue?

Also, if updating to a new MiXCR version is hard to do (we use a predefined set of workflows), is there a way to modify the parameters to handle this?

bshim181 · 2024-09-27T01:49:36Z

From looking at the alignment files, it seems like alignment gaps lead to these ambiguous N bases.

mizraelson · 2024-09-27T03:46:09Z

Not exactly. In the example above, there is no ambiguity, but rather a single nucleotide deletion in FR3, which will shift the reading frame, rendering the clone non-productive.

The appearance of “N” occurs during the assembleContigs step, when MiXCR extends the initially assembled CDR3 clones to cover more regions of the sequence. This is where ambiguity can arise. You can discard such sequences by adding the following to the analyze command:

-MassembleContigs.parameters.discardAmbiguousNucleotideCalls=true
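
If the pipeline is launched from a script, the override can be attached when the command line is built. The sketch below is only an illustration: the `build_analyze_command` helper, the file names, the output prefix, and the `--species hsa` choice are assumptions. Only the preset name and the `-M` override come from this thread.

```python
import shlex


def build_analyze_command(preset, r1, r2, out_prefix, overrides=None, species="hsa"):
    """Assemble a `mixcr analyze` argument list with optional -M overrides."""
    cmd = ["mixcr", "analyze", preset, "--species", species]
    for key, value in (overrides or {}).items():
        cmd.append(f"-M{key}={value}")
    cmd += [r1, r2, out_prefix]
    return cmd


cmd = build_analyze_command(
    "rnaseq-full-length",      # preset mentioned earlier in the thread
    "sample_R1.fastq.gz",      # hypothetical input files
    "sample_R2.fastq.gz",
    "results/sample",          # hypothetical output prefix
    overrides={"assembleContigs.parameters.discardAmbiguousNucleotideCalls": "true"},
)

# Print the command for review; to run it, pass `cmd` to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```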

bshim181 · 2024-10-01T15:51:19Z

Regarding the image I sent above, I looked at an example where an N base appeared in the sequence.

This is the sequence I looked at, which I believe is in the FR3 region and has two Ns.

I see two different pools of reads. Out of a total of 21 reads that map to this clone, about half have this variation.

At the two positions with N, I see a deletion at the first N position and a mismatch at the second N position (reference=G, query=C).

For the other half of the reads, I see a mismatch at the first N position and a match to the reference at the second position.

Rather than replacing these bases with N, is it possible to output all possible sequences with variants? We are also interested in mutations within VDJ sequences, and this read evidence might point toward potentially biologically relevant targets.

mizraelson · 2024-10-01T23:27:43Z

Did you try using:
-MassembleContigs.parameters.discardAmbiguousNucleotideCalls=true ? Do you still see Ns in the sequences?

Regarding the first case: a deletion of A nucleotide in FR3 will lead to a frameshift in the translation of CDR3, FR4, and the C gene, and this clone will not be functional.
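
To illustrate the frameshift, here is a small Python sketch that splits a sequence into codons before and after a single-base deletion. The sequence and the deleted position are made up for the example and are not taken from the clone discussed here.

```python
def codons(nt):
    """Split a sequence into complete codons plus any leftover bases."""
    usable = len(nt) - len(nt) % 3
    return [nt[i:i + 3] for i in range(0, usable, 3)], nt[usable:]


original = "GCCGTGTATTACTGTGCG"            # toy in-frame sequence (18 nt)
deleted_at = 4                             # hypothetical deletion position (0-based)
shifted = original[:deleted_at] + original[deleted_at + 1:]

print(codons(original))  # (['GCC', 'GTG', 'TAT', 'TAC', 'TGT', 'GCG'], '')
print(codons(shifted))   # (['GCC', 'GGT', 'ATT', 'ACT', 'GTG'], 'CG')
```

Every codon downstream of the deletion changes, and the length is no longer a multiple of three, which is why everything after the deletion is translated out of frame.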

bshim181 · 2024-10-02T13:26:01Z

I have tried using -MassembleContigs.parameters.discardAmbiguousNucleotideCalls=true, and it does discard the ambiguous nucleotides and replaces them with the reference sequence.

The possibility I am considering here is that the variants captured in these reads are mutations rather than sequencing errors, so I was wondering if there is a way to output all possible variants at those N positions (rather than having them replaced with an ambiguous base).

mizraelson · 2024-10-02T22:41:36Z

I see. Generally speaking, there is an algorithm behind assembleContigs that splits a clone if there is enough data to support both variants, which is the output you’re looking for. This algorithm considers the shares of each variant, the Phred quality of the nucleotides, their location on the read, and the surrounding context (for example, if you have NN, there might be multiple possible resolutions) among other things. In some cases, there isn’t enough data to determine if the clone should be split, and MiXCR will then place an N. Several parameters guide this process, but the main ones are:

-MassembleContigs.parameters.branchingMinimalQualityShare=0.1
-MassembleContigs.parameters.branchingMinimalSumQuality=60
-MassembleContigs.parameters.outputMinimalQualityShare=0.75

These are the default values for MiXCR v4.7 with the rna-seq preset. You can find explanations for all parameters on our website. I recommend trying the latest version first and adjusting the parameters if needed (generally, the lower the thresholds, the more likely MiXCR will split a clone into two).

That said, based on our experience, the default parameters work best, as they have been empirically evaluated on hundreds of different datasets.
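
As a rough intuition for how such thresholds can interact, here is a deliberately simplified Python toy. It is not MiXCR's implementation: the decision rules, the `call_position` helper, and the quality sums are all assumptions, and the real algorithm also weighs read position and surrounding context. Only the three default values are taken from the parameters listed above.

```python
def call_position(variant_quality, branching_min_share=0.1,
                  branching_min_sum=60, output_min_share=0.75):
    """Toy decision for one position, given summed Phred quality per variant.

    Assumed rules (illustrative only): a variant is "supported" if it passes
    both branching thresholds; two or more supported variants split the clone;
    otherwise the top variant is reported only if its share is high enough.
    """
    total = sum(variant_quality.values())
    supported = [v for v, q in variant_quality.items()
                 if q >= branching_min_sum and q / total >= branching_min_share]
    if len(supported) >= 2:
        return "split:" + "/".join(sorted(supported))
    best = max(variant_quality, key=variant_quality.get)
    if variant_quality[best] / total >= output_min_share:
        return best
    return "N"


print(call_position({"G": 200, "C": 40}))                        # G
print(call_position({"G": 130, "C": 50}))                        # N
print(call_position({"G": 130, "C": 50}, branching_min_sum=40))  # split:C/G
```

In this toy, the second case lands in the gap between "enough to split" and "enough to report one base", which yields an N. Lowering a branching threshold turns the same evidence into a split.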

milaboratory locked and limited conversation to collaborators Oct 2, 2024

mizraelson converted this issue into discussion #1813 Oct 2, 2024
