Ambiguous AA handling X vs - chars #153

tnrich · 2021-01-25T18:17:01Z

I noticed that the ambiguous AA 'X' is not rendered.

I used this fix, but it may not work where proteinAlphabet["-"] is used.

  if (threeLetterSequenceStringToAminoAcidMap[sequenceString]) {
    return threeLetterSequenceStringToAminoAcidMap[sequenceString];
  } else {
    return proteinAlphabet["X"];
  }
  // else {
  //   return proteinAlphabet["-"]; // return a gap/undefined character
  // }

Originally posted by @kelmazouari in #140 (comment)


tnrich · 2021-01-25T20:14:21Z

@kelmazouari I'm looking at the code you referenced. It seems there needs to be more logic here to determine when the incoming three-letter DNA codon should translate to one of the following ambiguous AA codes, an X, or a gap character:

B: ND
J: IL
X: ACDEFGHIKLMNPQRSTVWY
Z: QE
"-": gap char
".": gap char

I've added logic to ve-sequence-utils to handle those cases better and published a new major version. I'll update bio-parsers and ove to use it and let you know when those have been released.

Here's what the new logic looks like:

  const aa = threeLetterSequenceStringToAminoAcidMap[sequenceString];
  if (aa) {
    return aa;
  }
  const letter =
    degenerateDnaToAminoAcidMap[
      // replace every x with n, as those are equivalent DNA chars
      // (a plain string pattern would only replace the first x)
      sequenceString.replace(/x/g, "n")
    ] || "x";
  return proteinAlphabet[letter.toUpperCase()];
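
For anyone wondering how a lookup like degenerateDnaToAminoAcidMap could be derived, here is a minimal sketch. It is not the ve-sequence-utils implementation: the function and constant names are hypothetical, it assumes the standard genetic code, and it assumes an all-gap codon should map to "-". The idea is to expand each IUPAC ambiguity code into its possible bases, translate every resulting codon, and collapse the set of residues into a single letter (B, J, Z, or X when nothing narrower fits).

```javascript
// Standard genetic code, codons ordered by base t, c, a, g at each position.
const BASES = "tcag";
const AMINO_ACIDS =
  "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

// IUPAC nucleotide ambiguity codes. Assumption: "x" is treated like "n".
const IUPAC = {
  a: "a", c: "c", g: "g", t: "t", u: "t",
  r: "ag", y: "ct", s: "cg", w: "at", k: "gt", m: "ac",
  b: "cgt", d: "agt", h: "act", v: "acg", n: "acgt", x: "acgt"
};

// Translate a codon made only of t, c, a, g.
function translateExactCodon(codon) {
  const [i, j, k] = [...codon].map((base) => BASES.indexOf(base));
  return AMINO_ACIDS[16 * i + 4 * j + k];
}

// Hypothetical helper: translate a codon that may contain ambiguity codes.
function translateDegenerateCodon(sequenceString) {
  const codon = sequenceString.toLowerCase();
  if (codon.length !== 3) throw new RangeError("expected a three-letter codon");
  if (/^[-.]{3}$/.test(codon)) return "-"; // assumption: all-gap codon stays a gap
  const options = [...codon].map((base) => IUPAC[base]);
  if (options.includes(undefined)) return "X"; // unknown or partially gapped codon
  const residues = new Set();
  for (const b1 of options[0])
    for (const b2 of options[1])
      for (const b3 of options[2]) residues.add(translateExactCodon(b1 + b2 + b3));
  const key = [...residues].sort().join("");
  if (key.length === 1) return key; // every expansion agrees
  if (key === "DN") return "B";
  if (key === "IL") return "J";
  if (key === "EQ") return "Z";
  return "X";
}
```

With this sketch, translateDegenerateCodon("gcn") gives "A" because all four expansions encode alanine, "ray" gives "B", and "tnt" gives "X".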

kelmazouari · 2021-01-26T08:44:17Z

Hi @tnrich
IMHO (and sorry if I am using Java terminology; I think the meaning is still the same in JS):
If my DNA seq contains an 'X', I would throw an exception.
For a valid codon with an ambiguous base, such as TNT, I would return an ambiguous AA.
I would also not mix '-' and '.' with the translation characters: AA, X and *.
I guess '-' and '.' are used in further seq processing, maybe multiple alignments (I didn't check). I would move them out of the AA list. Creating a new class that extends the AA class and includes '-', '.' and * would give me more flexibility. Please note that * may have a different meaning depending on context: it is a stop in a translation, and in multiple alignments it is a non-consensus AA or NT.
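
A minimal JavaScript sketch of the separation described here might look like the following. All class and function names are hypothetical and do not exist in ve-sequence-utils; the base alphabet is assumed to hold the 20 standard residues plus the ambiguity codes B, J, Z and X.

```javascript
// Hypothetical names throughout; this only illustrates the suggested design.
class AminoAcidAlphabet {
  constructor() {
    // 20 standard residues plus the ambiguity codes B, J, Z and X.
    this.symbols = new Set("ACDEFGHIKLMNPQRSTVWYBJZX");
  }

  has(symbol) {
    return this.symbols.has(symbol);
  }
}

// Extended alphabet for later processing steps such as alignments.
class AlignmentAlphabet extends AminoAcidAlphabet {
  constructor() {
    super();
    for (const symbol of "-.*") this.symbols.add(symbol);
  }
}

// Reject DNA that contains 'X' instead of treating it as 'N'.
function assertNoXInDna(sequence) {
  const position = sequence.search(/x/i);
  if (position !== -1) {
    throw new RangeError(`Unexpected 'X' at position ${position}`);
  }
  return sequence;
}
```

Keeping the gap and stop symbols in the subclass means translation code can validate against the narrower alphabet while alignment code opts in to the wider one.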

tnrich · 2021-01-27T17:01:22Z

Hey @kelmazouari

I am treating X as a valid ambiguous DNA character ('x' === 'n'). I am mostly using this chart to make my ambiguous-letter determinations:

https://www.dnabaser.com/articles/IUPAC%20ambiguity%20codes.html

From what I can tell by comparing with other sources, that chart seems fairly accurate.

Is the new parser logic not working for you in a specific case, or were you just pointing out that you do things differently in your code?

Thanks!
Thomas

kelmazouari · 2021-02-03T10:14:34Z

Thanks @tnrich,
That was just my own suggestion based on another project I am working on.
The new parser is working fine.

tnrich closed this as completed Mar 12, 2021
