Ambiguous AA handling X vs - chars #153

Closed
tnrich opened this issue Jan 25, 2021 · 4 comments

@tnrich
Contributor

tnrich commented Jan 25, 2021

Hi @tnrich,

I noticed that the ambiguous AA 'X' is not rendered.

image

I used this fix, but it may not work where proteinAlphabet["-"] is used.

  if (threeLetterSequenceStringToAminoAcidMap[sequenceString]) {
    return threeLetterSequenceStringToAminoAcidMap[sequenceString];
  } else {
    return proteinAlphabet["X"];
  }
  // else {
  //   return proteinAlphabet["-"]; // return a gap/undefined character
  // }

Originally posted by @kelmazouari in #140 (comment)

@tnrich
Contributor Author

tnrich commented Jan 25, 2021

@kelmazouari I'm looking at the code you referenced. It seems like there needs to be more logic here to determine when the incoming three-letter DNA code should translate to one of the following ambiguous AAs, an X, or a gap character:

B: ND
J: IL
X: ACDEFGHIKLMNPQRSTVWY
Z: QE
"-": gap char
".": gap char
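As a rough illustration of the table above, the sketch below collapses a set of candidate amino acids into a single letter. The names `AMBIGUOUS_AA` and `collapseAminoAcids` are hypothetical and are not taken from ve-sequence-utils; returning a gap for an empty input is also an assumption.

```javascript
// Hypothetical lookup of the two-member ambiguity codes listed above.
const AMBIGUOUS_AA = { B: "ND", J: "IL", Z: "QE" };

// Hypothetical helper: collapse candidate amino acids (array or string of
// single-letter codes) into one letter.
function collapseAminoAcids(candidates) {
  const unique = [...new Set(candidates)].sort().join("");
  if (unique.length === 0) return "-"; // assumption: nothing to translate means a gap
  if (unique.length === 1) return unique; // only one possible residue
  for (const [code, members] of Object.entries(AMBIGUOUS_AA)) {
    if ([...members].sort().join("") === unique) return code;
  }
  return "X"; // any other combination is fully ambiguous
}
```

With this sketch, a codon that could encode either N or D collapses to B, while one that could encode Y, S, C or F collapses to X.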

I've added logic to ve-sequence-utils to handle those cases better and published a new major version there. I'll update bio-parsers and ove with this change and let you know when those have been released.

Here's what the new logic looks like:

  const aa = threeLetterSequenceStringToAminoAcidMap[sequenceString];
  if (aa) {
    return aa;
  }
  const letter =
    degenerateDnaToAminoAcidMap[
      // replace every x with n, as those are equivalent DNA chars
      // (a string pattern would only replace the first x)
      sequenceString.replace(/x/g, "n")
    ] || "x";
  return proteinAlphabet[letter.toUpperCase()];

@kelmazouari

Hi @tnrich,
IMHO (and sorry if I am using Java keywords; I think the meaning is still the same in JS):
If my DNA sequence contains an 'X', I would throw an exception.
For a valid codon like the one above (TNT), I would return an ambiguous AA.
I would also not mix '-' and '.' with the translation characters: AA, X and *.
I guess '-' and '.' are used in further sequence processing, maybe multiple alignments (I didn't check). I would move them out of the AA list. Creating a new class that extends the AA class and includes '-', '.' and * would give me more flexibility. Please note that * may have a different meaning: it is a stop in a translation, and in multiple alignments it is a non-consensus AA or NT.
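One possible reading of this suggestion, sketched in JS. The class and function names (`AminoAcidAlphabet`, `AlignmentAlphabet`, `assertValidCodon`) are hypothetical and exist in neither project; placing B, J and Z in the base alphabet is an assumption as well.

```javascript
// Base alphabet: the 20 standard residues plus the ambiguity codes B, J, Z, X.
class AminoAcidAlphabet {
  constructor() {
    this.symbols = new Set("ACDEFGHIKLMNPQRSTVWYBJZX");
  }
  has(symbol) {
    return this.symbols.has(symbol.toUpperCase());
  }
}

// Extended alphabet for downstream processing (e.g. alignments):
// adds the gap characters and "*" on top of the base alphabet.
class AlignmentAlphabet extends AminoAcidAlphabet {
  constructor() {
    super();
    for (const symbol of "-.*") this.symbols.add(symbol);
  }
}

// Strict validation: reject "x" in DNA instead of treating it as "n".
function assertValidCodon(codon) {
  if (codon.length !== 3) {
    throw new Error("codon must have exactly 3 characters");
  }
  if (/x/i.test(codon)) {
    throw new Error(`"x" is not accepted as a DNA character: ${codon}`);
  }
  return codon.toLowerCase();
}
```

Keeping the gap characters in a subclass means translation code can check against the base alphabet only, while alignment code opts into the extra symbols.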

@tnrich
Contributor Author

tnrich commented Jan 27, 2021

Hey @kelmazouari

I am taking X as a valid ambiguous DNA character ('x' === 'n'). I am mostly using this chart to make my ambiguous letter determinations:

image

https://www.dnabaser.com/articles/IUPAC%20ambiguity%20codes.html

From what I can tell by comparing with other sources, that chart seems fairly accurate.
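To make the chart concrete, here is a minimal sketch that expands a degenerate codon into every concrete codon it can stand for, using the standard IUPAC nucleotide codes and treating 'x' like 'n' as described above. `IUPAC_DNA` and `expandCodon` are hypothetical names, not part of ve-sequence-utils, and RNA 'u' is left out for brevity.

```javascript
// IUPAC nucleotide ambiguity codes (lowercase); "x" is treated like "n".
const IUPAC_DNA = {
  a: "a", c: "c", g: "g", t: "t",
  r: "ag", y: "ct", s: "cg", w: "at", k: "gt", m: "ac",
  b: "cgt", d: "agt", h: "act", v: "acg",
  n: "acgt", x: "acgt",
};

// Hypothetical helper: list every concrete codon a degenerate codon can represent.
function expandCodon(codon) {
  const bases = [...codon.toLowerCase()];
  if (bases.length !== 3) {
    throw new Error("codon must have exactly 3 characters");
  }
  // Build the codons position by position, branching on each ambiguous base.
  return bases.reduce((prefixes, base) => {
    const options = IUPAC_DNA[base];
    if (!options) throw new Error(`unsupported DNA character: ${base}`);
    return prefixes.flatMap((prefix) => [...options].map((o) => prefix + o));
  }, [""]);
}
```

For example, "ray" expands to aac, aat, gac and gat, which encode only N and D and so correspond to B, whereas "tnt" expands to tat, tct, tgt and ttt (Y, S, C, F), which can only be reported as X.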

Is the new parser logic not working for you in a specific case, or were you just pointing out that you do things differently in your code?

Thanks!
Thomas

@kelmazouari

Thanks @tnrich,
That was just my own suggestion, based on another project I am working on.
The new parser is working fine.

tnrich closed this as completed on Mar 12, 2021