I'm using the C API of block-aligner to align protein sequences from the UniProt database. Some of the protein sequences contain *. Currently, using block-aligner to align sequences that contain * causes a segmentation fault. Although users can work around this by mapping * to another supported char, it would be nice if * were supported internally! :)
I'm not sure if * will ever be directly supported internally. It will always have to be mapped to some character that fits within the scoring matrix, so that SIMD lookups can be done. Right now, the amino acid matrix supports the alphabetical characters A-Z.
There are a couple of ways this could be solved:
- An unused letter like J could be used to represent *, as you suggested. On the Rust side, the scores in the amino acid matrix can be cloned and changed, but this is not yet exposed in the C API. Without changing the scores, matches and mismatches with J incur a score of -128.
- A letter that is not one of the original 20 amino acids but still has predefined scores could be used. For example, * can be translated to X.
- Require letters to be mapped to numerical values 0-20, then allow block aligner to align numerical strings.
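As a rough sketch of the second option, the caller can rewrite each sequence before handing it to the library. The helper below is hypothetical and not part of block-aligner; the name `sanitize_protein` and the choice to uppercase lowercase letters are assumptions made for illustration. It only guarantees that every byte ends up in A-Z, the range the amino acid matrix is said to support above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of the block-aligner API.
 * Rewrites `seq` in place so that every byte is an uppercase letter A-Z:
 *   - lowercase a-z is uppercased (an assumption about the desired behavior),
 *   - '*' and any other byte outside A-Z become `replacement` (e.g. 'X').
 * Returns the number of bytes that were overwritten with `replacement`. */
size_t sanitize_protein(char *seq, char replacement) {
    assert(seq != NULL);
    assert(replacement >= 'A' && replacement <= 'Z');

    size_t replaced = 0;
    size_t len = strlen(seq);
    for (size_t i = 0; i < len; i++) {
        char c = seq[i];
        if (c >= 'A' && c <= 'Z') {
            continue;                        /* already in the supported range */
        } else if (c >= 'a' && c <= 'z') {
            seq[i] = (char)(c - 'a' + 'A');  /* fold to uppercase */
        } else {
            seq[i] = replacement;            /* '*', '-', digits, etc. */
            replaced++;
        }
    }
    return replaced;
}
```

Calling this on both the query and the reference with 'X' as the replacement, before they are passed to the C API, should avoid feeding out-of-range bytes into the matrix lookup. Using 'J' instead would also keep the bytes in range, but per the note above it would score -128 unless the matrix scores are changed.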