Background
RNA, DNA, and peptide sequences need to be parsed.
These sequences are represented as plain strings composed of the following symbols:
for peptides:
A - Alanine
C - Cysteine
D - Aspartic Acid
E - Glutamic Acid
F - Phenylalanine
G - Glycine
H - Histidine
I - Isoleucine
K - Lysine
L - Leucine
M - Methionine
N - Asparagine
P - Proline
Q - Glutamine
R - Arginine
S - Serine
T - Threonine
V - Valine
W - Tryptophan
Y - Tyrosine
for RNA nucleotides:
A - AMP (Adenosine monophosphate)
C - CMP (Cytidine monophosphate)
G - GMP (Guanosine monophosphate)
U - UMP (Uridine monophosphate)
T - rTMP (Ribothymidine monophosphate)
for DNA nucleotides:
A - dAMP (Deoxyadenosine monophosphate)
C - dCMP (Deoxycytidine monophosphate)
G - dGMP (Deoxyguanosine monophosphate)
U - dUMP (Deoxyuridine monophosphate)
T - TMP (Thymidine monophosphate)
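The symbol lists above can be captured as lookup sets. The following is a minimal sketch, not the Indigo implementation: the names `ALPHABETS` and `invalid_symbols` are hypothetical, and it assumes that only the uppercase letters listed above are valid. The issue does not say how unknown symbols should be handled, so the sketch only reports them.

```python
from typing import List, Tuple

# Alphabets copied from the symbol lists above; all names here are illustrative.
ALPHABETS = {
    "PEPTIDE": frozenset("ACDEFGHIKLMNPQRSTVWY"),
    "RNA": frozenset("ACGUT"),
    "DNA": frozenset("ACGUT"),
}


def invalid_symbols(sequence: str, sequence_type: str) -> List[Tuple[int, str]]:
    """Return (position, symbol) pairs that are not in the chosen alphabet."""
    alphabet = ALPHABETS[sequence_type]
    return [(i, ch) for i, ch in enumerate(sequence) if ch not in alphabet]
```

For example, `invalid_symbols("ACGU", "RNA")` returns an empty list, while a peptide string containing `B` or `X` would have those positions reported.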
The sequence parser should split each nucleotide into its components (phosphate, sugar, and nucleobase) as follows:
For the RNA case:
Input: <string> (of RNA letters)
Input Visualization (given just for better explanation):
Output: All supported indigo formats including ket format in JSON for Ketcher.
Output Visualization (given just for better explanation):
Algo: wrap every RNA letter into r(...)p, i.e. A -> r(A)p, C -> r(C)p, and so on.
Example:
Input: "ACGU"
Output: r(A)p, r(C)p, r(G)p, r(U) (in ket format)
For the DNA case:
Input: <string> (of DNA letters)
Input Visualization:
Output: All supported indigo formats including ket format in JSON for Ketcher.
Output Visualization:
Algo: wrap every DNA letter into d(...)p, i.e. A -> d(A)p, C -> d(C)p, and so on.
Example:
Input: "ACGT"
Output: d(A)p, d(C)p, d(G)p, d(T) (in ket format)
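The RNA and DNA expansion rules can be sketched in a few lines. This is an illustration under stated assumptions, not the SequenceLoader code: `SUGAR_PREFIX` and `expand_nucleotides` are hypothetical names, and the trailing phosphate is omitted on the last nucleotide because both examples above end without "p".

```python
from typing import List

# Sugar notation used in the examples above: "r" for RNA, "d" for DNA.
SUGAR_PREFIX = {"RNA": "r", "DNA": "d"}


def expand_nucleotides(sequence: str, sequence_type: str) -> List[str]:
    """Wrap each letter as sugar(base)p, leaving the last one without "p"."""
    sugar = SUGAR_PREFIX[sequence_type]
    last = len(sequence) - 1
    expanded = []
    for i, base in enumerate(sequence):
        # Assumption taken from the examples: no phosphate after the final nucleotide.
        phosphate = "" if i == last else "p"
        expanded.append(f"{sugar}({base}){phosphate}")
    return expanded


print(", ".join(expand_nucleotides("ACGU", "RNA")))  # r(A)p, r(C)p, r(G)p, r(U)
print(", ".join(expand_nucleotides("ACGT", "DNA")))  # d(A)p, d(C)p, d(G)p, d(T)
```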
Please note that the phosphate component "p" appears on the right side of an expanded nucleotide.
"r" means ribose and "d" means deoxyribose.
Solution
Implement a C++ class, SequenceLoader, alongside the existing Indigo loaders for molecular formats.
Implement the following functions for the C API, where type is one of "RNA", "DNA", or "PEPTIDE":
int indigoLoadSequence(int source, const char* type);
int indigoLoadSequenceFromString(const char* string, const char* type);
int indigoLoadSequenceFromFile(const char* filename, const char* type);
int indigoLoadSequenceFromBuffer(const char* buffer, int size, const char* type);
Monomer coordinates are calculated according to the pictures above. Backbone monomer coordinates are calculated from left to right. Branch monomers are positioned under the sugars they are connected to.
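The layout rule can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Monomer` type, the function name `layout_nucleotides`, the unit spacing `step`, the labels "R" and "P", and the convention that "under" means a smaller y value. The actual spacing comes from the pictures in the issue, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Monomer:
    kind: str   # "sugar", "phosphate", or "base"
    label: str
    x: float
    y: float


def layout_nucleotides(sequence: str, sugar_label: str = "R", step: float = 1.0) -> List[Monomer]:
    """Place backbone monomers left to right and each base under its sugar."""
    monomers = []
    x = 0.0
    last = len(sequence) - 1
    for i, base in enumerate(sequence):
        monomers.append(Monomer("sugar", sugar_label, x, 0.0))
        # Branch monomer: same x as its sugar, one step lower (assumed y-up axis).
        monomers.append(Monomer("base", base, x, -step))
        x += step
        if i != last:
            monomers.append(Monomer("phosphate", "P", x, 0.0))
            x += step
    return monomers
```

With these assumptions, the backbone for "AC" is sugar, phosphate, sugar at x = 0, 1, 2, and the bases A and C sit at x = 0 and x = 2 one step below.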
Solution
Add language bindings for Python, Java, and C#.
Python binding functions:
def loadSequence(self, input_string: str, sequence_type: str): ...
def loadSequenceFromFile(self, input_file: str, sequence_type: str): ...
Add the following content types to WASM "loadMoleculeOrReaction" and Indigo service "convert" API:
chemical/x-rna-sequence, chemical/x-dna-sequence, chemical/x-peptide-sequence
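One way to connect these content types to the loader's type argument is a simple lookup table. This is a hypothetical sketch: the mapping name, the helper `sequence_type_for`, and the case-insensitive matching are assumptions, not the WASM or service code.

```python
from typing import Optional

# Content types from the list above mapped to the "RNA" / "DNA" / "PEPTIDE" type strings.
CONTENT_TYPE_TO_SEQUENCE_TYPE = {
    "chemical/x-rna-sequence": "RNA",
    "chemical/x-dna-sequence": "DNA",
    "chemical/x-peptide-sequence": "PEPTIDE",
}


def sequence_type_for(content_type: str) -> Optional[str]:
    """Return the sequence type for a content type, or None if it is not a sequence type."""
    # Assumption: matching ignores case and surrounding whitespace.
    return CONTENT_TYPE_TO_SEQUENCE_TYPE.get(content_type.strip().lower())
```

A caller could use a `None` result to fall through to the existing molecule or reaction handling.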