Code to reproduce Angenent-Mari, N. et al 2019. Deep Learning for RNA Synthetic Biology
Data is loaded from a Toehold Sensor Database (data/2019-03-30_toehold_dataset_proc_with_params.csv) which is comma delimited table having the following columns of DNA encoded sub-sequences: organism, sequence_class, sequence_id, pre_seq promoter, trigger, loop1, switch, loop2, stem1, atg, stem2m linkerm post_linker, output
DS) Toehold Nucleotide Sequence
*NOTE: Base toehold string sequence [0-144]
GGG - Trigger - Loop1 - Switch - Loop2 - Stem1 - AUG - Stem2 - Linker - Post-linker
[-3,-1] [0,-29] [30-49] [50-79] [80-90] [91,96] [97,99] [100,108] [109,134] [135,144]
For training we select our input sequence vector start with GGG and concatenate everything from "Loop1" to "post-linker"... which is seq_SwitchOFF_GFP = ggg + seq[30:145].
Also, pre_seq & promoter sub-sequences are NEVER used because they are not converted into mRNA (is in the plasmid but > * it is never in the functional toehold module), so it won't contribute in secondary structure at all. For this example > * in particular we use DS_1.*
OUT) ON, Off &/OR ON-OFF State values derived from the experimental testing of toehold switch RNA sequence
To investigate if a deep learning network can be used to predict toehold switch ON/OFF functionality, because in that case it would suggest the network is learning secondary structure prediction that would be transferable to other RNA based problems.
Due to ocassional high demand in downloads of this GIT, the LFS bandwidth or egress limit of this repo may require you download our data directly from this link: https://drive.google.com/file/d/1t_OXvtW-hEGRt3-mgNlyBKHBqro2Z572/view?usp=sharing