CL_RNA_SynthBio

Code to reproduce Angenent-Mari, N. et al. 2019, "Deep Learning for RNA Synthetic Biology".

DATA STRUCTURE (INPUT / OUTPUT)

Data is loaded from a Toehold Sensor Database (data/2019-03-30_toehold_dataset_proc_with_params.csv), a comma-delimited table with the following columns of DNA-encoded sub-sequences: organism, sequence_class, sequence_id, pre_seq, promoter, trigger, loop1, switch, loop2, stem1, atg, stem2, linker, post_linker, output
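As a minimal sketch of reading this table, the snippet below uses only the Python standard library and checks that the sequence columns listed above are present. The function name `load_toehold_rows` and the tiny in-memory CSV are illustrative assumptions, not code from the notebooks, and the real file may contain additional columns.

```python
import csv
import io

# Sequence-related columns as listed in this README (the real file may hold more).
SEQUENCE_COLUMNS = [
    "organism", "sequence_class", "sequence_id", "pre_seq", "promoter",
    "trigger", "loop1", "switch", "loop2", "stem1", "atg", "stem2",
    "linker", "post_linker",
]


def load_toehold_rows(handle):
    """Read a comma-delimited toehold table and return its rows as dicts.

    Hypothetical helper: raises ValueError if any expected column is missing.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in SEQUENCE_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Tiny in-memory stand-in for the real CSV, so the sketch runs without the data.
demo_csv = io.StringIO(
    ",".join(SEQUENCE_COLUMNS) + "\n" + ",".join(["x"] * len(SEQUENCE_COLUMNS)) + "\n"
)
demo_rows = load_toehold_rows(demo_csv)
print(len(demo_rows), "row(s) loaded")
```

With the real dataset, the same helper could be called on `open("data/2019-03-30_toehold_dataset_proc_with_params.csv", newline="")`.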

Input tensor is defined as (DS=Data_Style):

DS) Toehold Nucleotide Sequence
*NOTE: Base toehold string sequence [0,144]
GGG     - Trigger - Loop1   - Switch  - Loop2   - Stem1   - AUG     - Stem2     - Linker    - Post-linker
[-3,-1]   [0,29]    [30,49]   [50,79]   [80,90]   [91,96]   [97,99]   [100,108]   [109,134]   [135,144]
For training, the input sequence vector starts with GGG followed by everything from "Loop1" to "Post-linker", i.e. seq_SwitchOFF_GFP = ggg + seq[30:145].
The pre_seq and promoter sub-sequences are NEVER used because they are not transcribed into mRNA: they are in the plasmid but never in the functional toehold module, so they do not contribute to secondary structure at all. This example uses DS_1.
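The indexing above can be sketched as follows. This is an illustration under stated assumptions, not the notebooks' code: the region boundaries are the inclusive indices from the note converted to Python half-open slices, `build_switch_off_input` is a hypothetical name for the `ggg + seq[30:145]` step, and the one-hot encoding with alphabet order "ATCG" is an assumed way of turning the string into an input tensor.

```python
# Half-open Python slices for the inclusive index ranges given in the note.
REGIONS = {
    "trigger": (0, 30),
    "loop1": (30, 50),
    "switch": (50, 80),
    "loop2": (80, 91),
    "stem1": (91, 97),
    "atg": (97, 100),
    "stem2": (100, 109),
    "linker": (109, 135),
    "post_linker": (135, 145),
}


def build_switch_off_input(seq, prefix="GGG"):
    """Return prefix + seq[30:145], i.e. GGG followed by Loop1..Post-linker."""
    if len(seq) != 145:
        raise ValueError(f"expected a 145-nt base sequence, got {len(seq)}")
    return prefix + seq[30:145]


def one_hot(seq, alphabet="ATCG"):
    """One-hot encode a DNA string as a list of rows (alphabet order is assumed)."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        row[index[base]] = 1  # raises KeyError on symbols outside the alphabet
        encoded.append(row)
    return encoded


# Synthetic 145-nt sequence built region by region (not real data).
demo_seq = (
    "A" * 30 + "C" * 20 + "G" * 30 + "T" * 11 + "A" * 6
    + "ATG" + "C" * 9 + "G" * 26 + "T" * 10
)
demo_input = build_switch_off_input(demo_seq)
demo_tensor = one_hot(demo_input)
print(len(demo_input), len(demo_tensor), len(demo_tensor[0]))
```

The resulting input has 3 + 115 = 118 positions, each encoded as a 4-element row.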

Output vector is defined as:

OUT) ON, OFF, and/or ON-OFF state values derived from experimental testing of the toehold switch RNA sequence
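A minimal sketch of assembling such an output vector is shown below. The column names "ON", "OFF" and "ON_OFF" and the helper `build_targets` are assumptions made for illustration; check the actual CSV header before relying on them. The values in the demo are made up.

```python
def build_targets(rows, target_columns=("ON", "OFF", "ON_OFF")):
    """Collect the chosen state values of each row into a list of float vectors.

    `rows` is a list of dicts (for example from csv.DictReader);
    `target_columns` holds assumed column names, not confirmed ones.
    """
    return [[float(row[name]) for name in target_columns] for row in rows]


# Made-up measurements, only to show the shapes involved.
demo_measurements = [
    {"ON": "0.9", "OFF": "0.1", "ON_OFF": "0.8"},
    {"ON": "0.5", "OFF": "0.25", "ON_OFF": "0.25"},
]

all_states = build_targets(demo_measurements)               # ON, OFF and ON-OFF
on_off_only = build_targets(demo_measurements, ("ON_OFF",))  # single target
print(all_states)
print(on_off_only)
```

Passing a shorter tuple of column names selects a single target, which matches the "and/or" in the definition above.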

PROBLEM DEFINITION

To investigate whether a deep learning network can predict toehold switch ON/OFF functionality. If it can, this would suggest that the network is learning secondary structure prediction in a way that would transfer to other RNA-based problems.

DATASET (Direct Download)

Due to occasional high demand for downloads from this Git repository, its LFS bandwidth or egress limit may be exceeded. In that case, download the data directly from this link: https://drive.google.com/file/d/1t_OXvtW-hEGRt3-mgNlyBKHBqro2Z572/view?usp=sharing

FOLDERS AND FILES

.ipynb_checkpoints
backup
data
logs
models
src
.gitignore
1_DL_CNN_1D_toehold_V1.ipynb
1_DL_CNN_2D_toehold_V1.ipynb
1_DL_LSTM_1D_toehold_V1.ipynb
1_DL_MLP_1D_RP_toehold_V1.ipynb
1_DL_MLP_1D_toehold_V1.ipynb
1_DL_MLP_RP_toehold_V1.ipynb
1_GPML_CNN_toehold_design_gpu_AttentionMap.ipynb
LICENSE
README.md

lrsoenksen/CL_RNA_SynthBio
