rna-localization

Background

In this repository we tackle the problem of rna-localization. In this repository we propose different models that predict the distribution of RNA sequences in cellular compartments.

Setup

Install graphviz:
- Please see https://graphviz.gitlab.io/download/ for that
Install conda (check if conda is already installed with conda --version, if not then get it from https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
Perform conda env create -n reggen -f environment.yml
- If you have already installed and activated a conda environment, before you do the above command deactivate the previous conda env with conda deactivate
Activate the conda environement with conda activate reggen

Content

Structure

In this section we describe the important repository content.

dataloaders: folder containing the our custom GeneDataloader. As described in the report, we allow the use of three different inputs: sequence, struct, m6A.

model_architectures: This folder contains architectures used during the past weeks. They come in form .yaml files and follow the writing convention found in the convention text files.

test_data_evaluation: This folder contains the notebooks used for predicting the labels of the test data set. There is a notebook for every trained model (seen in the paper).

models: This folder contains the code for the used models. Interesting are the following scripts:

model.py, singlebranchModel.py, multibranchModel.py: abstraction of models that can be constructed automatically from the yaml files.
CNN_RNN_models.py: final models created directly with the keras api.
utils.py: script used to create and add the different needed layers specified in the architecture yaml file. Possible layers: Convolution, Dense, Flatten, Drouput, MaxPooling, Attention, Skip, ...

main:

notebooks:
- singlebranch, multibranch: templates for training pipelines of the specific model types
- attention_base, CNN_Baseline_4Conv_Struct_ext, dataPipeline_RNATracker: notebooks used to train different models
- MotifSearch.ipynb: notebooks used to generate the integrated gradients
python scripts:
- plotting: script containing different plotting modalities. Together with the plot_res2.ipynb they constitute the tools used to generate the plots portrayed in the report.
- utils: functions used to load the datasets and prepare the parameters for the dataloaders and models.
- metrics: custom Pearson metric for visualizing training with keras. This is the pearson correlation used to display the correlation between the ground truth and predicted compartments.

Paper Results

The results present in the paper can be reproduced with the notebooks present under folder test_data_evaluation. In order to run a notebook, follow these steps:

Copy the notebook under the parent folder (rna-localization)
Every notebook corresponds to a provided trained model (link can be found in the paper). Download the corresponding model.
Change the path variable in the notebook to the path of the model. Or change the path to the model accordingly.
Follow the steps in the notebook (Run the steps)

Motif Search

The integrated gradients can be generated with the MotifSearch.ipynb notebook. In order to do it follow these steps:

Download the wanted model from the repository linked in the report.
Copy the model file in the rna-localization folder and change the value of the model_name variable in the notebook
Adjust the values of the GeneDataloader to correspond to the chosen inputs of the model.
Follow and run the steps in the notebook.

Note: Motif search was only tested with only sequence data as input. It might not work for models trained with multiple inputs. We recommend using Baseline_no_str and CNN_RNN_no_str.

Name		Name	Last commit message	Last commit date
Latest commit History 168 Commits
.idea		.idea
__pycache__		__pycache__
dataloaders		dataloaders
model_architectures		model_architectures
models		models
motif_search		motif_search
notes		notes
test_data_evaluation		test_data_evaluation
.DS_Store		.DS_Store
.gitignore		.gitignore
CNN_Baseline_4Conv_Struct_ext.ipynb		CNN_Baseline_4Conv_Struct_ext.ipynb
CNN_Training_Pipeline.ipynb		CNN_Training_Pipeline.ipynb
MotifSearch.ipynb		MotifSearch.ipynb
README.md		README.md
__init__.py		__init__.py
attention_base.ipynb		attention_base.ipynb
dataPipeline_RNATracker.ipynb		dataPipeline_RNATracker.ipynb
environment.yml		environment.yml
metrics.py		metrics.py
multibranch_notebook.ipynb		multibranch_notebook.ipynb
plot_recovery_script.ipynb		plot_recovery_script.ipynb
plot_res2.ipynb		plot_res2.ipynb
plotting.py		plotting.py
singlebranch_notebook.ipynb		singlebranch_notebook.ipynb
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rna-localization

Background

Setup

Content

Structure

Paper Results

Motif Search

References

RNATracker

DeepmRNALoc

About

Releases

Packages

Contributors 4

Languages

Tralius/rna-localization

Folders and files

Latest commit

History

Repository files navigation

rna-localization

Background

Setup

Content

Structure

Paper Results

Motif Search

References

RNATracker

DeepmRNALoc

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages