Welcome to TeamRC4DSA's repository! This genomics project detects m6A modifications in cell lines using RNA-Seq data from the SG-NEx Project. The repository contains our code, our findings, and the references we used. Feel free to poke around!
Note
To student testers for DSA4266: you are free to explore the whole repository, but the /deployment folder will be the most relevant to you.
.
├─── src
│    ├─── naive_Autoencoder
│    │    ├─── build_autoencoder.py
│    │    ├─── predict_autoencoder.py
│    │    ├─── probability_autoencoder.py
│    │    └─── score_conversion.py
│    ├─── naive_RF
│    │    ├─── build_naive_RF.py
│    │    ├─── data_normalization.py
│    │    ├─── data_parser.py
│    │    ├─── naive_RF_feature_engineering.py
│    │    ├─── predict_naive_RF.py
│    │    ├─── RF_testing_pipeline.py
│    │    └─── RF_training_pipeline.sh
│    ├─── util.py
│    ├─── significant_transcripts_positions.R
│    ├─── make_all_predictions.sh
│    └─── README.md
├─── notebooks
│    ├─── Autoencoder
│    │    └─── autoencoder_experimentation.ipynb
│    ├─── Random Forest
│    │    ├─── Model Evaluation.ipynb
│    │    ├─── Naive RF Model.ipynb
│    │    └─── Naive RF Prediction Pipeline.ipynb
│    └─── Data Analysis
│         ├─── Analysis with PCA.ipynb
│         ├─── Analysis - Identifier Transcripts.ipynb
│         ├─── Data Parsing.ipynb
│         ├─── Dataset Normalization.ipynb
│         ├─── EDA.ipynb
│         └─── Feature Extraction.ipynb
├─── data
│    ├─── raw
│    │    ├─── bag_meta.csv
│    │    ├─── data.info
│    │    ├─── dataset0.json.gz
│    │    └─── dataset2.json
│    └─── curated
│         ├─── dataset1_naiveRF_predictions.csv
│         └─── dataset2_naiveRF_predictions.csv
├─── deployment
│    ├─── data/raw
│    │    └─── dataset2.json
│    ├─── model
│    │    └─── minmaxscaler
│    ├─── src
│    │    ├─── util.py
│    │    └─── RF
│    │         ├─── RF_testing_pipeline.py
│    │         ├─── RF_training_pipeline.py
│    │         ├─── build_naive_RF.py
│    │         ├─── data_normalization.py
│    │         ├─── data_parser.py
│    │         ├─── naive_RF_feature_engineering.py
│    │         └─── predict_naive_RF.py
│    ├─── Dockerfile
│    ├─── docker_installation.sh
│    └─── requirements.txt
├─── model
│    ├─── autoencoder
│    ├─── autoencoder_scalar
│    ├─── minmaxscaler
│    ├─── rf-ntrees_10
│    └─── rf-ntrees_100
├─── reference
│    ├─── deliverables
│    │    ├─── handout_project2_RNAModifications.html
│    │    └─── Student_evaluation_guideline.html
│    └─── research
├─── README.md
├─── .gitignore
└─── .gitattributes
Note
Git LFS has been configured for this project. To install Git LFS, follow the installation steps below:
macOS:
brew install git-lfs
git lfs install
Windows:
- Follow the instructions here.
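If the repository is cloned before Git LFS is installed, LFS-tracked files are checked out as small text pointers instead of their real contents. The sketch below is one hedged way to spot such files: it relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_unfetched` are hypothetical and not part of this repository, and which files are actually tracked by LFS here is not stated, so treat this as an illustration rather than the project's own tooling.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an unfetched Git LFS pointer file."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched(root="."):
    """List files under `root` that are still LFS pointers, skipping .git."""
    root_path = Path(root)
    return sorted(
        str(p)
        for p in root_path.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root_path).parts
        and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root after cloning.
    for name in find_unfetched("."):
        print("still a pointer:", name)
```

If any files are reported, running `git lfs pull` after `git lfs install` should replace the pointers with the actual file contents.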
Chen, Ying, et al. "A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines." bioRxiv (2021). doi: https://doi.org/10.1101/2021.04.21.440736
The SG-NEx data was accessed on 12 November 2023 at registry.opendata.aws/sg-nex-data.