Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

This repo contains the code and files needed to reproduce the results in the paper "Improved prediction of smoking status via isoform-aware RNA-seq deep learning models". In this paper, we propose a novel deep learning model, the Isoform Map Layer, which maps exon-level information to the isoform level. Comprehensive experiments on self-collected blood RNA-seq data from 2,557 subjects in the COPDGene Study demonstrate the effectiveness of our method. We hypothesized that because smoking alters patterns of exon and isoform usage, greater predictive accuracy could be obtained by using exon- and isoform-level quantifications to predict smoking status.
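The README describes the Isoform Map Layer only at a high level. One way to picture an exon-to-isoform mapping is as a dense layer whose weight matrix is masked by a binary exon-by-isoform membership matrix, so that each isoform unit receives input only from the exons it contains. The NumPy sketch below assumes exactly that design; the function name `isoform_map_forward` and its arguments are hypothetical and are not taken from isomap_training.py, and the actual layer may add activations or other details.

```python
import numpy as np


def isoform_map_forward(exon_x, mask, weights, bias):
    """Hypothetical forward pass of a masked exon-to-isoform layer.

    exon_x:  (n_samples, n_exons) exon-level quantifications
    mask:    (n_exons, n_isoforms) binary; mask[e, i] = 1 if exon e belongs to isoform i
    weights: (n_exons, n_isoforms) trainable weights
    bias:    (n_isoforms,) trainable bias
    """
    # Zero out every connection between an exon and an isoform that does not contain it,
    # so masked weights have no effect on the output.
    effective_w = weights * mask
    # Linear map from exon space to isoform space (no activation in this sketch).
    return exon_x.dot(effective_w) + bias


# Toy example (hypothetical): 3 exons, 2 isoforms.
# Isoform 0 uses exons 0 and 1; isoform 1 uses exons 1 and 2.
mask = np.array([[1, 0],
                 [1, 1],
                 [0, 1]], dtype=np.float32)
weights = np.ones((3, 2), dtype=np.float32)
bias = np.zeros(2, dtype=np.float32)
exon_x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)

out = isoform_map_forward(exon_x, mask, weights, bias)
print(out)  # [[3. 5.]]
```

With all weights set to one, each isoform output is simply the sum of its own exons (1 + 2 and 2 + 3); in training, the unmasked weights would be learned. NumPy's `RandomState`-era API and `ndarray.dot` are used so the sketch also runs on the pinned numpy 1.16.4.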

Abstract

Predictive models based on gene expression are already a part of medical decision-making in selected situations such as early breast cancer treatment. Most of these models are based on measures that do not capture critical aspects of gene splicing, but with RNA sequencing it is possible to capture some aspects of alternative splicing and use them to improve clinical predictions. Building on previous models to predict cigarette smoking status, we show that measures of alternative splicing significantly improve the accuracy of these predictive models.

Dependencies

python 3.6.9
keras 2.2.4
numpy 1.16.4
pandas 0.25.1
joblib 0.13.2
tensorflow 1.12.0
scikit-learn 0.21.2
(optional, for model interpretation) DeepExplain

Usage

Please refer to layer_search.bash for the architecture search and run.bash for the main experiments. The bash scripts should be self-explanatory. For example, to run the main experiments:

chmod +x run.bash
./run.bash

Trained model weights

We have a folder for saving trained model weights; however, the weight files are too large to store on GitHub. We therefore provide the following Google Drive link to the weights used in our experiments.

All model weights

Code to derive gene, isoform, and exon definitions

Please see:

deeplearning_geneAnnotation.html

Gene, isoform, exon list and corresponding mapping files

We have saved the lists of genes, isoforms, and exons in this GitHub repo; however, due to the size of the mapping files, we also store them on the external Google Drive.
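A mapping file of this kind is what would let a model connect exons to isoforms. The sketch below shows one way to turn such a mapping into a binary exon-by-isoform mask, assuming a hypothetical long-format table with one row per (exon, isoform) membership pair and columns named `exon_id` and `isoform_id`. The real files in mapping_data (or those produced by generate_mapping.ipynb) may use a different layout, and `build_mask` is an illustrative helper, not part of this repo.

```python
import numpy as np
import pandas as pd


def build_mask(mapping, exon_ids, isoform_ids):
    """Build a binary (n_exons, n_isoforms) mask from a long-format mapping table.

    mapping:     DataFrame with hypothetical columns 'exon_id' and 'isoform_id',
                 one row per (exon, isoform) membership pair.
    exon_ids:    ordered list fixing the row order (e.g. the saved exon list).
    isoform_ids: ordered list fixing the column order (e.g. the saved isoform list).
    """
    exon_pos = {e: k for k, e in enumerate(exon_ids)}
    iso_pos = {i: k for k, i in enumerate(isoform_ids)}
    mask = np.zeros((len(exon_ids), len(isoform_ids)), dtype=np.float32)
    for exon, iso in zip(mapping["exon_id"], mapping["isoform_id"]):
        # Skip pairs whose exon or isoform is not among the modeled features.
        if exon in exon_pos and iso in iso_pos:
            mask[exon_pos[exon], iso_pos[iso]] = 1.0
    return mask


# Hypothetical toy mapping; "E9" is not in the exon list and is ignored.
mapping = pd.DataFrame({
    "exon_id": ["E1", "E2", "E2", "E3", "E9"],
    "isoform_id": ["T1", "T1", "T2", "T2", "T1"],
})
mask = build_mask(mapping, ["E1", "E2", "E3"], ["T1", "T2"])
print(mask)
```

Passing the saved exon and isoform lists explicitly keeps the mask's row and column order aligned with the order of the input features, which matters if the mask is used to constrain a weight matrix.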

Citing the paper

This paper is still under review; please cite the bioRxiv version for now. Once the paper is published, we will update the citation accordingly.

@article{wang2021improved,
  title={Improved prediction of smoking status via isoform-aware RNA-seq deep learning models},
  author={Wang, Zifeng and Masoomi, Aria and Xu, Zhonghui and Boueiz, Adel and Lee, Sool and Zhao, Tingting and Bowler, Russell and Cho, Michael and Silverman, Edwin K and Hersh, Craig and others},
  journal={PLoS computational biology},
  volume={17},
  number={10},
  pages={e1009433},
  year={2021},
  publisher={Public Library of Science San Francisco, CA USA}
}

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Repository files

mapping_data
utils
LICENSE
README.md
SmokingPredictions_7.6.20_revisionresults_4.15.2021.Rmd
SmokingPredictions_Cotinine_4.15.2021.Rmd
data_checking.ipynb
data_loading.py
deeplearning_geneAnnotation.html
generate_mapping.ipynb
isomap_training.py
layer_search.bash
run.bash

KingSpencer/COPD-IsoformMap