This repo contains the code and files needed to reproduce the results in the paper "Improved prediction of smoking status via isoform-aware RNA-seq deep learning models". In this paper, we propose a novel deep learning model, Isoform Map Layer, which maps exon-level information to the isoform level. Comprehensive experiments on blood RNA-seq data that we collected from 2,557 subjects in the COPDGene Study demonstrate the effectiveness of our method. We hypothesized that since smoking alters patterns of exon and isoform usage, greater predictive accuracy could be obtained by using exon- and isoform-level quantifications to predict smoking status.
Predictive models based on gene expression are already a part of medical decision-making for selected situations such as early breast cancer treatment. Most of these models are based on measures that do not capture critical aspects of gene splicing, but with RNA sequencing it is possible to capture some of these aspects of alternative splicing and use them to improve clinical predictions. Building on previous models to predict cigarette smoking status, we show that measures of alternative splicing significantly improve the accuracy of these predictive models.
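To make the exon-to-isoform idea concrete, here is a minimal NumPy sketch of one way such a mapping could work: a linear layer whose weights are masked so that each isoform unit only receives input from the exons it contains. This is an illustration under our own assumptions, not the implementation in this repo; the toy mapping `ISOFORM_EXONS`, the helper names `build_mask` and `isoform_map_forward`, and the ReLU activation are all hypothetical.

```python
import numpy as np

# Hypothetical toy mapping: isoform name -> indices of the exons it contains.
ISOFORM_EXONS = {
    "iso_A": [0, 1],
    "iso_B": [1, 2, 3],
}


def build_mask(isoform_exons, n_exons):
    """Return an (n_exons, n_isoforms) 0/1 matrix marking exon membership."""
    mask = np.zeros((n_exons, len(isoform_exons)))
    for col, exons in enumerate(isoform_exons.values()):
        mask[exons, col] = 1.0
    return mask


def isoform_map_forward(x_exon, weights, bias, mask):
    """Masked linear map plus ReLU: an exon only feeds isoforms that contain it."""
    return np.maximum(0.0, x_exon @ (weights * mask) + bias)


mask = build_mask(ISOFORM_EXONS, n_exons=4)
rng = np.random.RandomState(0)
weights = rng.normal(size=mask.shape)  # one weight per (exon, isoform) pair
bias = np.zeros(mask.shape[1])
x = rng.normal(size=(3, 4))            # 3 subjects, 4 exon-level features
h = isoform_map_forward(x, weights, bias, mask)  # shape (3, 2): isoform-level features
```

Because the mask zeroes every weight between an exon and an isoform that does not contain it, changing the value of exon 0 in this toy example can only affect `iso_A`, never `iso_B`.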
- python 3.6.9
- keras 2.2.4
- numpy 1.16.4
- pandas 0.25.1
- joblib 0.13.2
- tensorflow 1.12.0
- scikit-learn 0.21.2
- (optional, for model interpretation) DeepExplain
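The pinned versions above can be compared against the current environment with a short script. The following is only a sketch: the `PINNED` dictionary and the `check_versions` helper are hypothetical names that are not part of this repo, the Python version itself is not checked, and on Python versions older than 3.8 the script assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    import importlib_metadata as metadata  # assumed backport for older Pythons

# Package versions copied from the list above (DeepExplain has no pinned version).
PINNED = {
    "keras": "2.2.4",
    "numpy": "1.16.4",
    "pandas": "0.25.1",
    "joblib": "0.13.2",
    "tensorflow": "1.12.0",
    "scikit-learn": "0.21.2",
}


def check_versions(pinned, get_version=metadata.version):
    """Map each package to (wanted, installed or None, whether they match)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print("{:<14} wanted {:<8} found {!s:<10} {}".format(name, wanted, found, status))
```

A mismatch does not necessarily mean the code will fail, but the versions listed here are the ones the experiments were run with.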
Please refer to layer_search.bash for the architecture search and to run.bash for the main experiments. Both bash scripts should be self-explanatory. An example of running the main experiments is shown below:
chmod +x run.bash
./run.bash
We have a folder for saving trained model weights; however, the weight files are too large to be stored on GitHub. We therefore provide the following Google Drive link to access the weights from our experiments.
Please see:
We have saved the lists of genes, isoforms, and exons in this GitHub repo; however, due to the size of the mapping files, we also save them on the external Google Drive.
This paper is still under review; please cite the bioRxiv version for now. Once the paper is published, we will update the citation accordingly.
@article{wang2021improved,
title={Improved prediction of smoking status via isoform-aware RNA-seq deep learning models},
author={Wang, Zifeng and Masoomi, Aria and Xu, Zhonghui and Boueiz, Adel and Lee, Sool and Zhao, Tingting and Bowler, Russell and Cho, Michael and Silverman, Edwin K and Hersh, Craig and others},
journal={PLoS computational biology},
volume={17},
number={10},
pages={e1009433},
year={2021},
publisher={Public Library of Science San Francisco, CA USA}
}
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.