Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

This repo contains the code and files needed to reproduce the results in the paper Improved prediction of smoking status via isoform-aware RNA-seq deep learning models. In this paper, we propose a novel deep learning model, the Isoform Map Layer, which maps exon-level information to the isoform level. Comprehensive experiments on self-collected blood RNA-seq data from 2,557 subjects in the COPDGene Study demonstrate the effectiveness of our method. We hypothesized that, since smoking alters patterns of exon and isoform usage, greater predictive accuracy could be obtained by using exon- and isoform-level quantifications to predict smoking status.
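To illustrate the idea of mapping exon-level information to the isoform level, here is a minimal NumPy sketch. It is not the layer implemented in this repo: it assumes the mapping can be written as a linear layer whose weights are masked by a binary exon-to-isoform membership matrix. The function name `isoform_map_forward` and the toy values are hypothetical.

```python
import numpy as np


def isoform_map_forward(exon_expr, weights, mask, bias=None):
    """Masked linear map from exon-level to isoform-level features (illustrative).

    exon_expr: (n_samples, n_exons) exon quantifications
    weights:   (n_exons, n_isoforms) trainable weights
    mask:      (n_exons, n_isoforms) binary matrix, 1 where the exon
               belongs to the isoform (assumed structure, not from the repo)
    """
    exon_expr = np.asarray(exon_expr, dtype=float)
    # Zero out every weight that links an exon to an isoform it is not part of.
    masked_weights = np.asarray(weights, dtype=float) * np.asarray(mask, dtype=float)
    out = exon_expr @ masked_weights
    if bias is not None:
        out = out + bias
    return out


# Toy example (hypothetical): 3 exons, 2 isoforms.
# Isoform 0 uses exons 0 and 1; isoform 1 uses exons 1 and 2.
mask = np.array([[1, 0],
                 [1, 1],
                 [0, 1]])
weights = np.ones((3, 2))
exon_expr = np.array([[1.0, 2.0, 3.0]])

isoform_features = isoform_map_forward(exon_expr, weights, mask)
print(isoform_features)  # [[3. 5.]]
```

With the mask in place, each isoform feature depends only on the exons assigned to it, so the layer has far fewer effective parameters than a fully connected layer of the same shape.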

Abstract

Predictive models based on gene expression are already a part of medical decision-making for selected situations such as early breast cancer treatment. Most of these models are based on measures that do not capture critical aspects of gene splicing, but with RNA sequencing it is possible to capture some of these aspects of alternative splicing and use them to improve clinical predictions. Building on previous models to predict cigarette smoking status, we show that measures of alternative splicing significantly improve the accuracy of these predictive models.

Dependencies

  • python 3.6.9
  • keras 2.2.4
  • numpy 1.16.4
  • pandas 0.25.1
  • joblib 0.13.2
  • tensorflow 1.12.0
  • scikit-learn 0.21.2
  • (optional, for model interpretation) DeepExplain
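
Because the versions above are pinned, it can help to check an environment before running anything. The following is a small sketch, not part of this repo: the helper names are hypothetical, and it assumes the package names on PyPI match the list above. On Python older than 3.8 (including 3.6.9) it needs the `importlib_metadata` backport.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older Python: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Versions copied from the dependency list above.
PINNED = {
    "keras": "2.2.4",
    "numpy": "1.16.4",
    "pandas": "0.25.1",
    "joblib": "0.13.2",
    "tensorflow": "1.12.0",
    "scikit-learn": "0.21.2",
}


def installed_versions(names):
    """Return {package: version} for the packages that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed; reported as a mismatch below
    return found


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (installed_or_None, pinned)} for every differing package."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


for name, (have, want) in find_mismatches(installed_versions(PINNED)).items():
    print("{}: installed {}, expected {}".format(name, have, want))
```

An empty output means every pinned package is installed at the listed version.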

Usage

Please refer to layer_search.bash for the architecture search and to run.bash for the main experiments. Both bash scripts should be self-explanatory. For example, to run the main experiments:

chmod +x run.bash
./run.bash

Trained model weights

The repo has a folder for trained model weights, but the weight files are too large to store on GitHub. We therefore provide the weights from our experiments through the following Google Drive link.

Code to derive gene, isoform, and exon definitions

Please see:

Gene, isoform, exon list and corresponding mapping files

We have saved the lists of genes, isoforms, and exons in this GitHub repo. Because the mapping files are large, we also store them on the external Google Drive.
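
As an illustration of how such mapping information could be turned into a matrix, the sketch below builds a binary exon-to-isoform membership matrix from (exon, isoform) pairs. The pair format, the identifiers, and the function name `build_exon_isoform_mask` are assumptions made for this example; the actual mapping files may be organized differently.

```python
import numpy as np


def build_exon_isoform_mask(pairs):
    """Build a binary (n_exons, n_isoforms) matrix from (exon, isoform) pairs.

    Rows and columns follow the sorted order of the identifiers, which is
    returned alongside the matrix so the indices can be traced back.
    """
    exons = sorted({exon for exon, _ in pairs})
    isoforms = sorted({isoform for _, isoform in pairs})
    exon_index = {exon: k for k, exon in enumerate(exons)}
    isoform_index = {isoform: k for k, isoform in enumerate(isoforms)}

    mask = np.zeros((len(exons), len(isoforms)), dtype=np.int8)
    for exon, isoform in pairs:
        mask[exon_index[exon], isoform_index[isoform]] = 1
    return mask, exons, isoforms


# Hypothetical identifiers, for illustration only.
pairs = [
    ("exon1", "isoA"),
    ("exon2", "isoA"),
    ("exon2", "isoB"),
    ("exon3", "isoB"),
]
mask, exons, isoforms = build_exon_isoform_mask(pairs)
print(mask)
```

For real data with many exons and isoforms, a sparse matrix would likely be a better fit than a dense array.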

Citing the paper

The paper has been published in PLoS Computational Biology (2021). Please cite it as follows:

@article{wang2021improved,
  title={Improved prediction of smoking status via isoform-aware RNA-seq deep learning models},
  author={Wang, Zifeng and Masoomi, Aria and Xu, Zhonghui and Boueiz, Adel and Lee, Sool and Zhao, Tingting and Bowler, Russell and Cho, Michael and Silverman, Edwin K and Hersh, Craig and others},
  journal={PLoS computational biology},
  volume={17},
  number={10},
  pages={e1009433},
  year={2021},
  publisher={Public Library of Science San Francisco, CA USA}
}

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT