From b870d03db1ce91219dc2969ddc6758d34d9f194f Mon Sep 17 00:00:00 2001 From: jlevy44 Date: Sat, 19 Jan 2019 22:23:07 -0800 Subject: [PATCH 1/9] Pushing background Planning on adding two more sections that expand on the points of the last paragraph. Will need help editing these points and making text more concise, to leave room for remaining two paragraphs. Also looking to adjust some text from the previous gene expression paragraphs and text surrounding latent space prediction. --- content/04.study.md | 8 ++++++++ content/citation-tags.tsv | 4 ++++ 2 files changed, 12 insertions(+) diff --git a/content/04.study.md b/content/04.study.md index 05c5054f..cdac2b56 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -47,6 +47,14 @@ Deep learning applied to gene expression data is still in its infancy, but the f Many previously untestable hypotheses can now be interrogated as deep learning enables analysis of increasing amounts of data generated by new technologies. For example, the effects of cellular heterogeneity on basic biology and disease etiology can now be explored by single-cell RNA-seq and high-throughput fluorescence-based imaging, techniques we discuss below that will benefit immensely from deep learning approaches. +### DNA Methylation + +DNA Methylation (DNAm), a process of epigenetic alteration that can change gene transcription without modifying sequence, is an important mechanism for understanding the development of oncogenesis. It has also shed light on the processes involving initial differentiation of stem cells, aging and pathogenesis in response to environmental exposures. + +Traditional epigenetic computational approaches focus on estimate metrics pertaining to prognosis. For instance, it has been shown that DNA methylation could be used to calculate immune cell type proportions, a proxy for the patient’s immune profile, in patient subpopulations. 
Such approaches are important for adjusting the results of epigenome-wide association studies for cell-type composition [@tag:Teschendorff2017]. These methods typically use a restricted set of candidate methylation sites (CpG sites), a reference library from which to make predictions [@tag:Houseman2012]. While utilizing reference-based libraries demonstrates strong predictive value for immune cell type estimation, these methods severely restrict the amount of underlying biology that can be understood and correlated with disease manifestations and phenotypes. When a referenced-based library is not available for use, unsupervised methods are able to estimate these immune profiles [@tag:TitusReview2017]. Methods that do not rely on these reference libraries [@tag:Houseman2014] are hindered by being unable to fully capture the nonlinearity of the methylation data. + +There are many promising deep learning approaches that serve to expand the number of sites that can be studied by capturing the complex interactions between different methylated regions of DNA and extract the complete set of informative biologically relevant features. The main approaches focus on: 1) estimating regions of methylation status and imputing missing methylation values, 2) performing classification and regression tasks, and 3) understanding latent embeddings of methylation states from which to extract biologically meaningful features, infer interpolated disease states, and uncover relevant CpG sites for the above prediction tasks. + ### Splicing Pre-mRNA transcripts can be spliced into different isoforms by retaining or skipping subsets of exons or including parts of introns, creating enormous spatiotemporal flexibility to generate multiple distinct proteins from a single gene. 
diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv index 9da21983..2dc26c7f 100644 --- a/content/citation-tags.tsv +++ b/content/citation-tags.tsv @@ -96,6 +96,8 @@ Hinton2015_dk arxiv:1503.02531v1 Hochreiter doi:10.1093/bioinformatics/btm247 Hoff doi:10.1093/nar/gkp327 Horton1992_assessment doi:10.1093/nar/20.16.4331 +Houseman2012 doi:10.1186/1471-2105-13-86 +Houseman2014 doi:10.1093/bioinformatics/btu029 Hubara2016_qnn arxiv:1609.07061 Huddar2016_predicting doi:10.1109/ACCESS.2016.2618775 Hughes2016_macromol_react doi:10.1021/acscentsci.6b00162 @@ -245,7 +247,9 @@ Tan2014_psb doi:10.1142/9789814644730_0014 Tan2015_adage doi:10.1128/mSystems.00025-15 Tan2016_eadage doi:10.1101/078659 TAC-ELM doi:10.1142/S0219720012500151 +Teschendorff2017 doi:10.2217/epi-2016-0153 TensorFlow arxiv:1603.04467 +TitusReview2017 doi:10.1093/hmg/ddx275 Torracinta2016_deep_snp doi:10.1101/097469 Torracinta2016_sim doi:10.1101/079087 Tu1996_anns doi:10.1016/S0895-4356(96)00002-9 From 1063bb58c880c86570dface54a9d1b9c996b56cd Mon Sep 17 00:00:00 2001 From: Joshua Levy Date: Sat, 19 Jan 2019 22:42:32 -0800 Subject: [PATCH 2/9] Make DNAm Background More Readable for Editing --- content/04.study.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/content/04.study.md b/content/04.study.md index cdac2b56..4b04253f 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -49,11 +49,19 @@ For example, the effects of cellular heterogeneity on basic biology and disease ### DNA Methylation -DNA Methylation (DNAm), a process of epigenetic alteration that can change gene transcription without modifying sequence, is an important mechanism for understanding the development of oncogenesis. It has also shed light on the processes involving initial differentiation of stem cells, aging and pathogenesis in response to environmental exposures. - -Traditional epigenetic computational approaches focus on estimate metrics pertaining to prognosis. 
For instance, it has been shown that DNA methylation could be used to calculate immune cell type proportions, a proxy for the patient’s immune profile, in patient subpopulations. Such approaches are important for adjusting the results of epigenome-wide association studies for cell-type composition [@tag:Teschendorff2017]. These methods typically use a restricted set of candidate methylation sites (CpG sites), a reference library from which to make predictions [@tag:Houseman2012]. While utilizing reference-based libraries demonstrates strong predictive value for immune cell type estimation, these methods severely restrict the amount of underlying biology that can be understood and correlated with disease manifestations and phenotypes. When a referenced-based library is not available for use, unsupervised methods are able to estimate these immune profiles [@tag:TitusReview2017]. Methods that do not rely on these reference libraries [@tag:Houseman2014] are hindered by being unable to fully capture the nonlinearity of the methylation data. - -There are many promising deep learning approaches that serve to expand the number of sites that can be studied by capturing the complex interactions between different methylated regions of DNA and extract the complete set of informative biologically relevant features. The main approaches focus on: 1) estimating regions of methylation status and imputing missing methylation values, 2) performing classification and regression tasks, and 3) understanding latent embeddings of methylation states from which to extract biologically meaningful features, infer interpolated disease states, and uncover relevant CpG sites for the above prediction tasks. +DNA Methylation (DNAm), a process of epigenetic alteration that can change gene transcription without modifying sequence, is an important mechanism for understanding the development of oncogenesis. 
+It has also shed light on the processes involving initial differentiation of stem cells, aging and pathogenesis in response to environmental exposures. + +Traditional epigenetic computational approaches focus on estimate metrics pertaining to prognosis. +For instance, it has been shown that DNA methylation could be used to calculate immune cell type proportions, a proxy for the patient’s immune profile, in patient subpopulations. +Such approaches are important for adjusting the results of epigenome-wide association studies for cell-type composition [@tag:Teschendorff2017]. +These methods typically use a restricted set of candidate methylation sites (CpG sites), a reference library from which to make predictions [@tag:Houseman2012]. +While utilizing reference-based libraries demonstrates strong predictive value for immune cell type estimation, these methods severely restrict the amount of underlying biology that can be understood and correlated with disease manifestations and phenotypes. +When a referenced-based library is not available for use, unsupervised methods are able to estimate these immune profiles [@tag:TitusReview2017]. +Methods that do not rely on these reference libraries [@tag:Houseman2014] are hindered by being unable to fully capture the nonlinearity of the methylation data. + +There are many promising deep learning approaches that serve to expand the number of sites that can be studied by capturing the complex interactions between different methylated regions of DNA and extract the complete set of informative biologically relevant features. +The main approaches focus on: 1) estimating regions of methylation status and imputing missing methylation values, 2) performing classification and regression tasks, and 3) understanding latent embeddings of methylation states from which to extract biologically meaningful features, infer interpolated disease states, and uncover relevant CpG sites for the above prediction tasks. 
### Splicing From 8d3301d1258365c120e24d9260a13bb86008a43c Mon Sep 17 00:00:00 2001 From: jlevy44 Date: Thu, 9 May 2019 10:46:22 -0400 Subject: [PATCH 3/9] Deep Review PR # 2 --- content/04.study.md | 28 +++++++++++++++------------- content/citation-tags.tsv | 19 +++++++++++++++---- 2 files changed, 30 insertions(+), 17 deletions(-) diff --git a/content/04.study.md b/content/04.study.md index 4b04253f..0fde4f34 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -49,19 +49,21 @@ For example, the effects of cellular heterogeneity on basic biology and disease ### DNA Methylation -DNA Methylation (DNAm), a process of epigenetic alteration that can change gene transcription without modifying sequence, is an important mechanism for understanding the development of oncogenesis. -It has also shed light on the processes involving initial differentiation of stem cells, aging and pathogenesis in response to environmental exposures. - -Traditional epigenetic computational approaches focus on estimate metrics pertaining to prognosis. -For instance, it has been shown that DNA methylation could be used to calculate immune cell type proportions, a proxy for the patient’s immune profile, in patient subpopulations. -Such approaches are important for adjusting the results of epigenome-wide association studies for cell-type composition [@tag:Teschendorff2017]. -These methods typically use a restricted set of candidate methylation sites (CpG sites), a reference library from which to make predictions [@tag:Houseman2012]. -While utilizing reference-based libraries demonstrates strong predictive value for immune cell type estimation, these methods severely restrict the amount of underlying biology that can be understood and correlated with disease manifestations and phenotypes. -When a referenced-based library is not available for use, unsupervised methods are able to estimate these immune profiles [@tag:TitusReview2017]. 
-Methods that do not rely on these reference libraries [@tag:Houseman2014] are hindered by being unable to fully capture the nonlinearity of the methylation data. - -There are many promising deep learning approaches that serve to expand the number of sites that can be studied by capturing the complex interactions between different methylated regions of DNA and extract the complete set of informative biologically relevant features. -The main approaches focus on: 1) estimating regions of methylation status and imputing missing methylation values, 2) performing classification and regression tasks, and 3) understanding latent embeddings of methylation states from which to extract biologically meaningful features, infer interpolated disease states, and uncover relevant CpG sites for the above prediction tasks. +#### Inference, Imputation, and Prediction + +Deep learning approaches are beginning to help address some of the current limitations of feature-by-feature analysis approaches to DNA methylation data, and may help uncover additional important features necessary to understand the biological underpinnings behind different pathological states. +One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites. +DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and concatenates uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018]. +DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional GRUs [@tag:Angermueller2017] and methods like DAPL and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019]. 
+In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017]. +While these examples of methylation imputation and inference methods have value it is imperative to recognize limitations of imputing cytosine modifications. +Not only does imputing DNA methylation share similar limitations with genotype imputation, but there is additional complexity in correlation of DNA methylation that can depend on cell type and other variables related with DNA methylation that can vary by sample. +As the abundance of available tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing), grows, the accuracy of imputation methods for DNA methylation is expected to increase. + +Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks. +For instance, one group employed a Deep Neural Network (DNN) to predict triglyceride concentrations pre- and post-treatment from approximately 450K features (differential DNAm levels) from the Illumina 450K microarray, and used the Dropout technique to generalize the model [@tag:Islam2018] [@tag:Darst2018]. +Another study transformed methylation profiles of about ten thousand TCGA samples to perform classification tasks to differentiate 32 different cancer types using the concatenation of various Convolutional Neural Network Maps and learn important patterns of differentially methylated regions that were used to make the classifications [@tag:Chatterjee2018]. +Finally, the prediction of cancer subtypes using DNAm was proposed based on a deep autoencoder. 
The system exploited content retrieval mechanisms to additionally understand the cancer cell type differentiation of the predicted cancer types [@tag:Khwaja2018] based on methylation of CpG islands. ### Splicing diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv index 2dc26c7f..00aada8d 100644 --- a/content/citation-tags.tsv +++ b/content/citation-tags.tsv @@ -10,6 +10,7 @@ Asgari doi:10.1371/journal.pone.0141287 blast doi:10.1016/S0022-2836(05)80360-2 Angermueller2016_dl_review doi:10.15252/msb.20156651 Angermueller2016_single_methyl doi:10.1186/s13059-017-1189-z +Angermueller2017 doi:10.1186/s13059-017-1189-z Artemov2016_clinical doi:10.1101/095653 Arvaniti2016_rare_subsets doi:10.1101/046508 Bach2015_on doi:10.1371/journal.pone.0130140 @@ -27,6 +28,7 @@ Bracken2016_mirna doi:10.1038/nrg.2016.134 Boza doi:10.1371/journal.pone.0178751 Buggenthin2017_imaged_lineage doi:10.1038/nmeth.4182 Burlina2016_amd doi:10.1109/ISBI.2016.7493240 +Chatterjee2018 arxiv:1807.09617 Caruana2014_need arxiv:1312.6184 Caruana2015_intelligible url:https://dl.acm.org/citation.cfm?id=2788613 Chaudhary2017_multiom_liver_cancer doi:10.1101/114892 @@ -46,6 +48,7 @@ Codella2016_ensemble_melanoma arxiv:1610.04662 Consortium2012_encode doi:10.1038/nature11247 CudNN arxiv:1410.0759 Dahl2014_multi_qsar arxiv:1406.1231 +Darst2018 doi:10.1186/s12863-018-0646-3 Dean2012_nips_downpour url:http://research.google.com/archive/large_deep_networks_nips2012.html DeepChem url:https://github.com/deepchem/deepchem Deming2016_genetic arxiv:1605.07156 @@ -70,6 +73,7 @@ Esteva2017_skin_cancer_nature doi:10.1038/nature21056 Faruqi url:http://alifar76.github.io/sklearn-metrics/ Finnegan2017_maximum doi:10.1101/105957 Fong2017_perturb doi:10.1109/ICCV.2017.371 +Fu2019 doi:10.1109/TCBB.2019.2909237 Gal2015_dropout arxiv:1506.02142 Gaublomme2015_th17 doi:10.1016/j.cell.2015.11.009 Gargeya2017_dr doi:10.1016/j.ophtha.2017.02.008 @@ -96,12 +100,11 @@ Hinton2015_dk arxiv:1503.02531v1 Hochreiter 
doi:10.1093/bioinformatics/btm247 Hoff doi:10.1093/nar/gkp327 Horton1992_assessment doi:10.1093/nar/20.16.4331 -Houseman2012 doi:10.1186/1471-2105-13-86 -Houseman2014 doi:10.1093/bioinformatics/btu029 Hubara2016_qnn arxiv:1609.07061 Huddar2016_predicting doi:10.1109/ACCESS.2016.2618775 Hughes2016_macromol_react doi:10.1021/acscentsci.6b00162 Iglovikov2017_baa doi:10.1101/234120 +Islam2018 doi:10.1186/s12919-018-0121-1 Ithapu2015_efficient doi:10.1016/j.jalz.2015.01.010 Jafari2016_skin_lesions doi:10.1007/s11548-017-1567-8 Jha2017_integrative_models doi:10.1101/104869 @@ -123,9 +126,12 @@ Koh2016_denoising doi:10.1101/052118 Koh2017_understanding arxiv:1703.04730 Kooi2016_mamm_lesions doi:10.1016/j.media.2016.07.007 Kooi2017_mamm_tl doi:10.1002/mp.12110 +Korfiatis2017 doi:10.1007/s10278-017-0009-z Kraus2017_deeploc doi:10.15252/msb.20177551 Krizhevsky2013_nips_cnn url:https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf Krizhevsky2014_weird_trick arxiv:1404.5997 +Khwaja2017 doi:10.1109/BIOCAS.2017.8325078 +Khwaja2018 arxiv:1810.01243 Lacey2016_dl_fpga arxiv:1602.04283 Lakhani2017_radiography doi:10.1148/radiol.2017162326 Lanchantin2016_motif arxiv:1608.03644 @@ -133,6 +139,7 @@ Lee2016_deeptarget arxiv:1603.09123v2 Lee2016_emr_oct_amd doi:10.1101/094276 Lei2016_rationalizing arxiv:1606.04155 Leibig2016_dr doi:10.1101/084210 +Levy-Jurgenson2018 doi:10.1101/491357 Li2014_minibatch doi:10.1145/2623330.2623612 Li2016_variation doi:10.1126/science.aad9417 Liang2015_exprs_cancer doi:10.1109/TCBB.2014.2377729 @@ -164,6 +171,7 @@ McHardy2 doi:10.7717/peerj.1603 Metaphlan doi:10.1038/nmeth.2066 Meng2016_mllib arxiv:1505.06807 Min2016_deepenhancer doi:10.1109/BIBM.2016.7822593 +Momeni2018 doi:10.1101/438341 Moritz2015_sparknet arxiv:1511.06051 Mordvintsev2015_inceptionism url:http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html Mrzelj url:https://repozitorij.uni-lj.si/IzpisGradiva.php?id=85515 
@@ -171,6 +179,7 @@ matis doi:10.1016/S0097-8485(96)80015-5 nbc doi:10.1093/bioinformatics/btq619 Murdoch2017_automatic arxiv:1702.02540 Nemati2016_rl doi:10.1109/EMBC.2016.7591355 +Ni2018 doi:10.1101/385849 Nguyen2014_adversarial arxiv:1412.1897v4 Ngiam2011 url:https://ai.stanford.edu/~ang/papers/icml11-MultimodalDeepLearning.pdf Nie2016_3d_survival doi:10.1007/978-3-319-46723-8_25 @@ -182,7 +191,9 @@ onecodex url:https://www.onecodex.com/ Papernot2017_pate url:https://openreview.net/forum?id=HkwoSDPgg Park2016_deepmirgene arxiv:1605.00017 Parnamaa2017 doi:10.1534/g3.116.033654 +Pan2018 doi:10.1101/438218 Pawlowski2016 doi:10.1101/085118 +Peng2019 doi:10.1101/527044. Pereira2016_docking doi:10.1021/acs.jcim.6b00355 PerezSianes2016_screening doi:10.1007/978-3-319-40126-3_2 Phymm doi:10.1038/nmeth.1358 @@ -191,6 +202,7 @@ Pratt2016_dr doi:10.1016/j.procs.2016.07.014 Quang2017_factor doi:10.1101/151274 Qin2017_onehot doi:10.1371/journal.pcbi.1005403 Qiu2017_graph_embedding doi:10.1101/110668 +Qiu2018 doi:10.1101/406066 Ragoza2016_protein arxiv:1612.02751 RAD2010_view_cc doi:10.1145/1721654.1721672 Radford_dcgan arxiv:1511.06434v2 @@ -247,9 +259,7 @@ Tan2014_psb doi:10.1142/9789814644730_0014 Tan2015_adage doi:10.1128/mSystems.00025-15 Tan2016_eadage doi:10.1101/078659 TAC-ELM doi:10.1142/S0219720012500151 -Teschendorff2017 doi:10.2217/epi-2016-0153 TensorFlow arxiv:1603.04467 -TitusReview2017 doi:10.1093/hmg/ddx275 Torracinta2016_deep_snp doi:10.1101/097469 Torracinta2016_sim doi:10.1101/079087 Tu1996_anns doi:10.1016/S0895-4356(96)00002-9 @@ -259,6 +269,7 @@ Vera2016_sc_analysis doi:10.1146/annurev-genet-120215-034854 Vervier doi:10.1093/bioinformatics/btv683 Wallach2015_atom_net arxiv:1510.02855 Wang2016_breast_cancer arxiv:1606.05718 +Wang2016_methyl doi:10.1038/srep19598 Wang2016_protein_contact doi:10.1371/journal.pcbi.1005324 Wasson1985_clinical doi:10.1056/NEJM198509263131306 WayGreene2017_eval arxiv:1711.04828 From e0ad1ed7a0f24abf346571c0f5ed4ca27559d9b6 Mon 
Sep 17 00:00:00 2001 From: jlevy44 Date: Thu, 9 May 2019 17:34:49 -0400 Subject: [PATCH 4/9] Tab-delimited hopefully. --- content/citation-tags.tsv | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv index 00aada8d..e68fe288 100644 --- a/content/citation-tags.tsv +++ b/content/citation-tags.tsv @@ -10,7 +10,7 @@ Asgari doi:10.1371/journal.pone.0141287 blast doi:10.1016/S0022-2836(05)80360-2 Angermueller2016_dl_review doi:10.15252/msb.20156651 Angermueller2016_single_methyl doi:10.1186/s13059-017-1189-z -Angermueller2017 doi:10.1186/s13059-017-1189-z +Angermueller2017 doi:10.1186/s13059-017-1189-z Artemov2016_clinical doi:10.1101/095653 Arvaniti2016_rare_subsets doi:10.1101/046508 Bach2015_on doi:10.1371/journal.pone.0130140 @@ -28,7 +28,7 @@ Bracken2016_mirna doi:10.1038/nrg.2016.134 Boza doi:10.1371/journal.pone.0178751 Buggenthin2017_imaged_lineage doi:10.1038/nmeth.4182 Burlina2016_amd doi:10.1109/ISBI.2016.7493240 -Chatterjee2018 arxiv:1807.09617 +Chatterjee2018 arxiv:1807.09617 Caruana2014_need arxiv:1312.6184 Caruana2015_intelligible url:https://dl.acm.org/citation.cfm?id=2788613 Chaudhary2017_multiom_liver_cancer doi:10.1101/114892 @@ -48,7 +48,7 @@ Codella2016_ensemble_melanoma arxiv:1610.04662 Consortium2012_encode doi:10.1038/nature11247 CudNN arxiv:1410.0759 Dahl2014_multi_qsar arxiv:1406.1231 -Darst2018 doi:10.1186/s12863-018-0646-3 +Darst2018 doi:10.1186/s12863-018-0646-3 Dean2012_nips_downpour url:http://research.google.com/archive/large_deep_networks_nips2012.html DeepChem url:https://github.com/deepchem/deepchem Deming2016_genetic arxiv:1605.07156 @@ -73,7 +73,7 @@ Esteva2017_skin_cancer_nature doi:10.1038/nature21056 Faruqi url:http://alifar76.github.io/sklearn-metrics/ Finnegan2017_maximum doi:10.1101/105957 Fong2017_perturb doi:10.1109/ICCV.2017.371 -Fu2019 doi:10.1109/TCBB.2019.2909237 +Fu2019 doi:10.1109/TCBB.2019.2909237 Gal2015_dropout 
arxiv:1506.02142 Gaublomme2015_th17 doi:10.1016/j.cell.2015.11.009 Gargeya2017_dr doi:10.1016/j.ophtha.2017.02.008 @@ -104,7 +104,7 @@ Hubara2016_qnn arxiv:1609.07061 Huddar2016_predicting doi:10.1109/ACCESS.2016.2618775 Hughes2016_macromol_react doi:10.1021/acscentsci.6b00162 Iglovikov2017_baa doi:10.1101/234120 -Islam2018 doi:10.1186/s12919-018-0121-1 +Islam2018 doi:10.1186/s12919-018-0121-1 Ithapu2015_efficient doi:10.1016/j.jalz.2015.01.010 Jafari2016_skin_lesions doi:10.1007/s11548-017-1567-8 Jha2017_integrative_models doi:10.1101/104869 @@ -126,12 +126,12 @@ Koh2016_denoising doi:10.1101/052118 Koh2017_understanding arxiv:1703.04730 Kooi2016_mamm_lesions doi:10.1016/j.media.2016.07.007 Kooi2017_mamm_tl doi:10.1002/mp.12110 -Korfiatis2017 doi:10.1007/s10278-017-0009-z +Korfiatis2017 doi:10.1007/s10278-017-0009-z Kraus2017_deeploc doi:10.15252/msb.20177551 Krizhevsky2013_nips_cnn url:https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf Krizhevsky2014_weird_trick arxiv:1404.5997 -Khwaja2017 doi:10.1109/BIOCAS.2017.8325078 -Khwaja2018 arxiv:1810.01243 +Khwaja2017 doi:10.1109/BIOCAS.2017.8325078 +Khwaja2018 arxiv:1810.01243 Lacey2016_dl_fpga arxiv:1602.04283 Lakhani2017_radiography doi:10.1148/radiol.2017162326 Lanchantin2016_motif arxiv:1608.03644 @@ -139,7 +139,7 @@ Lee2016_deeptarget arxiv:1603.09123v2 Lee2016_emr_oct_amd doi:10.1101/094276 Lei2016_rationalizing arxiv:1606.04155 Leibig2016_dr doi:10.1101/084210 -Levy-Jurgenson2018 doi:10.1101/491357 +Levy-Jurgenson2018 doi:10.1101/491357 Li2014_minibatch doi:10.1145/2623330.2623612 Li2016_variation doi:10.1126/science.aad9417 Liang2015_exprs_cancer doi:10.1109/TCBB.2014.2377729 @@ -171,7 +171,7 @@ McHardy2 doi:10.7717/peerj.1603 Metaphlan doi:10.1038/nmeth.2066 Meng2016_mllib arxiv:1505.06807 Min2016_deepenhancer doi:10.1109/BIBM.2016.7822593 -Momeni2018 doi:10.1101/438341 +Momeni2018 doi:10.1101/438341 Moritz2015_sparknet arxiv:1511.06051 
Mordvintsev2015_inceptionism url:http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html Mrzelj url:https://repozitorij.uni-lj.si/IzpisGradiva.php?id=85515 @@ -179,7 +179,7 @@ matis doi:10.1016/S0097-8485(96)80015-5 nbc doi:10.1093/bioinformatics/btq619 Murdoch2017_automatic arxiv:1702.02540 Nemati2016_rl doi:10.1109/EMBC.2016.7591355 -Ni2018 doi:10.1101/385849 +Ni2018 doi:10.1101/385849 Nguyen2014_adversarial arxiv:1412.1897v4 Ngiam2011 url:https://ai.stanford.edu/~ang/papers/icml11-MultimodalDeepLearning.pdf Nie2016_3d_survival doi:10.1007/978-3-319-46723-8_25 @@ -191,9 +191,9 @@ onecodex url:https://www.onecodex.com/ Papernot2017_pate url:https://openreview.net/forum?id=HkwoSDPgg Park2016_deepmirgene arxiv:1605.00017 Parnamaa2017 doi:10.1534/g3.116.033654 -Pan2018 doi:10.1101/438218 +Pan2018 doi:10.1101/438218 Pawlowski2016 doi:10.1101/085118 -Peng2019 doi:10.1101/527044. +Peng2019 doi:10.1101/527044. Pereira2016_docking doi:10.1021/acs.jcim.6b00355 PerezSianes2016_screening doi:10.1007/978-3-319-40126-3_2 Phymm doi:10.1038/nmeth.1358 @@ -202,7 +202,7 @@ Pratt2016_dr doi:10.1016/j.procs.2016.07.014 Quang2017_factor doi:10.1101/151274 Qin2017_onehot doi:10.1371/journal.pcbi.1005403 Qiu2017_graph_embedding doi:10.1101/110668 -Qiu2018 doi:10.1101/406066 +Qiu2018 doi:10.1101/406066 Ragoza2016_protein arxiv:1612.02751 RAD2010_view_cc doi:10.1145/1721654.1721672 Radford_dcgan arxiv:1511.06434v2 @@ -269,7 +269,7 @@ Vera2016_sc_analysis doi:10.1146/annurev-genet-120215-034854 Vervier doi:10.1093/bioinformatics/btv683 Wallach2015_atom_net arxiv:1510.02855 Wang2016_breast_cancer arxiv:1606.05718 -Wang2016_methyl doi:10.1038/srep19598 +Wang2016_methyl doi:10.1038/srep19598 Wang2016_protein_contact doi:10.1371/journal.pcbi.1005324 Wasson1985_clinical doi:10.1056/NEJM198509263131306 WayGreene2017_eval arxiv:1711.04828 From 5a85c4cba5b350cf2e04aa2d8ad336b650facef4 Mon Sep 17 00:00:00 2001 From: jlevy44 Date: Thu, 9 May 2019 21:00:07 -0400 
Subject: [PATCH 5/9] Fixed typo in citations --- content/citation-tags.tsv | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv index e68fe288..7de3c341 100644 --- a/content/citation-tags.tsv +++ b/content/citation-tags.tsv @@ -193,7 +193,7 @@ Park2016_deepmirgene arxiv:1605.00017 Parnamaa2017 doi:10.1534/g3.116.033654 Pan2018 doi:10.1101/438218 Pawlowski2016 doi:10.1101/085118 -Peng2019 doi:10.1101/527044. +Peng2019 doi:10.1101/527044 Pereira2016_docking doi:10.1021/acs.jcim.6b00355 PerezSianes2016_screening doi:10.1007/978-3-319-40126-3_2 Phymm doi:10.1038/nmeth.1358 From f73bf91458e2d1227a2a14e54fc8f263fbc249c8 Mon Sep 17 00:00:00 2001 From: Joshua Levy Date: Fri, 19 Jul 2019 22:30:17 -0400 Subject: [PATCH 6/9] Update content/04.study.md Co-Authored-By: Casey Greene --- content/04.study.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/04.study.md b/content/04.study.md index 0fde4f34..0d424c98 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -57,7 +57,7 @@ DeepSignal employs a convolutional neural network to construct features from raw DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional GRUs [@tag:Angermueller2017] and methods like DAPL and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019]. In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017]. While these examples of methylation imputation and inference methods have value it is imperative to recognize limitations of imputing cytosine modifications. 
-Not only does imputing DNA methylation share similar limitations with genotype imputation, but there is additional complexity in correlation of DNA methylation that can depend on cell type and other variables related with DNA methylation that can vary by sample. +Imputing DNA methylation has complexities above and beyond genotype imputation: correlation of DNA methylation marks can depend on cell types and other factors that can vary by sample. As the abundance of available tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing), grows, the accuracy of imputation methods for DNA methylation is expected to increase. Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks. From d4966516385a40548dd62da44896d04574c12cf3 Mon Sep 17 00:00:00 2001 From: Joshua Levy Date: Fri, 19 Jul 2019 22:30:39 -0400 Subject: [PATCH 7/9] Update content/04.study.md Co-Authored-By: Casey Greene --- content/04.study.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/04.study.md b/content/04.study.md index 0d424c98..b819e01f 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -58,7 +58,7 @@ DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017]. While these examples of methylation imputation and inference methods have value it is imperative to recognize limitations of imputing cytosine modifications. Imputing DNA methylation has complexities above and beyond genotype imputation: correlation of DNA methylation marks can depend on cell types and other factors that can vary by sample. 
-As the abundance of available tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing), grows, the accuracy of imputation methods for DNA methylation is expected to increase. +As the number of tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing) grows, the accuracy of DNA methylation imputation is expected to increase. Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks. For instance, one group employed a Deep Neural Network (DNN) to predict triglyceride concentrations pre- and post-treatment from approximately 450K features (differential DNAm levels) from the Illumina 450K microarray, and used the Dropout technique to generalize the model [@tag:Islam2018] [@tag:Darst2018]. From 6a8c0fa420c4c753147dd164c3e5cdc44b5897bf Mon Sep 17 00:00:00 2001 From: jlevy44 Date: Fri, 19 Jul 2019 23:37:46 -0400 Subject: [PATCH 8/9] Added 2 studies, and summarized a bit more text --- content/04.study.md | 10 +++++----- content/citation-tags.tsv | 2 ++ 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/content/04.study.md b/content/04.study.md index b819e01f..c439206e 100644 --- a/content/04.study.md +++ b/content/04.study.md @@ -54,16 +54,16 @@ For example, the effects of cellular heterogeneity on basic biology and disease Deep learning approaches are beginning to help address some of the current limitations of feature-by-feature analysis approaches to DNA methylation data, and may help uncover additional important features necessary to understand the biological underpinnings behind different pathological states. One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites. 
 DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and concatenates uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018].
-DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional GRUs [@tag:Angermueller2017] and methods like DAPL and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
-In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017].
+DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional GRUs [@tag:Angermueller2017] and methods like DAPL, MRCNN and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Tian2019] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
+In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018] [@tag:Korfiatis2017].
 While these examples of methylation imputation and inference methods have value it is imperative to recognize limitations of imputing cytosine modifications.
 Imputing DNA methylation has complexities above and beyond genotype imputation: correlation of DNA methylation marks can depend on cell types and other factors that can vary by sample.
 As the number of tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing) grows, the accuracy of DNA methylation imputation is expected to increase.
 
 Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks.
-For instance, one group employed a Deep Neural Network (DNN) to predict triglyceride concentrations pre- and post-treatment from approximately 450K features (differential DNAm levels) from the Illumina 450K microarray, and used the Dropout technique to generalize the model [@tag:Islam2018] [@tag:Darst2018].
-Another study transformed methylation profiles of about ten thousand TCGA samples to perform classification tasks to differentiate 32 different cancer types using the concatenation of various Convolutional Neural Network Maps and learn important patterns of differentially methylated regions that were used to make the classifications [@tag:Chatterjee2018].
-Finally, the prediction of cancer subtypes using DNAm was proposed based on a deep autoencoder. The system exploited content retrieval mechanisms to additionally understand the cancer cell type differentiation of the predicted cancer types [@tag:Khwaja2018] based on methylation of CpG islands.
+For instance, Deep Neural Networks (DNN) have been employed on DNA methylation data to predict triglyceride concentrations pre- and post-treatment [@tag:Islam2018] [@tag:Darst2018] and differentiate cancer subtypes [@tag:Chatterjee2018] [@tag:Khwaja2018] while outperforming other methods such as Support Vector Machine (SVM).
+Modular approaches to methylation prediction, such as MethylNet, have been able to predict age, cellular proportions and cancer subtypes, outperforming SVM and Elastic Net models while remaining concordant with expected biology [@tag:Levy2019].
+These approaches aim to make embedding, hyperparameter selection, regression, classification and model interpretation tasks more tractable for epigenetics researchers and machine learning scientists.
 
 ### Splicing
 
diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv
index 7de3c341..a4b9686a 100644
--- a/content/citation-tags.tsv
+++ b/content/citation-tags.tsv
@@ -139,6 +139,7 @@ Lee2016_deeptarget arxiv:1603.09123v2
 Lee2016_emr_oct_amd doi:10.1101/094276
 Lei2016_rationalizing arxiv:1606.04155
 Leibig2016_dr doi:10.1101/084210
+Levy2019 doi:10.1101/692665
 Levy-Jurgenson2018 doi:10.1101/491357
 Li2014_minibatch doi:10.1145/2623330.2623612
 Li2016_variation doi:10.1126/science.aad9417
@@ -260,6 +261,7 @@ Tan2015_adage doi:10.1128/mSystems.00025-15
 Tan2016_eadage doi:10.1101/078659
 TAC-ELM doi:10.1142/S0219720012500151
 TensorFlow arxiv:1603.04467
+Tian2019 doi:10.1186/s12864-019-5488-5
 Torracinta2016_deep_snp doi:10.1101/097469
 Torracinta2016_sim doi:10.1101/079087
 Tu1996_anns doi:10.1016/S0895-4356(96)00002-9

From cc2437f8fc20d61581742c3961240a0fa429ffe1 Mon Sep 17 00:00:00 2001
From: jlevy44
Date: Sun, 21 Jul 2019 21:44:03 -0400
Subject: [PATCH 9/9] Response to recommendations, methylation section

---
 content/04.study.md       | 3 ++-
 content/citation-tags.tsv | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/content/04.study.md b/content/04.study.md
index c439206e..17106539 100644
--- a/content/04.study.md
+++ b/content/04.study.md
@@ -52,13 +52,14 @@ For example, the effects of cellular heterogeneity on basic biology and disease
 #### Inference, Imputation, and Prediction
 
 Deep learning approaches are beginning to help address some of the current limitations of feature-by-feature analysis approaches to DNA methylation data, and may help uncover additional important features necessary to understand the biological underpinnings behind different pathological states.
-One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites.
+One of the more popular applications is imputing the degree of methylation at CpG sites that are within a few thousand base pairs of measured sites or present in similar samples.
 DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and concatenates uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018].
 DeepCpG applies a similar method using scBS-Seq, DNA sequence and Bidirectional GRUs [@tag:Angermueller2017] and methods like DAPL, MRCNN and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Tian2019] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
 In addition to this, Gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018] [@tag:Korfiatis2017].
 While these examples of methylation imputation and inference methods have value it is imperative to recognize limitations of imputing cytosine modifications.
 Imputing DNA methylation has complexities above and beyond genotype imputation: correlation of DNA methylation marks can depend on cell types and other factors that can vary by sample.
 As the number of tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing) grows, the accuracy of DNA methylation imputation is expected to increase.
+While these methods reduce the computational overhead at comparable performance to other popular methylation imputation methods such as K-Nearest Neighbors, Random Forest, Singular Value Decomposition and Multiple Imputation by Chained Equations, the software implementations will need to become more user-friendly to gain widespread adoption.
 
 Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks.
 For instance, Deep Neural Networks (DNN) have been employed on DNA methylation data to predict triglyceride concentrations pre- and post-treatment [@tag:Islam2018] [@tag:Darst2018] and differentiate cancer subtypes [@tag:Chatterjee2018] [@tag:Khwaja2018] while outperforming other methods such as Support Vector Machine (SVM).
diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv
index a4b9686a..2a6e3170 100644
--- a/content/citation-tags.tsv
+++ b/content/citation-tags.tsv
@@ -139,7 +139,7 @@ Lee2016_deeptarget arxiv:1603.09123v2
 Lee2016_emr_oct_amd doi:10.1101/094276
 Lei2016_rationalizing arxiv:1606.04155
 Leibig2016_dr doi:10.1101/084210
-Levy2019 doi:10.1101/692665
+Levy2019 doi:10.1101/692665
 Levy-Jurgenson2018 doi:10.1101/491357
 Li2014_minibatch doi:10.1145/2623330.2623612
 Li2016_variation doi:10.1126/science.aad9417
@@ -261,7 +261,7 @@ Tan2015_adage doi:10.1128/mSystems.00025-15
 Tan2016_eadage doi:10.1101/078659
 TAC-ELM doi:10.1142/S0219720012500151
 TensorFlow arxiv:1603.04467
-Tian2019 doi:10.1186/s12864-019-5488-5
+Tian2019 doi:10.1186/s12864-019-5488-5
 Torracinta2016_deep_snp doi:10.1101/097469
 Torracinta2016_sim doi:10.1101/079087
 Tu1996_anns doi:10.1016/S0895-4356(96)00002-9