greenelab · cgreene · Jul 22, 2019 · Jan 20, 2019 · Jan 20, 2019 · May 9, 2019
diff --git a/content/04.study.md b/content/04.study.md
@@ -47,6 +47,25 @@ Deep learning applied to gene expression data is still in its infancy, but the f
 Many previously untestable hypotheses can now be interrogated as deep learning enables analysis of increasing amounts of data generated by new technologies.
 For example, the effects of cellular heterogeneity on basic biology and disease etiology can now be explored by single-cell RNA-seq and high-throughput fluorescence-based imaging, techniques we discuss below that will benefit immensely from deep learning approaches.
 
+### DNA Methylation
+
+#### Inference, Imputation, and Prediction
+
+Deep learning approaches are beginning to address some limitations of feature-by-feature analysis of DNA methylation data, and may help uncover additional features needed to understand the biological underpinnings of different pathological states.
+One of the more popular applications is imputing the degree of methylation at CpG sites that are within a few thousand base pairs of measured sites or present in similar samples.
+DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018].
+DeepCpG applies a similar method using scBS-Seq data, DNA sequence and bidirectional GRUs [@tag:Angermueller2017], and methods like DAPL, MRCNN and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Tian2019] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
+In addition, gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes has been predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018] [@tag:Korfiatis2017].
+While these methylation imputation and inference methods have value, it is imperative to recognize the limitations of imputing cytosine modifications.
+Imputing DNA methylation has complexities above and beyond genotype imputation: correlation of DNA methylation marks can depend on cell types and other factors that can vary by sample.
+As the number of tissue types and cell types with whole-genome bisulfite sequencing (and oxidative bisulfite sequencing) grows, the accuracy of DNA methylation imputation is expected to increase.
+While these methods reduce computational overhead and perform comparably to other popular methylation imputation methods such as K-Nearest Neighbors, Random Forest, Singular Value Decomposition and Multiple Imputation by Chained Equations, their software implementations will need to become more user-friendly to gain widespread adoption.
+
+Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks.
+For instance, deep neural networks (DNNs) have been applied to DNA methylation data to predict triglyceride concentrations pre- and post-treatment [@tag:Islam2018] [@tag:Darst2018] and to differentiate cancer subtypes [@tag:Chatterjee2018] [@tag:Khwaja2018], outperforming other methods such as support vector machines (SVMs).
+Modular approaches to methylation prediction, such as MethylNet, have been able to predict age, cellular proportions and cancer subtypes, outperforming SVM and Elastic Net models while remaining concordant with expected biology [@tag:Levy2019].
+These approaches aim to make embedding, hyperparameter selection, regression, classification and model interpretation tasks more tractable for epigenetics researchers and machine learning scientists.
+
 ### Splicing
 
 Pre-mRNA transcripts can be spliced into different isoforms by retaining or skipping subsets of exons or including parts of introns, creating enormous spatiotemporal flexibility to generate multiple distinct proteins from a single gene.
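Note on the added DNA methylation text: it compares deep learning imputers against baselines such as K-Nearest Neighbors. For orientation, the following is a minimal sketch of what such a baseline could look like on a small matrix of beta values (rows are samples, columns are CpG sites, `None` marks an unmeasured site). The function names `sample_distance` and `knn_impute`, the distance metric, and the toy values are illustrative assumptions, not taken from any of the cited tools.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def sample_distance(a: List[Optional[float]], b: List[Optional[float]]) -> float:
    """Root-mean-square difference over CpG sites measured in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlapping sites, so the samples are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(beta: Matrix, k: int = 2) -> Matrix:
    """Fill each missing beta value with the mean of that site in the k most similar samples.

    Hypothetical baseline for illustration only; the input matrix is not modified.
    """
    imputed = [row[:] for row in beta]
    for i, row in enumerate(beta):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples in which site j was measured.
            donors = [
                (sample_distance(row, other), other[j])
                for m, other in enumerate(beta)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if math.isfinite(d[0])]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy beta values (made up): sample 0 is missing CpG site 2.
beta: Matrix = [
    [0.90, 0.80, None, 0.10],
    [0.85, 0.75, 0.20, 0.15],
    [0.10, 0.20, 0.90, 0.80],
    [0.88, 0.82, 0.30, 0.12],
]

print(knn_impute(beta, k=2)[0])
```

Here the missing value is filled from the two samples whose measured sites most resemble sample 0. Because similarity is computed across samples, a baseline like this inherits the caveat raised in the added text: correlation between methylation marks can depend on cell type and other sample-specific factors.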

diff --git a/content/citation-tags.tsv b/content/citation-tags.tsv
@@ -10,6 +10,7 @@ Asgari	doi:10.1371/journal.pone.0141287
 blast	doi:10.1016/S0022-2836(05)80360-2
 Angermueller2016_dl_review	doi:10.15252/msb.20156651
 Angermueller2016_single_methyl	doi:10.1186/s13059-017-1189-z
+Angermueller2017	doi:10.1186/s13059-017-1189-z
 Artemov2016_clinical	doi:10.1101/095653
 Arvaniti2016_rare_subsets	doi:10.1101/046508
 Bach2015_on	doi:10.1371/journal.pone.0130140
@@ -27,6 +28,7 @@ Bracken2016_mirna	doi:10.1038/nrg.2016.134
 Boza	doi:10.1371/journal.pone.0178751
 Buggenthin2017_imaged_lineage	doi:10.1038/nmeth.4182
 Burlina2016_amd	doi:10.1109/ISBI.2016.7493240
+Chatterjee2018	arxiv:1807.09617
 Caruana2014_need	arxiv:1312.6184
 Caruana2015_intelligible	url:https://dl.acm.org/citation.cfm?id=2788613
 Chaudhary2017_multiom_liver_cancer	doi:10.1101/114892
@@ -46,6 +48,7 @@ Codella2016_ensemble_melanoma	arxiv:1610.04662
 Consortium2012_encode	doi:10.1038/nature11247
 CudNN	arxiv:1410.0759
 Dahl2014_multi_qsar	arxiv:1406.1231
+Darst2018	doi:10.1186/s12863-018-0646-3
 Dean2012_nips_downpour	url:http://research.google.com/archive/large_deep_networks_nips2012.html
 DeepChem	url:https://github.com/deepchem/deepchem
 Deming2016_genetic	arxiv:1605.07156
@@ -70,6 +73,7 @@ Esteva2017_skin_cancer_nature	doi:10.1038/nature21056
 Faruqi	url:http://alifar76.github.io/sklearn-metrics/
 Finnegan2017_maximum	doi:10.1101/105957
 Fong2017_perturb	doi:10.1109/ICCV.2017.371
+Fu2019	doi:10.1109/TCBB.2019.2909237
 Gal2015_dropout	arxiv:1506.02142
 Gaublomme2015_th17	doi:10.1016/j.cell.2015.11.009
 Gargeya2017_dr	doi:10.1016/j.ophtha.2017.02.008
@@ -100,6 +104,7 @@ Hubara2016_qnn	arxiv:1609.07061
 Huddar2016_predicting	doi:10.1109/ACCESS.2016.2618775
 Hughes2016_macromol_react	doi:10.1021/acscentsci.6b00162
 Iglovikov2017_baa	doi:10.1101/234120
+Islam2018	doi:10.1186/s12919-018-0121-1
 Ithapu2015_efficient	doi:10.1016/j.jalz.2015.01.010
 Jafari2016_skin_lesions	doi:10.1007/s11548-017-1567-8
 Jha2017_integrative_models	doi:10.1101/104869
@@ -121,16 +126,21 @@ Koh2016_denoising	doi:10.1101/052118
 Koh2017_understanding	arxiv:1703.04730
 Kooi2016_mamm_lesions	doi:10.1016/j.media.2016.07.007
 Kooi2017_mamm_tl	doi:10.1002/mp.12110
+Korfiatis2017	doi:10.1007/s10278-017-0009-z
 Kraus2017_deeploc	doi:10.15252/msb.20177551
 Krizhevsky2013_nips_cnn	url:https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
 Krizhevsky2014_weird_trick	arxiv:1404.5997
+Khwaja2017	doi:10.1109/BIOCAS.2017.8325078
+Khwaja2018	arxiv:1810.01243
 Lacey2016_dl_fpga	arxiv:1602.04283
 Lakhani2017_radiography	doi:10.1148/radiol.2017162326
 Lanchantin2016_motif	arxiv:1608.03644
 Lee2016_deeptarget	arxiv:1603.09123v2
 Lee2016_emr_oct_amd	doi:10.1101/094276
 Lei2016_rationalizing	arxiv:1606.04155
 Leibig2016_dr	doi:10.1101/084210
+Levy2019	doi:10.1101/692665
+Levy-Jurgenson2018	doi:10.1101/491357
 Li2014_minibatch	doi:10.1145/2623330.2623612
 Li2016_variation	doi:10.1126/science.aad9417
 Liang2015_exprs_cancer	doi:10.1109/TCBB.2014.2377729
@@ -162,13 +172,15 @@ McHardy2	doi:10.7717/peerj.1603
 Metaphlan	doi:10.1038/nmeth.2066
 Meng2016_mllib	arxiv:1505.06807
 Min2016_deepenhancer	doi:10.1109/BIBM.2016.7822593
+Momeni2018	doi:10.1101/438341
 Moritz2015_sparknet	arxiv:1511.06051
 Mordvintsev2015_inceptionism	url:http://googleresearch.blogspot.co.uk/2015/06/inceptionism-going-deeper-into-neural.html
 Mrzelj	url:https://repozitorij.uni-lj.si/IzpisGradiva.php?id=85515
 matis	doi:10.1016/S0097-8485(96)80015-5
 nbc	doi:10.1093/bioinformatics/btq619
 Murdoch2017_automatic	arxiv:1702.02540
 Nemati2016_rl	doi:10.1109/EMBC.2016.7591355
+Ni2018	doi:10.1101/385849
 Nguyen2014_adversarial	arxiv:1412.1897v4
 Ngiam2011	url:https://ai.stanford.edu/~ang/papers/icml11-MultimodalDeepLearning.pdf
 Nie2016_3d_survival	doi:10.1007/978-3-319-46723-8_25
@@ -180,7 +192,9 @@ onecodex	url:https://www.onecodex.com/
 Papernot2017_pate	url:https://openreview.net/forum?id=HkwoSDPgg
 Park2016_deepmirgene	arxiv:1605.00017
 Parnamaa2017	doi:10.1534/g3.116.033654
+Pan2018	doi:10.1101/438218
 Pawlowski2016	doi:10.1101/085118
+Peng2019	doi:10.1101/527044
 Pereira2016_docking	doi:10.1021/acs.jcim.6b00355
 PerezSianes2016_screening	doi:10.1007/978-3-319-40126-3_2
 Phymm	doi:10.1038/nmeth.1358
@@ -189,6 +203,7 @@ Pratt2016_dr	doi:10.1016/j.procs.2016.07.014
 Quang2017_factor	doi:10.1101/151274
 Qin2017_onehot	doi:10.1371/journal.pcbi.1005403
 Qiu2017_graph_embedding	doi:10.1101/110668
+Qiu2018	doi:10.1101/406066
 Ragoza2016_protein	arxiv:1612.02751
 RAD2010_view_cc	doi:10.1145/1721654.1721672
 Radford_dcgan	arxiv:1511.06434v2
@@ -246,6 +261,7 @@ Tan2015_adage	doi:10.1128/mSystems.00025-15
 Tan2016_eadage	doi:10.1101/078659
 TAC-ELM	doi:10.1142/S0219720012500151
 TensorFlow	arxiv:1603.04467
+Tian2019	doi:10.1186/s12864-019-5488-5
 Torracinta2016_deep_snp	doi:10.1101/097469
 Torracinta2016_sim	doi:10.1101/079087
 Tu1996_anns	doi:10.1016/S0895-4356(96)00002-9
@@ -255,6 +271,7 @@ Vera2016_sc_analysis	doi:10.1146/annurev-genet-120215-034854
 Vervier	doi:10.1093/bioinformatics/btv683
 Wallach2015_atom_net	arxiv:1510.02855
 Wang2016_breast_cancer	arxiv:1606.05718
+Wang2016_methyl	doi:10.1038/srep19598
 Wang2016_protein_contact	doi:10.1371/journal.pcbi.1005324
 Wasson1985_clinical	doi:10.1056/NEJM198509263131306
 WayGreene2017_eval	arxiv:1711.04828