Commit

Third section of methylation (#955)
* Update 04.study.md

* Apply suggestions from code review

Updates based on pull-request peer-review.

Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>

* Update 04.study.md

* Apply suggestions from code review

Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>

* Update content/04.study.md

Additional clarity

Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>

* Apply suggestions from code review

Updates based on peer-review

Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>

* Update content/04.study.md

resolve citation error

Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>
AlexanderTitus and cgreene committed Jul 23, 2019
1 parent 9f19543 commit f000c4c
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions content/04.study.md
@@ -66,6 +66,30 @@ For instance, Deep Neural Networks (DNN) have been employed on DNA methylation d
Modular approaches to methylation prediction, such as MethylNet, have been able to predict age, cellular proportions and cancer subtypes, outperforming SVM and Elastic Net models while remaining concordant with expected biology [@tag:Levy2019].
These approaches aim to make embedding, hyperparameter selection, regression, classification and model interpretation tasks more tractable for epigenetics researchers and machine learning scientists.

#### Latent Space Construction

Unsupervised discovery of biologically-significant features is another major area of interest for researchers using DNA methylation data.
A consistent theme of these methods is that they construct a low-dimensional space that semantically encodes biologically important features from methylation profiles.
As in other applications, these low-dimensional representations are thought to capture important, unmeasured sources of biological variability in the data, such that projection into these spaces places biologically similar examples close together.
For this reason, they are often termed latent spaces.
One method used several stacked binary restricted Boltzmann machines (forming a deep neural network) to learn a low-dimensional subspace representation of the methylation profiles of the 5000 CpG sites with the highest variance across 136 breast tissue samples from women (113 breast cancer samples and 23 non-cancerous samples).
Samples in the latent space were then clustered (via self-organizing maps) to show that the latent space could differentiate breast cancer samples from non-neoplastic samples.
Furthermore, the latent space was visualized using t-SNE (t-distributed stochastic neighbor embedding) [@arxiv:1808.01359].
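
The feature-selection step that precedes such a model can be sketched as follows. This is a minimal illustration on synthetic data, not the cited authors' code: the helper name `select_top_variance_cpgs`, the toy matrix, and its dimensions are assumptions made for the example.

```python
import numpy as np

def select_top_variance_cpgs(beta, n_sites):
    """Return column indices of the n_sites most variable CpGs (most variable first)."""
    variances = beta.var(axis=0)          # variance of each CpG across samples
    order = np.argsort(variances)[::-1]   # sort CpGs by descending variance
    return order[:n_sites]

rng = np.random.default_rng(0)
# Hypothetical toy matrix: 136 samples x 1000 CpGs with beta values in [0, 1].
beta = rng.uniform(0.45, 0.55, size=(136, 1000))
# Make the first 50 CpGs highly variable so the selection has something to find.
beta[:, :50] = rng.uniform(0.0, 1.0, size=(136, 50))

top = select_top_variance_cpgs(beta, 50)
reduced = beta[:, top]                    # samples x selected CpGs, input to the model
```

Variance filtering of this kind is unsupervised, so it can be applied before any latent-space model without using sample labels.
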
Titus et al. [@doi:10.5220/0006636401400145] adapted a VAE strategy developed by Way et al. [@doi:10.1142/9789813235533_0008] to methylation data.
The VAE was modified to perform dimensionality reduction on 300,000 PAM50-assigned CpG features to 100 latent features in 862 samples.
The authors performed t-SNE visualization, clustering, and tumor subtype classification on a breast cancer dataset from TCGA.
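
For readers unfamiliar with VAEs, the quantity such a model minimizes can be sketched in a few lines. The sketch below assumes the commonly used formulation (a Gaussian latent posterior, a standard normal prior, and a Bernoulli cross-entropy reconstruction term suited to beta values in [0, 1]); the cited studies may differ in these details, and all function names here are illustrative.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps so that sampling stays differentiable in mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions, one value per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

def bernoulli_reconstruction_loss(x, x_hat, eps=1e-7):
    """Cross-entropy between input beta values and their reconstruction, per sample."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)

def vae_loss(x, x_hat, mu, log_var):
    """Negative evidence lower bound averaged over samples."""
    return np.mean(bernoulli_reconstruction_loss(x, x_hat)
                   + kl_to_standard_normal(mu, log_var))

# Toy check with 4 samples and 100 latent dimensions (sizes are assumptions).
mu_zero = np.zeros((4, 100))
log_var_zero = np.zeros((4, 100))
kl_at_prior = kl_to_standard_normal(mu_zero, log_var_zero)
kl_shifted = kl_to_standard_normal(np.ones((4, 100)), log_var_zero)
z_sample = reparameterize(mu_zero, log_var_zero, np.random.default_rng(0))
```

The KL term is zero when the encoder output matches the prior and grows as the posterior moves away from it, which is what regularizes the latent space.
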
In a subsequent extension of this work [@doi:10.1101/433763], the authors constructed a 100-dimensional latent space of 100k CpG sites across around 1200 samples, and selected the latent space dimensions that were most highly associated with the differentiation between estrogen receptor (ER) positive and negative tumor samples in breast cancer patients to determine the extent to which the latent space could predict responses to endocrine therapy.
Certain latent space dimensions differentiated tumors based on their ER status and provided biologically-plausible hypotheses, which suggests that VAE-derived models may have a place in summarizing DNA methylation profiles into composite features that can aid in predicting treatment response.
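
One simple way to rank latent dimensions by their association with a binary label such as ER status is sketched below. The source does not state which association measure was used, so the Welch t-statistic here is an assumption chosen for illustration, as are the function name and the synthetic data.

```python
import numpy as np

def rank_latent_dimensions(z, is_positive):
    """Rank latent dimensions by |Welch t-statistic| between two groups of samples."""
    pos, neg = z[is_positive], z[~is_positive]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    std_err = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                      + neg.var(axis=0, ddof=1) / len(neg))
    t_stat = mean_diff / std_err
    order = np.argsort(-np.abs(t_stat))   # most strongly associated dimension first
    return order, t_stat

rng = np.random.default_rng(1)
# Hypothetical latent matrix: 200 samples x 100 latent dimensions.
z = rng.standard_normal((200, 100))
er_positive = np.arange(200) < 120        # toy labels: first 120 samples are ER positive
z[er_positive, 7] += 3.0                  # plant a signal in latent dimension 7

order, t_stat = rank_latent_dimensions(z, er_positive)
```

With 100 dimensions tested at once, a real analysis would also need a multiple-testing correction before interpreting any single dimension.
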
Another study explored the latent features of lung cancer methylation profiles that were extracted using variational autoencoders.
After constructing a latent space representation of TCGA lung cancer samples, the authors used a logistic regression classifier on the latent dimensions to accurately classify cancer subtypes [@doi:10.1109/BIBM.2018.8621365].
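
The general pattern of training a classifier on latent features can be sketched as follows. This is a generic illustration with synthetic latent features and a hand-written gradient-descent logistic regression; the learning rate, regularization strength, and data are assumptions and do not reproduce the cited study's setup.

```python
import numpy as np

def fit_logistic_regression(z, y, lr=0.1, n_iter=500, l2=1e-3):
    """Fit weights and bias by batch gradient descent on the L2-penalized log loss."""
    n_samples, n_dims = z.shape
    w, b = np.zeros(n_dims), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))       # predicted probabilities
        w -= lr * (z.T @ (p - y) / n_samples + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def predict(z, w, b):
    """Predict class 1 when the linear score is positive (probability above 0.5)."""
    return (z @ w + b > 0).astype(int)

rng = np.random.default_rng(2)
# Hypothetical latent features: 300 samples x 10 dimensions, two subtypes.
y = rng.integers(0, 2, size=300)
z = rng.standard_normal((300, 10))
z[:, 0] += 4.0 * y                                   # dimension 0 separates the subtypes

z_train, z_test, y_train, y_test = z[:200], z[200:], y[:200], y[200:]
# Standardize with training-set statistics only, to avoid leaking test information.
mean, std = z_train.mean(axis=0), z_train.std(axis=0)
w, b = fit_logistic_regression((z_train - mean) / std, y_train)
accuracy = np.mean(predict((z_test - mean) / std, w, b) == y_test)
```

Because the classifier is linear in the latent features, its weights indicate which latent dimensions drive the subtype prediction.
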
These studies, along with the growing body of work using VAEs and other latent representations of genomic and epigenomic data, demonstrate a suite of tools to explore the unmeasured aspects of biology.
Techniques that produce these representations provide the opportunity to discover important biological features that were previously missed.
The power of unsupervised deep learning models for this task comes from their ability to learn high-dimensional non-linear relationships among data.

Important future applications include predicting methylation and pathological states from methylation profiles in noisier datasets, such as those derived from solid tissue samples rather than blood samples.
Unsupervised deep learning approaches such as variational autoencoders, which leverage measured points to produce a generative, low-dimensional representation, may provide a more complete understanding of the biological processes underlying cell types, transitions in cell dynamics, and subject phenotypes.
In addition, latent representations may assist with biological hypothesis generation and have the ability to stratify patients by predicted risk.
While neural-network embeddings can outperform traditional embeddings, many of these methods can be highly sensitive to hyperparameter tuning, so an evaluation of the impact of hyperparameter choices should be included [@doi:10.1101/385534].

### Splicing

Pre-mRNA transcripts can be spliced into different isoforms by retaining or skipping subsets of exons or including parts of introns, creating enormous spatiotemporal flexibility to generate multiple distinct proteins from a single gene.