DNA Methylation Deep Review Section # 2 of 3 - Inference, Imputation, and Prediction #954
Planning on adding two more sections that expand on the points of the last paragraph. Will need help editing these points and making text more concise, to leave room for remaining two paragraphs. Also looking to adjust some text from the previous gene expression paragraphs and text surrounding latent space prediction.
Pulling recent changes from greenelab
Just need to tab delimit those citations. I think this was an IDE error (using Atom to edit).
This is looking really nice! I just had a couple places where I wanted either some clarification, a bit of synthesis, or a little bit more information. I am happy to take another look quickly once these changes are made.
content/04.study.md
Outdated
#### Inference, Imputation, and Prediction

Deep learning approaches are beginning to help address some of the current limitations of feature-by-feature analysis approaches to DNA methylation data, and may help uncover additional important features necessary to understand the biological underpinnings behind different pathological states.
One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites.
Does neighboring mean immediately adjacent or in the region? How big are the windows (in rough terms)?
It depends. I think some methods like DeepCpG predict methylation at sites within local windows, integrating local information from sequence features at distances that could be well over 1kb away from the site. Methods like DAPL could potentially integrate more long-range information to impute missingness; unlike DeepCpG, there's no threshold on window size because it's a fully connected denoising autoencoder. I think it can be assumed that nearby sites could be more useful.
The short answer is within a region of a few kb using methods like DeepCpG.
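To make the notion of a "window" concrete, here is a minimal sketch of picking out the measured CpG sites that fall within a fixed distance of a target site and averaging their methylation levels. This is only a naive illustration of neighborhood-based estimation, not how DeepCpG or DAPL work; the function names, the 1 kb default `half_window`, and the toy data are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def neighbors_in_window(positions, target, half_window=1000):
    """Return indices of measured CpG sites within half_window bp of target.

    positions must be sorted ascending and come from a single chromosome.
    The target position itself is excluded if it appears in positions.
    """
    lo = bisect_left(positions, target - half_window)
    hi = bisect_right(positions, target + half_window)
    return [i for i in range(lo, hi) if positions[i] != target]


def window_mean_estimate(positions, betas, target, half_window=1000):
    """Naive estimate: mean beta value of measured sites inside the window.

    Returns None when no measured site falls inside the window.
    """
    idx = neighbors_in_window(positions, target, half_window)
    if not idx:
        return None
    return sum(betas[i] for i in idx) / len(idx)


# Hypothetical toy data: genomic positions (bp) and beta values in [0, 1].
positions = [100, 600, 1500, 4000]
betas = [0.2, 0.4, 0.9, 0.1]
estimate = window_mean_estimate(positions, betas, target=1000)
```

A fully connected autoencoder has no such hard cutoff, which is the contrast drawn in the reply above.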
DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018].
DeepCpG applies a similar method using scBS-Seq, DNA sequence and bidirectional GRUs [@tag:Angermueller2017], and methods like DAPL and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
In addition, gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes has been predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017].
While these examples of methylation imputation and inference methods have value, it is imperative to recognize the limitations of imputing cytosine modifications.
What is the current state of the art performance for imputation, and is it sufficient for downstream analyses (in your view) or is getting to "useful for many downstream analyses" still a work in progress?
I normally just use MICE, K-NN, and even mean imputation, and personally have not tried deep learning imputation approaches, though I am open to developing and implementing new methodologies. I think many of these methods are more geared towards BS-Seq, which can make them harder to adopt for users of 450K and EPIC arrays. It's conceivable that some of these methods could speed up the analysis, and incorporating other modalities may make them more accurate, but coming across this data could still be a challenge. I think making them useful, easy-to-use, and tractable may still be a challenge, but standardized and modular workflows that incorporate these methods may make them more easily adoptable and mainstream.
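For readers less familiar with the baselines named above, here is a minimal pure-Python sketch of mean imputation and K-NN imputation on a small samples-by-sites matrix of beta values. It is a sketch under stated assumptions rather than any package's implementation: missing values are encoded as `None`, distances use only sites observed in both samples, the function names and toy matrix are hypothetical, and MICE is not covered.

```python
import math


def mean_impute(matrix):
    """Replace None in each column (CpG site) with that column's observed mean."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(sum(observed) / len(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]


def _distance(a, b):
    """Root-mean-square difference over sites observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix, k=2):
    """Fill each missing value with the mean of its k most similar samples.

    Only samples that have the site observed can act as donors. A value is
    left as None if no comparable donor exists.
    """
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            nearest = [val for dist, val in donors[:k] if dist != math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: 4 samples x 3 CpG sites, one missing beta value.
toy_betas = [
    [0.10, 0.20, None],
    [0.10, 0.20, 0.80],
    [0.90, 0.90, 0.10],
    [0.15, 0.25, 0.70],
]
mean_filled = mean_impute(toy_betas)
knn_filled = knn_impute(toy_betas, k=2)
```

On this toy matrix the K-NN estimate borrows from the two samples that resemble the incomplete one, whereas mean imputation also averages in the dissimilar third sample.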
Can you add maybe one or two sentences at the very end of this paragraph around how these methods compare to what's used in practice and whether or not they are at the stage yet where they can replace current methods? From my read of what you wrote, the answer is no because there are still some bespoke processes to get them working on new data (which is not true of other methods). However, you can see a path to get there. Is that right?
content/04.study.md
Outdated
Once DNA methylation is measured, deep learning approaches can also be used to perform classification and regression tasks.
For instance, one group employed a Deep Neural Network (DNN) to predict triglyceride concentrations pre- and post-treatment from approximately 450K features (differential DNAm levels) from the Illumina 450K microarray, and used the Dropout technique to help the model generalize [@tag:Islam2018] [@tag:Darst2018].
Another study transformed methylation profiles of about ten thousand TCGA samples to classify 32 different cancer types using the concatenation of various Convolutional Neural Network maps, and learned important patterns of differentially methylated regions that were used to make the classifications [@tag:Chatterjee2018].
Finally, the prediction of cancer subtypes using DNAm was proposed based on a deep autoencoder. The system exploited content retrieval mechanisms to additionally understand the cancer cell type differentiation of the predicted cancer types [@tag:Khwaja2018] based on methylation of CpG islands.
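As a reminder of what the Dropout technique mentioned in the quoted text does, here is a minimal sketch of "inverted" dropout applied to one layer's activations. It is a generic illustration, not the cited study's code; the function name, the default `rate`, and the example values are assumptions.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate`.

    Surviving activations are scaled by 1 / (1 - rate) so the expected value
    is unchanged, which lets inference (training=False) skip any rescaling.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical hidden-layer activations; a seeded generator keeps runs repeatable.
hidden = [0.5, 1.0, 1.5, 2.0]
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
unchanged = dropout(hidden, rate=0.5, training=False)
```

With roughly 450K input features and far fewer samples, randomly silencing units during training is one common way to discourage a network from memorizing the training set.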
Can you synthesize this a little bit? How did performance relate to other methods? Was there anything unique/particularly interesting about what was found?
Sure, I'll add a new commit soon. Thanks for all of these edits and questions @cgreene . I'd also like to find a place to add https://www.biorxiv.org/content/10.1101/692665v1 , though it may be more appropriate in the embedding section (especially with mention of hyperparameter optimization). I'll add it here for now and see if we can move it around soon.
Let me know if you'd like me to incorporate the above discussions into the text.
Ok I committed some more text, take a look!
I think the final paragraph could still use some work, though I synthesized some sections.
Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>
Co-Authored-By: Casey Greene <cgreene@users.noreply.github.com>
Just a couple more things. Looking good!
content/04.study.md
Outdated
#### Inference, Imputation, and Prediction

Deep learning approaches are beginning to help address some of the current limitations of feature-by-feature analysis approaches to DNA methylation data, and may help uncover additional important features necessary to understand the biological underpinnings behind different pathological states.
One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites.
One of the more popular applications is the prediction of the degree of methylation at CpG sites neighboring measured sites.
One of the more popular applications is imputing the degree of methylation at CpG sites that are within a few thousand base pairs of measured sites.
Is this appropriate? I think it would be helpful to include some idea of what range the methods are trying to predict. Feel free to re-word.
I think this may be appropriate enough. For the AE methods, it would be appropriate to add that either within a few thousand sites (one sample) or informed by similar samples.
DeepSignal employs a convolutional neural network to construct features from raw electrical Nanopore signals from sites near a methylated base, and uses a bi-directional recurrent neural network on DNA sequences of the aligned signals to detect methylation [@tag:Ni2018].
DeepCpG applies a similar method using scBS-Seq, DNA sequence and bidirectional GRUs [@tag:Angermueller2017], and methods like DAPL and DeepMethyl incorporate sequence and topological structure [@tag:Qiu2018] [@tag:Khwaja2017] [@tag:Wang2016_methyl] [@tag:Fu2019].
In addition, gene expression has been used to infer and impute methylation states [@tag:Peng2019] [@tag:Levy-Jurgenson2018], methylation of genes has been predicted from promoter methylation [@tag:Pan2018], and convolutional models have been able to predict methylation status from images [@tag:Momeni2018][@tag:Korfiatis2017].
While these examples of methylation imputation and inference methods have value, it is imperative to recognize the limitations of imputing cytosine modifications.
Can you add maybe one or two sentences at the very end of this paragraph around how these methods compare to what's used in practice and whether or not they are at the stage yet where they can replace current methods? From my read of what you wrote, the answer is no because there are still some bespoke processes to get them working on new data (which is not true of other methods). However, you can see a path to get there. Is that right?
Also looks like some refs have spaces instead of tabs:
Okay @cgreene I've added more edits. Hopefully this does the trick, but let me know if you have more questions. Thank you for the feedback.
Hi there,
Following discussions from #942 and the closing of #947, our team has finished its internal edits and is ready to PR. Our PR plan is for each author to submit their section in the same space reserved for DNA methylation, and then we can move sections around from there and merge/stitch together content from our three PRs.
I will be PR'ing the second of three deep review sections. It's focused on inferences, imputation, and prediction on methylation using deep learning.
Here is the order of the PR sections that our group will be submitting:
Thanks, and looking forward to the review.