
Discover regulatory DNA elements using chromatin signatures and artificial neural network. #76

Open
gwaybio opened this issue Aug 10, 2016 · 1 comment


gwaybio commented Aug 10, 2016

https://dx.doi.org/10.1093/bioinformatics/btq248


gwaybio commented Aug 15, 2016

Clearly written article that predicts enhancer locations from chromatin signatures. The method (CSI-ANN) does not perform especially well at predicting enhancers in HeLa cells or CD4+ T cells, but it significantly outperformed the 2010 state of the art (see Table 2). It is possible (maybe even likely) that an imperfect gold standard for enhancer locations is diluting the measured performance. I have not seen several of the computational steps used in this context before, but I think they are clever manipulations of the data that actually make sense.

Biology

The authors use six chromatin marks from ENCODE to predict enhancers in HeLa cells and 39 histone marks to predict enhancers in CD4+ T cells.

Computation

  • The authors use the chromatin marks to engineer a single feature that is input into a time-delay neural network (TDNN).
  • The single feature is built using a Fisher discriminant analysis (FDA) applied to histone marks
    • Mean and "energy transformed marks"
    • Finds the linear combination of features that maximally separates background from enhancer.
    • This feature is computed genome-wide using a 2.5 kb sliding window (with a 1.25 kb step size)
  • TDNN
    • Trained with particle swarm optimization
  • Training and testing on two different cell types with reasonable performance
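
The feature-engineering step in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 250 bp bin size, and the definition of "energy" as the mean squared signal in a window are all assumptions made for the example. It computes mean and energy features per mark in a sliding window, then finds the two-class Fisher discriminant direction that separates enhancer windows from background.

```python
import numpy as np


def window_features(signal, window, step):
    """Mean and energy per mark for each sliding window.

    signal: array of shape (n_bins, n_marks).
    Returns an array of shape (n_windows, 2 * n_marks).
    "Energy" is taken here as the mean squared signal (an assumption).
    """
    rows = []
    for start in range(0, signal.shape[0] - window + 1, step):
        chunk = signal[start:start + window]
        rows.append(np.concatenate([chunk.mean(axis=0),
                                    (chunk ** 2).mean(axis=0)]))
    return np.array(rows)


def fisher_direction(X_pos, X_neg, ridge=1e-6):
    """Two-class Fisher discriminant: w proportional to Sw^-1 (mu_pos - mu_neg)."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter matrix, summed over both classes.
    c_pos, c_neg = X_pos - mu_pos, X_neg - mu_neg
    Sw = c_pos.T @ c_pos + c_neg.T @ c_neg
    # Small ridge term (hypothetical choice) keeps the solve stable.
    Sw = Sw + ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_pos - mu_neg)
    return w / np.linalg.norm(w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic signal: 400 bins x 6 marks. With hypothetical 250 bp bins,
    # a 2.5 kb window is 10 bins and a 1.25 kb step is 5 bins.
    signal = rng.gamma(shape=2.0, scale=1.0, size=(400, 6))
    feats = window_features(signal, window=10, step=5)

    # Synthetic labels purely for illustration: first 20 windows are "enhancers".
    labels = np.zeros(len(feats), dtype=bool)
    labels[:20] = True
    feats[labels] += 1.0

    w = fisher_direction(feats[labels], feats[~labels])
    # One scalar per window: the single feature that would feed the TDNN.
    single_feature = feats @ w
    print(single_feature.shape)
```

The projection `feats @ w` collapses all marks into one value per window, which is the kind of single input track the review describes being passed to the TDNN.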

General comments

Good discussion of their feature engineering decisions; namely, a non-linear feature extractor may work better (an autoencoder maybe?). I also think the lack of a gold standard here hurts the reported performance. This could be a major problem for supervised learning and, to a lesser extent, for unsupervised tasks.

@gwaybio gwaybio added the study label Nov 9, 2016