Discover regulatory DNA elements using chromatin signatures and artificial neural network. #76

gwaybio · 2016-08-10T17:13:35Z

https://dx.doi.org/10.1093/bioinformatics/btq248

gwaybio · 2016-08-15T21:20:01Z

Clearly written article predicting the location of enhancers using chromatin signatures. The method (CSI-ANN) does not perform especially well when predicting enhancers in HeLa cells or CD4+ T cells, but it significantly outperformed the state of the art in 2010 (see Table 2). It is possible (maybe even likely) that the gold standard for enhancer locations is diluting performance. I have not seen several of the computational steps in this context before, but I think they are clever manipulations of the data that actually make sense.

Biology

The authors use six chromatin marks from ENCODE to predict enhancers in HeLa cells and 39 histone marks to predict enhancers in CD4+ T cells.

Computation

The authors use the chromatin marks to engineer a single feature that is input into a time-delay neural network (TDNN).
The single feature is built using Fisher discriminant analysis (FDA) applied to the histone marks:
- Inputs are the mean and "energy"-transformed marks
- FDA finds the linear combination of features that maximally separates background from enhancers
- This feature is computed genome-wide with a sliding window of 2.5 kb (with a 1.25 kb step size)
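
To make the feature engineering step concrete, here is a rough sketch of windowed features followed by a two-class Fisher discriminant. This is not the authors' code. The function names, the 250 bp bin size (so 2.5 kb = 10 bins and 1.25 kb = 5 bins), the definition of "energy" as mean squared signal, and the small ridge term are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def window_features(signal, window, step):
    """Per-window mean and energy for each mark.

    signal: array of shape (n_marks, n_bins).
    "Energy" is taken to be the mean squared signal (an assumption).
    Returns an array of shape (n_windows, 2 * n_marks).
    """
    n_marks, n_bins = signal.shape
    rows = []
    for start in range(0, n_bins - window + 1, step):
        chunk = signal[:, start:start + window]
        rows.append(np.concatenate([chunk.mean(axis=1), (chunk ** 2).mean(axis=1)]))
    return np.array(rows)

def fisher_direction(X_pos, X_neg, ridge=1e-6):
    """Two-class Fisher discriminant: w is proportional to Sw^-1 (mu_pos - mu_neg)."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices.
    Sw = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
          + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    Sw += ridge * np.eye(Sw.shape[0])  # small ridge for numerical stability (assumption)
    w = np.linalg.solve(Sw, mu_pos - mu_neg)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
n_marks = 6

# Synthetic "enhancer" and "background" windows, already converted to features.
X_neg = rng.normal(0.0, 1.0, size=(200, 2 * n_marks))
X_pos = rng.normal(1.0, 1.0, size=(200, 2 * n_marks))
w = fisher_direction(X_pos, X_neg)
score_pos, score_neg = X_pos @ w, X_neg @ w

# Slide a 10-bin window with a 5-bin step over a synthetic track and project
# every window onto w, giving one engineered value per window.
track = rng.normal(size=(n_marks, 100))
feats = window_features(track, window=10, step=5)
engineered = feats @ w
```

The projection `engineered` plays the role of the single feature that is then passed to the network.
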
TDNN
- One input layer, one hidden layer, one output layer
  - A supervised algorithm with an architecture similar to the one in ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions #22
- The way I see it, a TDNN has operations similar to convolutions
  - The "delay" can capture local dependencies and changes among peaks of the engineered variable
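
The convolution analogy can be illustrated with a small forward pass. This is only a sketch: the number of delays, the hidden layer size, and the tanh and sigmoid activations are assumptions, not details taken from the paper. Each hidden unit sees the current value of the engineered feature plus a few delayed neighbours, which is the same arithmetic as a 1-D convolution (cross-correlation) along the track.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def tdnn_forward(x, W_hidden, b_hidden, w_out, b_out):
    """One input layer, one hidden layer, one output layer.

    x: 1-D track of the engineered feature.
    W_hidden: (n_hidden, n_delays) weights shared across all positions.
    Returns one score in (0, 1) per valid position.
    """
    n_hidden, n_delays = W_hidden.shape
    frames = sliding_window_view(x, n_delays)          # (n_positions, n_delays)
    hidden = np.tanh(frames @ W_hidden.T + b_hidden)   # (n_positions, n_hidden)
    logits = hidden @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
x = rng.normal(size=50)
W_hidden = rng.normal(size=(4, 5))   # 4 hidden units, 5 delays (arbitrary choices)
b_hidden = np.zeros(4)
w_out = rng.normal(size=4)
probs = tdnn_forward(x, W_hidden, b_hidden, w_out, 0.0)

# The pre-activation of hidden unit 0 equals a "valid" cross-correlation of the
# track with that unit's delay weights.
pre_act_conv = np.correlate(x, W_hidden[0], mode="valid")
pre_act_tdnn = sliding_window_view(x, 5) @ W_hidden[0]
```
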
The network is trained with particle swarm optimization.
Training and testing were performed on two different cell types, with reasonable performance.
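
For readers unfamiliar with particle swarm optimization, a minimal global-best variant looks roughly like this. It is a generic sketch, not the configuration used in the paper: the inertia and acceleration coefficients, the swarm size, and the toy quadratic loss (standing in for the network's training error over its weight vector) are all assumptions.

```python
import numpy as np

def pso_minimize(loss, dim, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Gradient-free minimization of loss(weights) with a particle swarm."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([loss(p) for p in pos])
    g = pbest_val.argmin()
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Each particle is pulled toward its own best and the swarm's best position.
        vel = (inertia * vel
               + c_personal * r1 * (pbest_pos - pos)
               + c_global * r2 * (gbest_pos - pos))
        pos = pos + vel
        vals = np.array([loss(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
        history.append(gbest_val)
    return gbest_pos, gbest_val, history

def toy_loss(weights):
    # Stand-in for the network's training error (hypothetical).
    return float(np.sum((weights - 0.5) ** 2))

best_pos, best_val, history = pso_minimize(toy_loss, dim=5)
```

Because the swarm only needs loss evaluations, the same loop could in principle be pointed at any network error function without computing gradients.
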

General comments

Good discussion points about their feature engineering decisions, namely that a non-linear feature extractor may work better (an autoencoder, maybe?). I also think the lack of gold standards here harms the reported performance, which could be a major problem for supervised learning problems and (although less so) for unsupervised tasks.


gwaybio added paper supervised labels Aug 15, 2016

gwaybio added this to the Initial review of primary literature milestone Aug 15, 2016

gwaybio mentioned this issue Aug 23, 2016

Refine our guiding question. #88

Closed

gwaybio added the study label Nov 9, 2016

dhimmel added a commit to dhimmel/deep-review that referenced this issue Nov 3, 2017

Pin miniconda version (greenelab#76)

7d64afa

Do not use latest miniconda, since updates can cause bugs, which happened in manubot/rootstock#75 (comment) due to https://stackoverflow.com/a/46457813/4651668

cgreene added the backlog label Mar 12, 2018
