Clearly written article on predicting enhancer locations from chromatin signatures. The method (CSI-ANN) does not perform especially well at predicting enhancers in HeLa cells or CD4+ T cells, but it significantly outperformed the state of the art in 2010 (see Table 2). It is possible (maybe even likely) that an imperfect gold standard for enhancer locations is diluting the reported performance. I have not seen several of the computational steps in this context before, but they look like clever manipulations of the data that actually make sense.
Biology
The authors use six chromatin marks from ENCODE to predict enhancers in HeLa cells and 39 histone marks to predict enhancers in CD4+ T cells.
Computation
The authors use the chromatin marks to engineer a single feature that is input into a time-delay neural network (TDNN).
The single feature is built using a Fisher discriminant analysis (FDA) applied to histone marks
Mean and "energy transformed marks"
Finds the linear combination of features that maximally separates background from enhancer.
This feature is computed genome wide by a sliding window of 2.5 kb (with a 1.25 kb step size)
Training and testing on two different cell types with reasonable performance
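To make the feature engineering steps above concrete, here is a minimal sketch of how I read them, not the authors' implementation. It assumes the signal is binned at 250 bp (so a 2.5 kb window is 10 bins and a 1.25 kb step is 5 bins), that "energy" means the mean of the squared signal in a window, and that the FDA step is the standard two-class Fisher discriminant. The function names `window_features` and `fisher_direction` and the `ridge` term are hypothetical.

```python
import numpy as np

def window_features(signal, win=10, step=5):
    """Per-window mean and energy for each mark.

    signal: array of shape (n_bins, n_marks), one column per histone mark.
    win, step: window and step size in bins (assumed 250 bp bins, so
    win=10 is 2.5 kb and step=5 is 1.25 kb).
    Returns an array of shape (n_windows, 2 * n_marks).
    """
    feats = []
    for start in range(0, signal.shape[0] - win + 1, step):
        w = signal[start:start + win]
        mean = w.mean(axis=0)
        energy = (w ** 2).mean(axis=0)  # assumption: energy = mean squared signal
        feats.append(np.concatenate([mean, energy]))
    return np.array(feats)

def fisher_direction(X_pos, X_neg, ridge=1e-6):
    """Two-class Fisher discriminant: w proportional to Sw^-1 (mu_pos - mu_neg).

    X_pos: enhancer windows, X_neg: background windows (rows = windows).
    ridge: small diagonal term (my addition) to keep Sw invertible.
    """
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices.
    Sw = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
          + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    Sw += ridge * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_pos - mu_neg)
    return w / np.linalg.norm(w)

# Toy usage with synthetic data (3 marks, so 6 features per window).
rng = np.random.default_rng(0)
X_enh = rng.normal(2.0, 1.0, size=(200, 6))  # stand-in for enhancer windows
X_bg = rng.normal(0.0, 1.0, size=(200, 6))   # stand-in for background windows
w = fisher_direction(X_enh, X_bg)

genome = rng.random((40, 3))                 # 40 bins x 3 marks
single_feature = window_features(genome) @ w # one value per window
```

The projection `window_features(...) @ w` collapses all marks into one value per window, which is the sequence that would then be fed to the TDNN.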
General comments
Good discussion points about their feature engineering decisions, namely that a non-linear feature extractor may work better (an autoencoder maybe?). I also think the lack of gold standards here harms the performance reports, which could be a major problem for supervised learning problems and (although less so) for unsupervised tasks.
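A quick back-of-the-envelope illustration of the gold standard concern, using made-up numbers rather than anything from the paper: suppose the gold standard captures only a fraction of the true enhancers and also contains some false entries. Even a hypothetical perfect predictor would then look mediocre when scored against it. The function `apparent_metrics` and all of its parameters are my own illustration.

```python
def apparent_metrics(n_true, gold_recall, n_gold_false):
    """Score a perfect predictor against an imperfect gold standard.

    n_true: number of real enhancers (all of which the predictor calls).
    gold_recall: fraction of real enhancers present in the gold standard.
    n_gold_false: gold standard entries that are not real enhancers.
    Returns (apparent_precision, apparent_recall).
    """
    n_gold_true = gold_recall * n_true
    apparent_tp = n_gold_true                # predictions confirmed by the gold standard
    apparent_precision = apparent_tp / n_true
    apparent_recall = apparent_tp / (n_gold_true + n_gold_false)
    return apparent_precision, apparent_recall

# Illustrative numbers only: 1000 real enhancers, gold standard finds 60%
# of them and includes 200 false entries.
precision, recall = apparent_metrics(1000, 0.6, 200)
```

With these illustrative numbers the perfect predictor is scored at 0.6 precision and 0.75 recall, so reported numbers for a real method could understate its true performance by a similar margin.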
https://dx.doi.org/10.1093/bioinformatics/btq248