Incorporated changes from my proofread of the study section #493

Merged
merged 5 commits into from
May 22, 2017

cgreene (Member) commented May 21, 2017

No description provided.

cgreene requested a review from agitter May 21, 2017 16:15

agitter (Collaborator) commented May 22, 2017

I'll review this Monday morning

agapow (Contributor) left a comment

LGTM, some typos and suggestions

@@ -56,7 +56,7 @@ approaches applied to gene expression data are powerful methods for
identifying gene signatures that may otherwise be overlooked.
An additional benefit of unsupervised approaches is that
ground truth labels, which are often difficult to acquire or are incorrect, are
nonessential. However, careful interpretation must be performed regarding how
nonessential. However, careful interpretation must be performed when
Contributor

"careful interpretation must be performed" sounds way awkward to me. "interpretation must be careful when"?

Collaborator

"the genes that have been aggregated into features must be interpreted carefully"?

Member Author

reworded 👍

its links to complex disease, which will lead to novel diagnostics and
therapeutics.
therapies to correct splicing defects. However, to achieve this we expect that
methods to interpret the "black box" of deep neural networks and integrate
Contributor

"integrate this with"?

Collaborator

I like it as is. The "integrate" refers to multiple data sources.

would be very time consuming in a lab setting but was easy to simulate using
their model. As we learn to better visualize and analyze the hidden nodes within
base pairs in a sequence and see how the model changed its prediction. Though
time consuming to assay in a lab, this was easy to simulate the computational
Contributor

"this was easy to simulate the computational": word missing?

Member Author

👍

million base pairs upstream or downstream from the affected promoter, on either
strand, even within the introns of other genes [@doi:10.1038/nrg3458]. They do
million base pairs upstream or downstream from the affected promoter on either
strand even within the introns of other genes [@doi:10.1038/nrg3458]. They do
Contributor

"and even" / "or even"

Member Author

👍

insights.

### Single-cell data

Single-cell methods are generating extreme excitement as biologists recognize
Single-cell methods are generating excitement as biologists recognize
the vast heterogeneity within unicellular species and between cells of the same
tissue type in the same organism [@tag:Gawad2016_singlecell]. For instance,
tumor cells and neurons can both harbor extensive somatic variation
[@tag:Lodato2015_neurons]. Understanding single-cell diversity in all its
dimensions — genetic, epigenetic, transcriptomic, proteomic, morphologic, and
Contributor

long dash or double dash? or will either work?

Member Author

went with double dash. I think that's what we've done elsewhere 👍

specific individual, but also to specific pathological subsets of cells.
Single-cell methods also promise to uncover a wealth of new biological
knowledge. A sufficiently large population of single cells will have enough
representative "snapshots" to recreate timelines of dynamic biological processes.
If tracking processes over time is not the limiting factor, single-cell
techniques can provide maximal resolution compared to averaging across all cells
in bulk tissue, enabling the study of transcriptional bursting with single-cell
FISH or the heterogeneity of epigenetic patterns with single-cell Hi-C or
fluorescence in situ hybridization or the heterogeneity of epigenetic patterns with single-cell Hi-C or
Contributor

italicise in situ?

Collaborator

Agreed, "in situ"

Member Author

👍

@@ -586,23 +576,23 @@ for dealing with batch effects [@tag:Shaham2016_batch_effects].

Examining populations of single cells can reveal biologically meaningful subsets
of cells as well as their underlying gene regulatory networks
[@tag:Gaublomme2015_th17]. Unfortunately, machine learning generally struggles
[@tag:Gaublomme2015_th17]. Unfortunately, machine learning methods generally struggle
with imbalanced data — when there are many more examples of class 1 than class 2 —
Contributor

Looks like a single hyphen not a long dash. Suggest this could all be cleaned up near end with a simple search and replace.

Member Author

👍

[@tag:Abe]. Then, researchers began to use techniques that could estimate
relative abundances from an entire sample, which is much faster than classifying
relative abundances from an entire sample more quickly than classifying
Contributor

I think "faster" reads better than "more quickly"

Member Author

👍

[@tag:Word2Vec] in natural language processing) for protein family
classification have been introduced and classified with a skip-gram neural
network [@tag:Asgari]. Recurrent neural networks show good performance for
homology and protein family identification [@tag:Hochreiter @tag:Sonderby].
Interestingly, Hochreiter, who invented Long Short Term Memory (LSTM), delved
into homology/protein family classification in 2007, and therefore, deep
learning is deeply rooted in functional classification methods.

One of the first techniques of *de novo* genome binning used self-organizing
maps, a type of neural network [@tag:Abe]. Essinger et al. used Adaptive Resonance Theory
Contributor

Shift citation to just after Essinger et al.?

Member Author

👍

agitter (Collaborator) left a comment

Only minor comments from me and @agapow, then looks good to me.

outperforming logistic regression and distance-based outlier detection methods.
However, they did not benchmark against random forests, which tend to work better
for imbalanced data, and their data was
relatively low dimensional. Future work is needed to establish the utility of
Collaborator

In light of #495, I don't see how improvements in image classification tell us anything about cell subset identification. Can we stop the sentence after "cell subset identification."?

cgreene merged commit c0cbf63 into greenelab:master May 22, 2017
cgreene deleted the cgreene-study-proofread branch May 22, 2017 13:40
dhimmel pushed a commit that referenced this pull request May 22, 2017
This build is based on c0cbf63.

This commit was created by the following Travis CI build and job:
https://travis-ci.org/greenelab/deep-review/builds/234828498
https://travis-ci.org/greenelab/deep-review/jobs/234828499

[ci skip]

The full commit message that triggered this build is copied below:

Incorporated changes from my proofread of the study section (#493)

* initial proofreads up to metagenomics

* finish proofread

* address comments

* address build failure