Categorize study section minor additions/modifications #183

Merged: 6 commits, Jan 26, 2017

Changes from all commits
236 changes: 148 additions & 88 deletions sections/03_categorize.md
@@ -11,22 +11,22 @@ improves.
*Would deep learning enable us to do this automatically in some principled way?
Are there reasons to believe that this would be advantageous? Would it be
positive to have disease categories changed by data, or would the changing
definition (i.e. as more data are accumulated) actually be harmful? What
impacts would this have on the training of physicians?*

*What are the major challenges in this space, and does deep learning enable us
to tackle any of them? Are there example approaches whereby deep learning is
already having a transformational impact? I (Casey) have added some sections
below where I think we could contribute to the field with our discussion.*

### Major areas of existing contributions

*There are a number of major challenges in this space. How do we get data
together from multiple distinct systems? How do we find biologically meaningful
patterns in that data? How do we store and compute on this data at scale? How
do we share these data while respecting privacy? I've made a section for each
of these. Feel free to add more. I see each section as something on the order
of 1-2 paragraphs in our context.*

#### Clinical care

@@ -52,18 +52,24 @@
high-quality labeled examples are also difficult to obtain
[@doi:10.1101/039800].

In addition to radiographic images, histology slides are also being analyzed
with deep learning approaches. Ciresan et al.
[@doi:10.1007/978-3-642-40763-5_51] developed one of the earliest examples,
winning the 2012 International Conference on Pattern Recognition's Contest on
Mitosis Detection while achieving human-competitive accuracy. Their approach
uses what has become a standard convolutional neural network architecture
trained on public data. In more recent work, Wang et al. [@arxiv:1606.05718]
analyzed stained slides to identify cancers within slides of lymph node
sections. The approach provided a probability map for each slide. On this task
a pathologist has about a 3% error rate. The pathologist did not produce any
false positives, but did have a number of false negatives. The algorithm had
about twice the error rate of a pathologist, but its errors were not strongly
correlated with those of the pathologist. Theoretically, combining both could
reduce the error rate to under 1%. In this area, these algorithms may be ready
to incorporate into existing tools to aid pathologists. The authors' work
suggests that this could reduce the false negative rate of such evaluations.
This theme of an ensemble between a deep learning algorithm and a human expert
may help overcome some of the challenges presented by data limitations.

> Contributor Author: Tried to keep rough chronological order
>
> Member: this looks to be an improvement to me 👍
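The sub-1% figure follows from the errors being roughly independent: a case is
missed only when both readers err on it. A minimal simulation sketch, assuming
hypothetical error rates matching the figures above and an optimistic rule
where disagreements get a decisive second review (none of this code is from
the cited work):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
labels = rng.integers(0, 2, n)           # simulated slides, ~half positive

# Hypothetical, independent error processes matching the rates above:
# pathologist ~3% (false negatives only), algorithm ~6% (both directions).
path_pred = labels.copy()
path_pred[(labels == 1) & (rng.random(n) < 0.03)] = 0

algo_pred = labels.copy()
flip = rng.random(n) < 0.06
algo_pred[flip] = 1 - algo_pred[flip]

# Optimistic combination: slides where the two disagree get a second,
# decisive review; an error survives only if both err the same way.
ensemble_err = (path_pred == algo_pred) & (path_pred != labels)
print("pathologist:", (path_pred != labels).mean())
print("algorithm:  ", (algo_pred != labels).mean())
print("ensemble:   ", ensemble_err.mean())   # well under 1%
```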

One source of training examples with rich clinical annotations is the electronic
health record. Recently, Lee et al. [@doi:10.1101/094276] developed an approach to
@@ -74,31 +80,30 @@
network. Combining this data resource with standard deep learning techniques,
the authors reach greater than 93% accuracy. One item that is important to note
with regard to this work is that the authors used their test set for evaluating
when training had concluded. In other domains, this has resulted in a minimal
change in the estimated accuracy
[@url:http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf].
However, there is not yet a single accepted standard within the field of
biomedical research for such evaluations. We recommend the use of an
independent test set wherever it is feasible. Despite this minor limitation,
the work clearly illustrates the potential that can be unlocked from images
stored in electronic health records.
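To make the recommendation concrete, the sketch below holds out a test set
that is evaluated exactly once, after all tuning against the validation split
is finished; the data and model here are placeholders, not those of the cited
study:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Placeholder data standing in for image-derived features and labels.
X, y = np.random.rand(1000, 50), np.random.randint(0, 2, 1000)

# First carve off the test set; it is not touched again until the very end.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Use a validation split (or cross-validation) for early stopping and
# hyperparameter choices.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

# Only after all tuning decisions are frozen:
print("test accuracy:", model.score(X_test, y_test))
```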

`TODO: Potential remaining topics: #122 & #151 looked interesting from an early
glance. - Do we want to make the point that most of the imaging examples don't
really do anything different/unique from standard image processing examples
(Imagenet etc.)`

#### Electronic health records

`TODO: @brettbj to incorporate
https://github.com/greenelab/deep-review/issues/78 and
https://github.com/greenelab/deep-review/issues/77`

EHR data include substantial amounts of free text, which remains challenging to
approach [@doi:10.1136/amiajnl-2011-000501]. Often, researchers developing
algorithms that perform well on specific tasks must design and implement
domain-specific features [@doi:10.1136/amiajnl-2011-000150]. These features
capture unique aspects of the literature being processed. Deep learning methods
are natural feature constructors. In recent work, the authors evaluated the
extent to which deep learning methods could be applied on top of generic
features for domain-specific concept extraction [@arxiv:1611.08373]. They found
that performance was in line with, but did not exceed, existing state of the art
methods. The deep learning method had performance lower than the best performing
domain-specific method in their evaluation [@arxiv:1611.08373]. This highlights
the challenge of predicting the eventual impact of deep learning on the field.
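The cited evaluation's exact architecture is not reproduced here, but the
general pattern, a recurrent tagger stacked on top of generic word features,
can be sketched as follows; the vocabulary size, tag set, and randomly
initialized embeddings (standing in for pretrained generic vectors) are all
illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConceptTagger(nn.Module):
    """BiLSTM tagger over generic word embeddings; emits a per-token
    label such as B-problem / I-problem / O for concept extraction."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_tags=3):
        super().__init__()
        # In practice these would be initialized from generic pretrained
        # vectors (e.g. word2vec) rather than at random.
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):          # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)                 # (batch, seq_len, n_tags)

model = ConceptTagger(vocab_size=5000)
tokens = torch.randint(0, 5000, (2, 12))   # two fake 12-token notes
tag_scores = model(tokens)
loss = nn.CrossEntropyLoss()(tag_scores.reshape(-1, 3),
                             torch.randint(0, 3, (2 * 12,)))
loss.backward()
```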
@@ -128,6 +133,7 @@ tackles new challenges.

TODO: survival analysis/readmission prediction methods from EHR/EMR style data
(@sw1 + maybe @traversc). These include:

* https://github.com/greenelab/deep-review/issues/81
* https://github.com/greenelab/deep-review/issues/82
* https://github.com/greenelab/deep-review/issues/152
@@ -136,74 +142,128 @@
Identifying consistent subgroups of individuals and individual health
trajectories from clinical tests is also an active area of research. Approaches
inspired by deep learning have been used for both unsupervised feature
construction and supervised prediction. Early work by Lasko et al.
[@doi:10.1371/journal.pone.0066341] combined sparse autoencoders and Gaussian
processes to distinguish gout from leukemia from sequences of serum uric acid
measurements. Later work showed that unsupervised construction of many features
via denoising autoencoder neural networks could dramatically reduce the number
of labeled examples required for subsequent supervised analyses
[@doi:10.1016/j.jbi.2016.10.007]. In addition, it pointed towards learned
features being useful for subtyping within a single disease. A concurrent
large-scale analysis of an electronic health records system found that a deep
denoising autoencoder architecture applied to the number and co-occurrence of
clinical test events, though not the results of those tests, constructed
features that were more useful for disease prediction than other existing
feature construction methods [@doi:10.1038/srep26094]. Taken together, these
results support the potential of unsupervised feature construction in this
domain. However, numerous challenges, including data integration (patient
demographics, family history, laboratory tests, text-based patient records,
image analysis, genomic data) and better handling of streaming temporal data
with many features, will need to be overcome before we can fully assess the
potential of deep learning for this application area.

> Contributor Author: Deep Patient does nlp extraction to create features
>
> Member: 👍
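As a rough illustration of the technique these studies share, a denoising
autoencoder corrupts its input and learns to reconstruct the clean version;
the bottleneck activations then serve as constructed features. A minimal
sketch, assuming hypothetical binary clinical event indicators rather than any
of the cited datasets:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Corrupt the input, then learn to reconstruct the clean version;
    the bottleneck activations become the constructed features."""
    def __init__(self, n_inputs, n_hidden=64, corruption=0.2):
        super().__init__()
        self.corruption = corruption
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden),
                                     nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_inputs),
                                     nn.Sigmoid())

    def forward(self, x):
        # Randomly zero out a fraction of the inputs (masking noise).
        noisy = x * (torch.rand_like(x) > self.corruption).float()
        hidden = self.encoder(noisy)
        return self.decoder(hidden), hidden

# Hypothetical input: binary indicators of clinical events per patient.
x = (torch.rand(256, 500) > 0.9).float()
dae = DenoisingAutoencoder(n_inputs=500)
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)
for step in range(100):
    recon, features = dae(x)
    loss = nn.functional.binary_cross_entropy(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
# `features` can now feed a supervised model with few labeled examples.
```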

##### Opportunities

However, significant work needs to be done to move these from conceptual
advances to practical game-changers.

* Large data resources (see the sample size issues that mammography researchers
  are working around)
* Semi-supervised methods to take advantage of large numbers of unlabeled
  examples
* Transfer learning (see the sketch below).
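A minimal sketch of the transfer learning idea, assuming a hypothetical
two-class medical imaging task: reuse features pretrained on natural images
and fine-tune only a new head, so the number of trainable parameters matches
the small labeled dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pretrained on natural images (ImageNet) and
# fine-tune only a new classification head on the small labeled dataset.
net = models.resnet18(pretrained=True)
for param in net.parameters():
    param.requires_grad = False            # freeze the generic features

net.fc = nn.Linear(net.fc.in_features, 2)  # hypothetical two-class task

optimizer = torch.optim.Adam(net.fc.parameters(), lr=1e-4)
# ...train as usual; only the new head's weights are updated.
```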

##### Unique challenges

Additionally, unique barriers exist in this space that may hinder progress.


###### Standardization/integration

EHRs are designed and optimized primarily for patient care and billing
purposes, meaning that research is at most a tertiary priority. This presents
significant challenges to EHR-based research in general, and particularly to
data-intensive deep learning research. EHRs are used differently even within
the same health care system [@pmcid:PMC3797550, @pmcid:PMC3041534]. Individual
users have unique usage patterns, and different departments have different
priorities, which introduce missing data in a non-random fashion. Just et al.
demonstrated that even the most basic task of matching patients can be
challenging due to data entry issues [@pmid:27134610]. This is before
considering challenges caused by system migrations and health care system
expansions through acquisitions. Replication between hospital systems requires
controlling for both these systematic biases and population and demographic
effects. Historically, rules-based algorithms have been popular in EHR-based
research, but because these are developed at a single institution and trained
with a specific patient population, they do not transfer easily to other
populations [@doi:10.1136/amiajnl-2013-001935]. Wiley et al.
[@doi:10.1142/9789813207813_0050] showed that warfarin dosing algorithms often
underperform in African Americans, illustrating that some of these issues are
unsolved even at the level of treatment best practices. This may be a promising
application of deep learning, as rules-based algorithms were also long the
standard in natural language processing but have been superseded by machine
learning and in particular deep learning methods
[@url:https://aclweb.org/anthology/D/D13/D13-1079.pdf].
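A toy illustration of why patient matching is harder than an exact-string
join; the records, fields, and similarity threshold below are hypothetical,
and production record linkage uses probabilistic models over many fields:

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Crude string similarity; a stand-in for real record linkage."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# The same patient entered differently in two departments (hypothetical).
rec_a = {"name": "Smith, John A.", "dob": "1970-01-02"}
rec_b = {"name": "SMITH JOHN",     "dob": "1970-01-02"}

exact = rec_a["name"] == rec_b["name"]                  # False
fuzzy = (rec_a["dob"] == rec_b["dob"]
         and name_similarity(rec_a["name"], rec_b["name"]) > 0.7)
print(exact, fuzzy)   # False True
```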

###### Temporal patient trajectories

> Member: Can you 80char/line this. Otherwise we can only comment at the full
> paragraph level. Also looks like you have @doi and then a PMID.

Traditionally, physician training programs justified long training hours by
citing increased continuity of care and learning by following the progression
of a disease over time, despite the known consequences for trainees' mental
health and quality of life [@doi:10.1016/j.socscimed.2003.08.016,
@doi:10.1016/S1072-7515(03)00097-8, @pmid:2321788,
@doi:10.1016/S0277-9536(96)00227-4]. Yet a common practice in EHR-based
research is to take a point-in-time snapshot and convert patient data to a
traditional vector for machine learning and statistical analysis. This results
in significant loss of signal, as the timing and order of events provide
insight into a patient's disease and treatment. Efforts to account for the
order of events have shown promise [@doi:10.1038/ncomms5022] but require
exceedingly large numbers of patients due to discrete combinatorial bucketing.

Lasko et al. [@doi:10.1371/journal.pone.0066341] used autoencoders on
longitudinal sequences of serum uric acid measurements to identify population
subtypes. More recently, deep learning has shown promise working with both
sequences (convolutional neural networks) [@arxiv:1607.07519] and the
incorporation of past and current state (recurrent neural networks, long
short-term memory networks) [@arxiv:1602.00357v1].
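A minimal sketch of the recurrent approach, assuming hypothetical sequences of
a single lab measurement per time step (loosely modeled on the uric acid
example; nothing here reproduces the cited architectures):

```python
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    """LSTM over a longitudinal sequence of lab measurements; the final
    hidden state summarizes past and current patient state."""
    def __init__(self, n_measurements=1, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_measurements, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, seq):                # (batch, time, n_measurements)
        _, (h_n, _) = self.lstm(seq)
        return self.out(h_n[-1])           # classify from final hidden state

# Hypothetical data: 8 patients, 20 serum uric acid values each.
seqs = torch.randn(8, 20, 1)
logits = TrajectoryClassifier()(seqs)      # e.g. gout vs. leukemia scores
```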

###### Data sharing and privacy
> Member: This touches a bit on a few topics (available data) that @XieConnect
> also raised in #194. After merge, it'll be important to check the ordering
> and make sure that the flow best takes advantage of these two complementary
> discussions. @XieConnect nicely raises the challenge that physicians are
> expensive. @brettbj nicely raises the challenge that some data can't really
> be shared.

Early successes using deep learning involved very large training datasets
(ImageNet: 1.4 million images) [@arxiv:1409.0575], but a responsibility to
protect patient privacy limits the ability to openly share large patient
datasets. Limited dataset sizes may restrict the number of parameters that can
be trained in a model, but the lack of sharing may also hamper reproducibility
and confidence in results. Even without sharing data, algorithms trained on
confidential patient data may present security risks or accidentally allow for
the exposure of individual-level patient data. Tramer et al.
[@arxiv:1609.02943] showed that trained models can be stolen via public APIs,
and Dwork and Roth [@doi:10.1561/0400000042] demonstrated the ability to
expose individual-level information from accurate answers given by a machine
learning model.

Training algorithms in a differentially private manner provides a limited
guarantee that the algorithm's output will be approximately equally likely to
occur regardless of the participation of any one individual. The limit is
determined by a single parameter which provides a quantification of privacy.
Simmons et al. [@doi:10.1016/j.cels.2016.04.013] showed that GWASs can be
performed in a differentially private manner, and Abadi et al.
[@arxiv:1607.00133] showed that deep learning classifiers can be trained under
the differential privacy framework. Finally, Continuous Analysis
[@doi:10.1101/056473] allows intermediate results to be automatically tracked
and shared for the purposes of reproducibility without sharing the original
data.
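A sketch of the core mechanism behind differentially private training as
described by Abadi et al. (clip each example's gradient, add calibrated noise,
then update), with hypothetical clipping and noise parameters and without the
privacy accounting a real implementation requires:

```python
import torch
import torch.nn as nn

def private_step(model, loss_fn, xb, yb, clip=1.0, noise_mult=1.1, lr=0.1):
    """One step in the spirit of Abadi et al.: clip each example's
    gradient, add Gaussian noise to the sum, then update. (A sketch;
    the real method also tracks cumulative privacy loss.)"""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(xb, yb):               # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
        scale = min(1.0, clip / (norm + 1e-12))
        for s, p in zip(summed, params):
            s += p.grad * scale            # clip to L2 norm <= clip
    with torch.no_grad():
        for s, p in zip(summed, params):
            s += torch.randn_like(s) * noise_mult * clip
            p -= lr * s / len(xb)          # noisy averaged update

model = nn.Linear(10, 2)
xb, yb = torch.randn(32, 10), torch.randint(0, 2, (32,))
private_step(model, nn.CrossEntropyLoss(), xb, yb)
```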

###### Biomedical data is often "Wide"

*Biomedical studies typically deal with relatively small sample sizes, but each
sample may have millions of measurements (genotypes and other omics data, lab
tests, etc.).*

> Member: Agree this is important. This is definitely along the lines of what
> @XieConnect is getting at in #194 (limited standards - here you have limited
> examples - and maybe particularly labeled examples). Suggest again that you
> guys discuss and integrate this in a subsequent PR.

*Classical machine learning recommendations were to have 10x samples per number
of parameters in the model.*

*Number of parameters in an MLP. Convolutions and similar strategies help but do
not solve*
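A back-of-the-envelope count makes the mismatch concrete (the layer sizes are
illustrative):

```python
# Parameter count for a modest MLP on "wide" data: one hidden layer of
# 100 units over 1,000,000 input features (weights + biases per layer).
n_in, n_hidden, n_out = 1_000_000, 100, 2
params = (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)
print(params)            # 100,000,302 parameters
# Under the classical 10x-samples heuristic this would call for ~1e9
# samples, while typical studies have hundreds to thousands.
```

Strategies like parameter tying, or the diet networks idea noted below, target
exactly that first-layer blow-up.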

*Bengio diet networks paper*

#### Has deep learning already induced a strategic inflection point for one or more aspects?
