first draft of intro addition #246

Merged 2 commits on Apr 28, 2017
117 changes: 82 additions & 35 deletions sections/02_intro.md
@@ -34,41 +34,88 @@ learning methods more challenging or less fruitful to apply.

### What is deep learning?

Deep learning is built on a biologically-inspired approach from machine learning
termed neural networks. Each neuron in a computational neural network, termed a
node, has inputs, an activation function, and outputs. Each input value is
usually multiplied by a weight; the weighted inputs are then summed and passed
through the activation function. The value of the activation function is then
multiplied by another set of weights to produce the output `TODO: we probably
need a figure here - I see no way that we don't include this type of description
in our paper, despite the fact that it's been done tons of times before. - I'm
really partial to this nature review's explanation about making non-linear
problems linear - figure 1 [@doi:10.1038/nature14539]`. These neural networks
are trained by identifying weights that produce a desired output from some
specific input.

Neural networks can also be stacked. The outputs from one network can be used as
inputs to another. This process produces a stacked, also known as a multi-layer,
neural network. The multi-layer neural network techniques that underlie deep
learning have a long history. Multi-layer methods have been discussed in the
literature for more than five decades [@doi:10.1103/RevModPhys.34.135]. Given
this context, it's challenging to consider "deep learning" as a new advance,
though the term has only become widespread to describe analysis methods in the
last decade. Much of the early history of neural networks has been extensively
covered in a recent review [@doi:10.1016/j.neunet.2014.09.003]. For the purposes
of this review, we identify deep learning approaches as those that use
multi-layer neural networks to construct complex features.
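
Stacking then amounts to feeding the outputs of one layer of such nodes into
the next. Below is a minimal sketch of this idea, with arbitrary layer sizes
and random weights chosen purely for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)

def layer(inputs, weights, biases):
    # One layer: a weighted sum for every node, followed by the activation.
    return np.tanh(weights @ inputs + biases)

# Arbitrary sizes: 4 inputs -> 3 hidden nodes -> 2 output nodes.
W1, b1 = rng.randn(3, 4), rng.randn(3)
W2, b2 = rng.randn(2, 3), rng.randn(2)

x = rng.randn(4)
hidden = layer(x, W1, b1)       # outputs of the first layer...
output = layer(hidden, W2, b2)  # ...become inputs to the second
print(output)
```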

We also identify a class of algorithms that we term "shallow learning"
approaches. We do not use this as a pejorative term, but instead to denote
algorithms which have all of the hallmarks of deep approaches except that they
employ networks of limited depth. We found it valuable to include these as we
sought to identify the current contributions of deep learning and to predict its
future impact. Researchers may employ these shallow learning methods for a
number of reasons, including: 1) shallow networks provide a degree of
interpretability that better matches their use case; 2) the available data are
insufficient to support deeper architectures, though new datasets that will
support deep methods are expected; or 3) as building blocks to be combined with
other non-neural-network-based approaches at subsequent stages.
[This section needs citations]

Deep learning is a collection of new techniques that together have recently demonstrated
breakthrough gains over existing approaches in several fields.

Deep learning is built on a very old idea, neural networks, that was first
proposed in 1943 [@doi:10.1007/BF02478259] as a model for how biological
brains process information. Since then, interest in neural networks as
computational models has waxed and waned several times. This history is
interesting in its own right [@doi:10.1103/RevModPhys.34.135],

Collaborator: Can remove the comma between references. Also, automatic reflow
would help keep the lines to <= 80 characters, which has made it easier for us
to make and review small edits to the text.

but in recent years, with the advances of deep learning, attention has shifted back.

Several important advances have made the current surge of work in this area possible.

First, several easy-to-use software packages (TensorFlow, Caffe, Theano) now enable a much broader range of scientists

Member: This is an important point that we shouldn't lose.

Contributor: Agree. Should we cite these with URLs though? Also maybe 'deep learning frameworks' better than 'software packages'. I'd add MxNet (and maybe even h20.ai) to that list.

to build and train complicated networks. In the past, neural networks required very specialized knowledge to
build and modify, including the ability to robustly code differentials of matrix
expressions. Errors here are often subtle and difficult to detect, so it could be
very difficult to tailor networks to specific problems without substantial experience
and training. Now, however, with these new packages, even very complex neural networks
are automatically differentiated, and high-level scripting instructions can transparently
run very efficiently on GPUs. The technology has progressed to the point that even
algorithms can be differentiated [cite neural stack and neural memory papers].
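
To make the point about hand-coded differentiation concrete, the sketch below
(in plain numpy, with made-up values; it is not taken from any of the packages
named above) derives the gradient of a single sigmoid node by hand and checks
it against a finite-difference estimate, which is exactly the kind of
error-prone bookkeeping that automatic differentiation removes:

```python
import numpy as np

def loss(w, x, y):
    # Squared error of a single sigmoid node; purely illustrative.
    pred = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return 0.5 * (pred - y) ** 2

def hand_coded_gradient(w, x, y):
    # Manually derived: dL/dw = (pred - y) * pred * (1 - pred) * x.
    # Mistakes in derivations like this are easy to make and hard to spot.
    pred = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (pred - y) * pred * (1.0 - pred) * x

def numerical_gradient(w, x, y, eps=1e-6):
    # Finite-difference check, the traditional way to catch such mistakes.
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)
    return grad

w, x, y = np.array([0.3, -0.2]), np.array([1.0, 2.0]), 1.0
print(np.allclose(hand_coded_gradient(w, x, y), numerical_gradient(w, x, y)))
```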

Second, key technical insights have been uncovered that guide the construction of
much more complicated networks than was previously possible. In the past, most neural networks
included just a single hidden layer. A network with an arbitrary number of hidden nodes, but
just a single layer, can learn arbitrarily complex functions. Networks with more than one hidden
layer (deep networks), however, were hard to train. It turns out that deep networks can more
efficiently represent many tasks when they are built to mirror the underlying structure of the data.
Moreover, deep networks are more robust and trainable when employing several
architectural innovations: weight replication, better behaved non-linearities like rectified-linear units, residual networks,

Collaborator: These are all very good to point out, but I wonder whether our readers will have any idea what these mean or if they will be unintuitive jargon?

better weight initialization, and persistent memory. Likewise, the central role of
dimensionality reduction as a strength of neural networks was elucidated, and this has
motivated designs built to capitalize on this strength [cite autoencoders and word2vec].
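
To give a flavor of two of these innovations (an assumed sketch, not drawn from
the draft or any cited work), a rectified-linear unit and a residual connection
each take only a line or two of numpy:

```python
import numpy as np

def relu(z):
    # Rectified-linear unit: a better-behaved non-linearity than the
    # saturating activations used in many earlier networks.
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    # Residual connection: the block learns a correction to its input
    # rather than a full transformation, which eases training of deep stacks.
    return x + W2 @ relu(W1 @ x)

rng = np.random.RandomState(1)
x = rng.randn(8)
W1, W2 = 0.1 * rng.randn(8, 8), 0.1 * rng.randn(8, 8)
print(residual_block(x, W1, W2))
```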

Third, several advances in training algorithms have enabled applications of neural networks in ways
that did not previously seem possible. The number of training strategies for deep learning is growing rapidly and a complete
review is beyond our scope, but these algorithms can train networks in domains where earlier approaches struggled.
For example, newer optimizers can very efficiently learn using batched training, where only a portion of the data
needs to be processed at a time. These approaches also more effectively optimize very large weight vectors in which many weights are only
rarely updated. Noise-contrastive estimation has proven particularly useful in
modeling language. Reinforcement learning has enabled neural networks to learn how to play games
like chess, Go, and poker. Curriculum learning enables networks to gradually build up expertise to
solve particularly challenging algorithmic problems. Dropout nodes and layers make networks much more
robust, even when the number of weights is dramatically increased.

Member: There is quite a bit of jargon in here. If it's going to be included, citations need to exist to point readers towards a resource for each topic.
Fourth, the convergence of these factors currently makes deep learning extremely adaptable, and capable
of addressing the nuanced differences of each domain to which it is applied. Many of the advances in deep learning
were first developed for image analysis and text analysis, but the lessons and techniques learnt there
enable the construction of very powerful models specifically suited to the challenges presented by
each unique domain.

### Deep learning in scientific inquiry

Given the intrinsic flexibility of deep learning, it is important to consider the specific values and goals
that matter most in scientific inquiry.

First, in scientific contexts, understanding the patterns in the data may be just as important as fitting the data.
For this reason, interpretability can be more important here than in other domains. Scientific work often
aims to understand the underlying principles behind the data we see, and architectures and techniques that expose the
non-obvious patterns in the data are particularly important and a very active area of research [cite examples from all sections].

Second, there are important and pressing questions about how to build networks that can efficiently represent
the underlying logic of the data. This concern of "representability" is important, because it gives insight into

Member: logic -> structure?

the structure of scientific data, and when understood can guide the design of more efficient and effective networks. For example,
one particularly important study published in Science demonstrates that a simple neural network can very efficiently and

Collaborator: I don't think we need to emphasize where it was published.

accurately model (i.e. represent) a theoretically important quantum mechanical system [http://science.sciencemag.org/content/355/6325/602].

Collaborator: Glad you introduced this paper, I was looking for a place to refer to it in the review. I think it is very relevant despite not being a biological example. Can you please switch to the doi before merging?

Third, science is full of domain expertise, where there are deep traditions of thought stretching back decades and even centuries. Deep learning
will always be in dialogue with this expertise, to understand the key problems, encode the most salient prior knowledge, and
understand how to judge success or failure. There is a great deal of excitement about deep learning, but in most scientific corners
careful thought needs to be put into bringing deep learning alongside existing experts and efforts.

Collaborator: Is this a good place to mention the need to compare deep learning performance with existing best practices in a field?


Fourth, data availability and complexity are unevenly distributed across science. Some areas of science, like genomics and particle physics, are swamped in petabytes and

Collaborator: Maybe we can keep these domain examples to biomedicine to keep in theme? E.g. replace "chemistry" with "biochemistry" or "medicinal chemistry".

Member: It's not just the amount of data but the complexity of the required features + number of examples. I worry about focusing solely on scale.


exabytes of high-quality data. Others, like chemistry, are comparatively data-poor but have well-developed, effective, domain-specific algorithms. These
differences become consequential and define the most successful approaches. For example, the convergence of lower amounts of data
and important nuances of the domain might favor lower-parameter networks that incorporate domain-specific knowledge and fuse data of multiple
different types. This flexibility, remember, is one of the most striking strengths of neural networks. In the long run,
it is an open question what the most effective strategies will be, but in this time of creative experimentation optimism is justified.

None of these scientific concerns should dampen enthusiasm about deep learning. Rather, because of the approach's flexibility,
there is good reason to believe that carefully defined networks might enable important scientific advances.

Contributor (Author): Sorry about the delay on this. I just submitted a grant yesterday and am returning here. Will fix soon.


### Will deep learning transform the study of human disease?
