first draft of intro addition #246
Conversation
@cgreene added to intro as requested.
In general, I'm very positive about the new ideas you've introduced here. I have some big picture comments in addition to the specific remarks inline.
- Some of the text you cut could be important. I am fine removing the shallow learning paragraph because you cover similar ideas when discussing low parameter networks. But we are now without an explanation of what a neural network is.
- Your extended definition of deep learning is great and should resolve the discussion we were having in Accurate and efficient target prediction using a potency-sensitive influence-relevance voter #229.
- In some places, I'm concerned that we are using too much jargon for anyone who isn't already working in deep learning. The challenge in being detailed about deep learning innovations is that we might have to take extra space to explain these concepts. An alternative is to refer to some of these papers (e.g. ReLU, residual networks) but not try to describe what exactly they are. I'm not advocating for either option.
Deep learning is built on a very old idea, neural networks, that was first
proposed in 1943 [doi:10.1007/BF02478259] as a model for how biological
brains proces information. Since then, interest in neural networks a computational
models has waxed and waned several times. This history is interesting in its own right [@doi:10.1103/RevModPhys.34.135, @doi:10.1103/RevModPhys.34.135],
Can remove the comma between references. Also, automatic reflow would help keep the lines to <= 80 characters, which has made it easier for us to make and review small edits to the text.
Several important advances make the current surge of work done in this area possible.

First, several easy to use software packages (Tensorflow, Caffe, Theano) now enable a much broader range of scientists
to build and train complicated. In the past, neural networks required very specialized knoweldge to
Missing word after "complicated"
to build and train complicated. In the past, neural networks required very specialized knoweldge to
build and modify, including the ability to robustly code differentials of matrix
expressions. Errors here are often subtle and difficult to detect, so it could be
very difffult to tailor networks to specific problems without substantial experience
"difficult"
Spelling: "knowledge"
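To make the framework point above concrete for readers outside the field, here is a minimal sketch of building and training a small network without hand-coding any differentials of matrix expressions. It uses the Keras API bundled with current TensorFlow purely as an illustration; the manuscript's other examples (Caffe, Theano) express the same idea with different syntax, and the data, layer sizes, and training settings below are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy inputs standing in for a real data set.
X = np.random.rand(200, 100).astype("float32")
y = np.random.randint(0, 2, size=(200, 1))

# The framework traces the computation and derives every gradient
# automatically, so no matrix-calculus code is written by hand.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=2, verbose=0)
```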
just a single layer, can learn arbitrarily complex functions. And networks with more than one hidden
layer (deep networks), were hard to train. However, it turns out, deep networks can more
efficiently represent many tasks when they are built to mirror the underlying structure of the data.
Moroever, deep networks are more robust and trainable when employing several
"Moreover"
layer (deep networks), were hard to train. However, it turns out, deep networks can more
efficiently represent many tasks when they are built to mirror the underlying structure of the data.
Moroever, deep networks are more robust and trainable when employing several
architectural innovations: weight replication, better behaived non-linearities like rectified-linear units, residual networks,
These are all very good to point out, but I wonder whether our readers will have any idea what these mean or if they will be unintuitive jargon?
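For readers meeting these architectural terms for the first time, a schematic residual block makes the ideas of rectified-linear units and skip connections concrete. The sketch below uses dense (rather than convolutional) Keras layers purely for brevity, and the layer width is an arbitrary assumption.

```python
from tensorflow.keras import layers

def residual_block(x, units=64):
    """One residual block: transform the input, then add the input back.

    The rectified-linear unit (ReLU) is simply max(0, value), a
    better-behaved non-linearity than older sigmoid/tanh choices.
    Assumes x already has `units` features so the shapes match.
    """
    h = layers.Dense(units, activation="relu")(x)
    h = layers.Dense(units)(h)
    # The skip connection: the block only learns a correction to x,
    # which keeps very deep stacks of layers trainable.
    return layers.Activation("relu")(layers.Add()([x, h]))
```

Weight replication is the related idea applied spatially: a convolutional layer reuses one small set of weights at every position in the input rather than learning separate weights for each position.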
Third, science is full of domain expertise, where there are deep traditions of thought stretching back decades and even centuries. Deep learning
will always be in dialogue with thise expertise, to understand the key problems, encode the most salient prior knoweledge, and
understand how to judge success or failure. There is a great deal of excitement about deep learning, but in most scientific corners
careful thought needs to be put into bringing deep learning alongside existing experts and efforts.
Is this a good place to mention the need to compare deep learning performance with existing best practices in a field?
understand how to judge success or failure. There is a great deal of excitement about deep learning, but in most scientific corners
careful thought needs to be put into bringing deep learning alongside existing experts and efforts.

Fourth, data availability and complexity is unevenly distributed accross science. Some areas of science like genomics and particle physics are swamped in petabytes and
Maybe we can keep these domain examples to biomedicine to keep in theme? E.g. replace "chemistry" with "biochemistry" or "medicinal chemistry".
It's not just the amount of data but the complexity of the required features + number of examples. I worry about focusing solely on scale.
Fourth, data availability and complexity is unevenly distributed accross science. Some areas of science like genomics and particle physics are swamped in petabytes and
exobytes of high quality data. Others, like chemistry, are comparatively data poor with well developed domain specific and effective algorithms. These
differences become consequential and define the most successul approachs. For example, the convergence of lower amounts of data
"approaches"
it is an open question the most effect strategies will be, but in this time of creative experimenation optimism is justified.

None of these scientific concerns should dampen enthusiasm about deep learning. Rather, because the approaches flexibility,
there is good reason to believe that carefully defined networks might enbable important scientific advances.
"enable"
Sorry about the delay on this. I just submitted a grant yesterday and am returning here. Will fix soon.
different types. This flexibility, remember, is one of the most striking strengths of neural networks. In the long run,
it is an open question the most effect strategies will be, but in this time of creative experimenation optimism is justified.

None of these scientific concerns should dampen enthusiasm about deep learning. Rather, because the approaches flexibility,
Something missing in "approaches flexibility"
@swamidass : do you want to come back to this? If not, I may reject and attempt to snag some of these ideas and merge into the existing intro. If so, please update this so that we can proceed from here.
Several important advances make the current surge of work done in this area possible.

First, several easy to use software packages (Tensorflow, Caffe, Theano) now enable a much broader range of scientists
This is an important point that we shouldn't lose.
Agree. Should we cite these with URLs though? Also maybe 'deep learning frameworks' better than 'software packages'. I'd add MxNet (and maybe even h20.ai) to that list.
reveiw is beyond our scope. But these algorithms can train networks in domains where earlier algorithms struggled.
For example, newer optimizers can very efficiently learn using batched training, where only a portion of the data
needs to be processed at a time. These optimizers more effectively optimize very large weight vectors where many weights are only
rarely updated. Noise constrastive error has proven particularly useful in
contrastive
like chess, GO, and poker. Curriculumn learning enables networks to gradually build up expertise to
solve particularly challenging algorithmic problems. Dropout nodes and layers make networks much more
robust, even when the number of weights are dramatically increased.
There is quite a bit of jargon in here. If it's going to be included, citations need to exist to point readers towards a resource for each topic.
"Curriculum"
non-obvious patterns in the data are particualrly important and very active area of research [cite examples from all sections].

Second, there are important and pressing questions about how to build networks that can efficently represent
the underlying logic of the data. This concern of "representability" is important, because it gives insight into
logic -> structure?
The May 1st deadline is fast approaching. My suggestion would be either to merge this as is to allow further edits or follow @cgreene's advice of rejecting and including some of the themes presented in a new PR.
In the interest of making progress on this I have simply moved the merge forward by including the additional text alongside the previous text. The intro needs to be revamped, and this will let us move things forward. We're going to have to do some pretty extensive editing here to make things work. But at least we can move ahead.
(deleted my last comment because @cgreene took care of the merge) To confirm I'm reading the merged version correctly, the draft led by […]
@agitter - yep that's it!
This build is based on 0dd9b2c. This commit was created by the following Travis CI build and job: https://travis-ci.org/greenelab/deep-review/builds/227212533 https://travis-ci.org/greenelab/deep-review/jobs/227212534 [ci skip] The full commit message that triggered this build is copied below:
Intro patches (#363)
* Reflow intro
* Fix typos and address some comments from #246
* Fix references
* Partially address @agapow feedback
I think this has my key points. Curious about everyone's thoughts.
It does need to have references added.