Local Response Normalization [W.I.P] #312
Conversation
src/layers/normalise.jl (Outdated)
    return temp
end

children(LRN::LRNorm) =
Any reason not to use `treelike` here?

Looks generally good, but is it possible to have a vectorised implementation so it's more GPU-friendly?
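For context, a rough sketch of what the `treelike` suggestion amounts to (the `LRNorm` field names below are assumptions based on the PR description, not the PR's actual code): instead of hand-writing `children`, the struct is registered with Flux's layer machinery, which was spelled `treelike(LRNorm)` in the Flux of that era and `Flux.@functor` in current Flux.

```julia
using Flux

# Hypothetical LRNorm definition with the four AlexNet hyperparameters;
# the field names are assumed for illustration only.
struct LRNorm
    k
    n
    α
    β
end

# Registering the layer derives children/parameter traversal automatically,
# instead of defining `children(LRN::LRNorm)` by hand. In the 2018-era Flux
# of this PR the equivalent call was spelled `Flux.treelike(LRNorm)`.
Flux.@functor LRNorm
```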
src/layers/normalise.jl (Outdated)
function (LRN::LRNorm)(x)
    w, h, C, N = size(x)
    temp = zeros(size(x))
    for z_ = 1:N
Slightly cleaner to do `for z_=1:N, x_=1:w, ...; end` here.
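For illustration, a minimal sketch of the combined loop header being suggested (the sizes are placeholders, not the PR's actual dimensions): Julia allows several comma-separated ranges in one `for`, which iterates the same Cartesian product as the equivalent nested loops.

```julia
w, h, N = 4, 4, 2   # placeholder sizes, purely for illustration

# One `for` with several ranges behaves like the same loops nested,
# with the rightmost range varying fastest.
visited = [(z_, x_, y_) for z_ = 1:N, x_ = 1:w, y_ = 1:h]
@assert length(visited) == N * w * h

for z_ = 1:N, x_ = 1:w, y_ = 1:h
    # the loop body would index the input at (x_, y_, :, z_) here
end
```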
@MikeInnes I've made the required changes. I'm not sure if we could vectorise it. It'd be better to get this merged for now and make changes if a more optimal way is found in the future. (As of now we'd have to use these five loops, since we're visiting every element of the 4-D input and then traversing the adjacent cells for each of them.)
Ok. Some tests will be essential, but we also need an implementation of the backward pass (at least on CPU) so that it can be used for training. Ideally we'd also have a GPU implementation, but I suppose we can punt on that for now.
@MikeInnes Sure, I'll add a few tests. Also, since the LRN implementation uses base operators, AD should be able to handle the backprop itself, right? So there's no need to write the gradients by hand.
No, that's what I meant above about vectorising it. If you want gradients and GPU support to work automatically, you need to use whole-array operations like matmul, broadcasting and reduction. The AD and GPU backends can't support general loops and scalar indexing, so if you write it like that you also need a gradient implementation (as well as GPU versions).
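To make the suggestion concrete, here is a rough, mostly-vectorised sketch of LRN built from slicing, broadcasting and reductions (the function name `lrn` and the default hyperparameters are assumptions for illustration, not code from this PR); only a loop over channel windows remains, and every operation inside it is a whole-array one.

```julia
# Mostly-vectorised LRN over a W×H×C×N array with AlexNet-style
# hyperparameters k, n, α, β. Each channel's local sum of squares is a
# whole-array reduction over a slice; the rest is pure broadcasting.
function lrn(x::AbstractArray{T,4}; k = 2, n = 5, α = 1e-4, β = 0.75) where T
    C = size(x, 3)
    x2 = x .^ 2
    windows = [sum(x2[:, :, max(1, c - n ÷ 2):min(C, c + n ÷ 2), :], dims = 3)
               for c in 1:C]
    s = cat(windows..., dims = 3)          # W×H×C×N local sums of squares
    return x ./ (k .+ α .* s) .^ β
end

x = randn(Float32, 8, 8, 16, 2)
@assert size(lrn(x)) == size(x)
```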
Looks like LRN is a fairly involved operation to write with purely vectorized ops (although it would be straightforward with something like Tensor Comprehensions, and there might be a way to leverage a convolution operation to perform the local summation).
Here's the best reference I can find for the manually-written gradient of LRN: |
I think we can rewrite this in terms of convolution. We are basically doing a 1x1 convolution where the kernel sums the squared activations over a window of neighbouring channels.
This should make the backward pass much easier for the compiler to figure out, and is also probably more efficient than what you have hand-written. |
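Here is one way to read that suggestion, sketched with NNlib's `conv` (the helper name `lrn_conv` and the kernel construction are assumptions for illustration, not code from this PR): the local sum of squares becomes a 1×1 spatial convolution with a fixed, banded C×C channel-mixing kernel, so the backward pass falls out of the existing convolution and broadcast gradients.

```julia
using NNlib: conv

# LRN via convolution: a (1, 1, C, C) kernel whose (i, j) entry is 1 when
# input channel i falls in output channel j's window, so conv(x.^2, w)
# produces the per-channel local sum of squares in a single call.
function lrn_conv(x::AbstractArray{T,4}; k = 2, n = 5, α = 1e-4, β = 0.75) where T
    C = size(x, 3)
    w = zeros(T, 1, 1, C, C)
    for j in 1:C, i in max(1, j - n ÷ 2):min(C, j + n ÷ 2)
        w[1, 1, i, j] = one(T)
    end
    s = conv(x .^ 2, w)                    # W×H×C×N local sums of squares
    return x ./ (k .+ α .* s) .^ β
end

x = randn(Float32, 8, 8, 16, 2)
@assert size(lrn_conv(x)) == size(x)
```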
Closing as very old, for housecleaning.
Added a new LRNorm struct, which takes four hyperparameters and returns the normalized output.
Implemented by referring to the ImageNet paper, which can be found here.
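For reference, the normalisation being implemented, as defined in the AlexNet paper (Krizhevsky et al., 2012), where `a` is the input activation at spatial position (x, y) and channel i, `N` is the total number of channels, and `k`, `n`, `α`, `β` are the four hyperparameters:

```latex
b^{i}_{x,y} \;=\; \frac{a^{i}_{x,y}}
  {\Bigl(k + \alpha \sum_{j=\max(0,\, i - n/2)}^{\min(N-1,\, i + n/2)} \bigl(a^{j}_{x,y}\bigr)^{2}\Bigr)^{\beta}}
```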