Convolution with Bias CUDNN #335
Conversation
What's the status of this, is it still WIP?
Would require some updates and small fixes.
@@ -51,11 +51,13 @@ function Base.show(io::IO, l::Conv)
print(io, ")")
end

#=
If these functions are present, I am getting an "ambiguous function" exception. How do I fix this?
I hit the same thing while trying to implement this.
One way to fix this is to let (c::Conv)(x) directly call something like _conv(c::Conv, x) and then have a specialized _conv(c::Conv, x::CuParam).
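A minimal sketch of that indirection, assuming the Conv fields of that Flux version (σ, weight, bias, stride, pad); CuParam stands for the CUDA-backed parameter type, and the fused helper name is a placeholder, not an actual Flux or CuArrays function:

# Single generic entry point, so the call overloads cannot be ambiguous.
(c::Conv)(x) = _conv(c, x)

# Fallback path: plain NNlib conv, with the bias broadcast over the channel dimension.
function _conv(c::Conv, x)
  b = reshape(c.bias, ntuple(_ -> 1, length(c.stride))..., :, 1)
  c.σ.(conv(x, c.weight, stride = c.stride, pad = c.pad) .+ b)
end

# GPU path: dispatch on the CUDA parameter type and hand off to a
# cuDNN-backed routine (cudnn_conv_bias is an illustrative name only).
_conv(c::Conv, x::CuParam) = cudnn_conv_bias(c, x)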
@avik-pal Another solution is to change line 270 in cudnn.jl to something like
(m::Flux.Conv{<:Any, <:Any, W})(x::Union{CuParam{T,4},CuParam{T,5}}) where {T<:CUDNNFloat, W<:CuParam}
This should remove the "ambiguous function" exception.
alpha = alpha, workspace = workspace, algo = algo, activationMode = activationMode)
end

∇conv_bias(Δ::CuArray{T}, b::CuArray{T}; pad = 0, beta = 0,
Thanks, I had overlooked that function.
I get an error using this; I think JuliaGPU/CuArrays.jl#260 should fix it.
else
workspace_size = length(workspace[])
end
cudnnConvolutionBiasActivationForward(y, x, w, b, padding=pad, stride=stride, mode=flipkernel, alpha1=alpha, activationMode=activationMode, algo=algo, workspace=workspace, workspace_size=workspace_size)
AFAIU, using the identity activation function here only works with one convolution algorithm (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnConvolutionBiasActivationForward):
Note: Only the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algo is enabled with CUDNN_ACTIVATION_IDENTITY. In other words, in the cudnnActivationDescriptor_t structure of the input activationDesc, if the mode of the cudnnActivationMode_t field is set to the enum value CUDNN_ACTIVATION_IDENTITY, then the input cudnnConvolutionFwdAlgo_t of this function cudnnConvolutionBiasActivationForward() must be set to the enum value CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM. See also the documentation for the function cudnnSetActivationDescriptor().
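A hedged sketch of how that constraint could be enforced when choosing the forward algorithm; the enum names are the real cuDNN constants, but the helper itself is illustrative:

# CUDNN_ACTIVATION_IDENTITY only works with IMPLICIT_PRECOMP_GEMM (algo = 1),
# so override whatever algorithm the caller asked for in that case.
function pick_fwd_algo(activationMode, requested_algo)
  if activationMode == CUDNN_ACTIVATION_IDENTITY
    return CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
  end
  return requested_algo
end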
Yes, that is correct.
So if we want to use any other algorithm, we should use the cudnnAddTensor function for the bias (the @grad for conv bias can do it easily).
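Roughly, that two-call path would look like this; the wrapper keyword signatures here are assumptions, not the actual CuArrays API:

# Run the convolution with whichever forward algorithm was requested ...
cudnnConvolutionForward(y, x, w, padding = pad, stride = stride, algo = algo)
# ... then add the bias tensor (shaped 1×1×C×1) in a second cuDNN call.
# cudnnAddTensor computes C = alpha*A + beta*C; with alpha = beta = 1 this is y .+= b.
cudnnAddTensor(b, y)
# The activation can then be applied as an ordinary broadcast, e.g. relu.(y).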
Yes, but if we are using relu activation we want to call this with algo=1 and do the whole relu.(conv .+ b) in one call.
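In that case the fused call from this PR would be invoked with the relu mode and algo 1, along these lines (keyword names follow the wrapper shown in the diff above; CUDNN_ACTIVATION_RELU is the standard cuDNN enum):

# relu.(conv(x, w) .+ b) fused into one cuDNN kernel launch:
cudnnConvolutionBiasActivationForward(y, x, w, b,
    padding = pad, stride = stride, mode = flipkernel,
    algo = 1,                                # IMPLICIT_PRECOMP_GEMM
    activationMode = CUDNN_ACTIVATION_RELU,
    workspace = workspace, workspace_size = workspace_size)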
That is actually a bit problematic for implementing the backward pass (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnActivationBackward). From what I understand, we need the intermediate state before applying the activation to calculate the gradient.
However, if we are guaranteed that the user doesn't want gradients (i.e. there are no tracked arrays), we can use it.
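One hedged way to express that guarantee with the Tracker-based Flux of that time is to branch on whether any argument is a TrackedArray; the fused helper name is hypothetical:

using Flux, NNlib
using Flux.Tracker: TrackedArray

# Use the fused kernel only when no gradient can ever be requested for this call.
function conv_bias_act(c::Conv, x)
  tracked = x isa TrackedArray || c.weight isa TrackedArray || c.bias isa TrackedArray
  if tracked
    # Differentiable path: keep conv, bias add and activation as separate ops.
    c.σ.(conv(x, c.weight, stride = c.stride, pad = c.pad) .+ reshape(c.bias, 1, 1, :, 1))
  else
    # Inference-only path: a single fused cuDNN call (placeholder name).
    fused_conv_bias_activation(c, x)
  end
end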
In the case of the relu (and probably identity) activation functions, I am not sure the intermediate value is needed, since their gradients can be computed from the output alone:
julia> m = Conv((2,2), 1=>1, relu)
Conv((2, 2), 1=>1, NNlib.relu)
julia> x = rand(Float32, 4, 4, 1, 1); xt = TrackedArray(x);
julia> t = m(xt);
julia> Flux.Tracker.back!(t, t.data);
julia> xt.grad # data gradient after relu
4×4×1×1 Array{Float32,4}:
[:, :, 1, 1] =
-0.0206988 0.323376 0.0 0.0
-0.211878 0.763559 0.27797 0.224908
-0.363751 0.3568 -0.0381117 0.540027
-0.138073 0.0981863 -0.239143 0.17006
julia> m.weight.grad # weight gradient after relu
2×2×1×1 Array{Float32,4}:
[:, :, 1, 1] =
1.17692 1.02465
1.54479 1.2704
julia> m.bias.grad # bias gradient after relu
1-element Array{Float32,1}:
2.5020714
julia> NNlib.∇conv_data(t.data, x, m.weight.data)
4×4×1×1 Array{Float32,4}:
[:, :, 1, 1] =
-0.0206988 0.323376 0.0 0.0
-0.211878 0.763559 0.27797 0.224908
-0.363751 0.3568 -0.0381117 0.540027
-0.138073 0.0981863 -0.239143 0.17006
julia> NNlib.∇conv_filter(t.data, x, m.weight.data)
2×2×1×1 Array{Float32,4}:
[:, :, 1, 1] =
1.17692 1.02465
1.54479 1.2704
julia> Flux.CUDA.∇conv_bias(cu(t.data), reshape(cu(m.bias.data), 1, 1, 1, 1))
1-element CuArray{Float32,1}:
2.5020714
So we get the same answer from backpropagation as from the "analytical" gradient functions applied to the value output by the activation function.
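The underlying reason is that relu's derivative can be read off its own output, so the pre-activation value is never needed; as a one-line sketch (not the Flux/cuDNN implementation):

# d/dx relu(x) is 1 where the output is positive and 0 elsewhere,
# so the backward pass only needs the post-activation value y = relu(x).
∇relu(Δ, y) = Δ .* (y .> 0)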
Basically something like KristofferC@fd656bc.
Why do we need to redefine all the functions? Can't we simply pass a different argument in activationMode, since we do not have to do anything different for the backward pass?
Yeah, it's very possible that it can be done more cleanly. The tricky part is getting the backward pass to dispatch properly.
This version is buggy and needs fixes before it can be merged. This directly depends on JuliaGPU/CuArrays.jl#100.
On the positive side, the major speed issues discussed in avik-pal/DeepLearningBenchmarks#1 are resolved by this.