Add bilinear upsample layer #1180
Conversation
Added BilinearUpsample2d layer
@ltjkoomen this is awesome. I hope others can help review this PR so it can be merged quickly. Thank you for this contribution. I'm happy to close #1136 in favor of this one. |
Could we add GPU tests? I'd like to ensure it doesn't cause trouble there. |
I have noticed a problem when using […]. The result is that […] |
Also removed a faulty doctest
I haven't looked hard but I suspect that the problem is aliasing: won't you end up writing to two views of the same object, with shifted indices? When this gets done in parallel, I think that CuArrays is not careful about the order. |
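For illustration, here is a minimal sketch of the aliasing pitfall being described (a hypothetical example, not the PR's code):

```julia
x   = zeros(Float32, 3)
idx = [1, 1, 2]          # index 1 appears twice, so two writes alias the same element
v   = ones(Float32, 3)

view(x, idx) .+= v

# On the CPU the broadcast runs sequentially, so x ends up as [2, 1, 0].
# In a parallel GPU kernel both threads updating x[1] may read the old value
# before either writes, so the result can be [1, 1, 0] instead: exactly the
# "not careful about the order" behaviour described above.
```

A collision-free alternative is to accumulate with an explicit loop or an atomic/scatter primitive rather than broadcasting through an aliased view.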
Even when I remove the […], it still doesn't work. I've tried it like this: […] And with a CuArray it still fails to find the correct gradient, while the CPU code is doing fine all this time. |
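The exact snippet is not preserved in this thread, but a check of the kind being described would look roughly like this (a hypothetical reconstruction, assuming the layer from this PR is in scope):

```julia
using Flux, Zygote, CUDA

layer = BilinearUpsample2d((2, 2))
x     = rand(Float32, 4, 4, 1, 1)

g_cpu = gradient(x -> sum(layer(x)), x)[1]       # fine on the CPU
g_gpu = gradient(x -> sum(layer(x)), cu(x))[1]   # reported to give a wrong gradient here
isapprox(g_cpu, Array(g_gpu))                    # false at the time of this discussion
```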
I have located the problem: it seems […]. Unsure whether this is a bug or expected behavior. Or is this actually the possible issue brought up by @mcabbott? |
Gradient doesn't currently work when using CuArrays
Currently the gradient does not work with BilinearUpsample2d when using CuArrays
The issue I should have linked is https://github.com/JuliaGPU/CuArrays.jl/issues/684. I was a bit confused by views etc., but the core idea is that you want to accumulate at indices like […]. Surely there is some way around this, though. If you don't want to up/down-sample by factors larger than 2, can you divide up the work even/odd and be sure that there are no collisions? |
What about defining a custom adjoint function instead? The adjoint of upsampling is a downsampling operation, which could be implemented efficiently using the existing convolutional layer code. The forward part seems to work fine even with this current implementation, both the normal- and CUDA-array versions. |
Since Zygote isn't able to properly handle the current implementation, and moving to an iterative approach I believe would mean a very significant performance reduction, I have added a custom adjoint. Since the adjoint of upsampling is a downsampling operation, I have used Flux.Conv in combination with a downsample kernel and some manual edge-effect correction.
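For illustration, a rough sketch of that idea (not the PR's actual code): the pullback of a factor-`k` bilinear upsample behaves like a strided convolution of the incoming gradient with a bilinear "tent" kernel. The helper name below is invented, and the manual edge-effect correction mentioned above is omitted:

```julia
using NNlib

function bilinear_downsample_sketch(Δ::AbstractArray{T,4}, k::Integer) where T
    w1d = T.([1:k; k-1:-1:1]) ./ T(k)        # 1-D tent weights, length 2k-1
    w2d = w1d * w1d'                         # separable 2-D bilinear kernel
    nch = size(Δ, 3)
    weight = zeros(T, 2k - 1, 2k - 1, nch, nch)
    for c in 1:nch                           # channel-diagonal kernel: no mixing across channels
        weight[:, :, c, c] .= w2d
    end
    return conv(Δ, weight; stride = k, pad = k - 1)
end
```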
Effort to reduce memory footprint slightly
Type instability occurred due to a variable number of dimensions; fixed by hard-coding everything to 4 dimensions
The only type instability left is that the weights of Flux.Conv are not type stable.
I have fixed some type instabilities, mostly by hardcoding the fact that there are 4 dimensions to the input array and that the first 2 are upsampled. This fixes all type-instability issues in the forward pass. However, the backward pass is still not type-stable, because I have used the Flux.Conv layer to implement the downsampling operation in the custom adjoint, and the type of the weights of Flux.Conv cannot be inferred, an issue I have already brought up in #1178. |
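For reference, this kind of inference issue can be checked with `Test.@inferred`; a minimal check might look like this (assuming the layer from this PR is in scope):

```julia
using Flux, Test

layer = BilinearUpsample2d((2, 2))
x = rand(Float32, 8, 8, 1, 1)

@inferred layer(x)                          # passes once the forward pass is type-stable
@inferred gradient(x -> sum(layer(x)), x)   # expected to fail while Flux.Conv defeats inference
```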
Is there any way to make this work for more than 2D? I work with medical imaging so a lot of that is in 3 dimensions.... |
That would be trilinear upsampling. I guess the same method I use for bilinear upsampling could be used, but I haven't tried it yet. |
What is pending for this merge? If it is just the type instability due to Flux.Conv, can it be merged now? It would presumably be fixed when/if Flux.Conv fixes the type instability. |
```
@@ -30,6 +30,7 @@ Random.seed!(0)
include("layers/normalisation.jl")
include("layers/stateless.jl")
include("layers/conv.jl")
include("layers/upsample.jl")
```
indentation
I agree this should be merged and the type-instability issues addressed separately |
Heh, I had the review there but damn I guess I forgot to put it up
Typically we also want to match the indentation generally across the additions
```julia
wdiff1 = eltype(img).(wdiff1)
wdiff2 = eltype(img).(wdiff2)

if typeof(img) <: CuArray
```
Why do we need this? The kernel should handle this case generically.
""" | ||
@nograd function construct_xq(n::T, m::T) where T<:Integer | ||
typed1 = one(n) | ||
typed2 = 2typed1 |
Use `T`, and broadcasting perhaps.
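Presumably something like the following is meant (a guess at the reviewer's intent):

```julia
typed1 = one(T)
typed2 = T(2)
```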
```julia
ilow = floor.(Int, xq)
ihigh = ceil.(Int, xq)

wdiff = xq[:,:,:,:] .- ilow[:,:,:,:]
```
Do we need the colons there if it's just everything?
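That is, presumably the indexing can be dropped entirely:

```julia
wdiff = xq .- ilow
```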
```julia
Upsamples the first two dimensions of the 4-dimensional array `img` by the two upsample factors stored in `k_upsample`,
using bilinear interpolation. The interpolation grid is identical to the one used by `imresize` from `Images.jl`.
"""
function bilinear_upsample2d(img::AbstractArray{T,4}, k_upsample::NTuple{2,<:Real}) where T
```
What would be needed to get us to be able to do it with 3d convs as well?
+1 to this....
```julia
The above holds as long as `idx` contains every index in `x`.
"""
@nograd function adjoint_of_idx(idx::Vector{T}) where T<:Integer
d = trues(size(idx))
```
It might be type-unstable to use Bools with floats, and that might stop Julia from using BLAS calls as efficiently.
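Presumably something along these lines is meant, keeping the accumulator in a numeric eltype rather than a BitArray (a guess at the reviewer's intent; `Float32` here is arbitrary):

```julia
d = ones(Float32, size(idx))   # instead of trues(size(idx))
```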
```julia
ihigh2_r = adjoint_of_idx(ilow2)[ihigh2]

wdiff1 = eltype(img).(wdiff1)
wdiff2 = eltype(img).(wdiff2)
```
Use `T` from the method signature.
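That is, presumably:

```julia
wdiff1 = T.(wdiff1)
wdiff2 = T.(wdiff2)
```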
This looks really nice, thanks! I think the API needs a bit of discussion but there are no major blockers.
Should the core kernels for this perhaps be in NNlib? It doesn't matter too much since we can move them later, but just thinking about things like adding specialised CUDA implementations and such down the line; this is what we've done with most other kernels like conv.
```
@@ -0,0 +1,325 @@
"""
BilinearUpsample2d(factors::Tuple{Integer,Integer})
```
Is it necessary for this to be specific to 2D? Could it infer its dimension from `factors`, like the `Conv` layer, and be generic across dimension numbers? Also, could it not be considered `BilinearInterpolate` in general (e.g. with fractional factors)?
(It is fine if dimensions other than 2, or fractional factors, are not currently implemented and throw an error; it's just nice if we can add them in future.)
[If we do keep `2D`, it should be capitalised that way.]
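A sketch of what a dimension-generic layer along those lines could look like (hypothetical, not part of the PR):

```julia
struct BilinearUpsample{N,T<:Real}
    factors::NTuple{N,T}
end
BilinearUpsample(factors::Real...) = BilinearUpsample(promote(factors...))

# The 2-D case dispatches to the existing kernel; other dimensions error until implemented.
(u::BilinearUpsample{2})(x::AbstractArray{<:Any,4}) = bilinear_upsample2d(x, u.factors)
(u::BilinearUpsample)(x) = error("only 2-D bilinear upsampling is implemented so far")
```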
```julia
end

@adjoint function (c::T where T<:BilinearUpsample2d)(x::AbstractArray)
(c::T where T<:BilinearUpsample2d)(x), c̄ -> (nothing, bilinear_upsample_adjoint(c̄, c.factors))
```
The adjoint should probably be applied to the `bilinear_upsample` function, not the layer, so that the function can be used directly where appropriate. As with `Conv` it would be nice to support a non-layer version of this with a nice API.
The type on the second `c::T where ...` seems unnecessary.
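In other words, roughly this wiring (a sketch reusing the names from the diff, not a drop-in patch):

```julia
using Zygote: @adjoint

# The layer simply forwards to a plain function...
(c::BilinearUpsample2d)(x::AbstractArray) = bilinear_upsample2d(x, c.factors)

# ...and the custom rule is attached to that function, so it also works outside the layer.
@adjoint bilinear_upsample2d(x, factors) =
    bilinear_upsample2d(x, factors), c̄ -> (bilinear_upsample_adjoint(c̄, factors), nothing)
```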
Hi everybody, just recently I blindly translated the PyTorch bilinear upsampling to Julia & CUDA. Unfortunately I didn't have the time to finish it. The array indexing is still messed up, and honestly I don't really know how the gradient calculation works. I've put everything in a gist, so that you can just copy it if interested :) |
Hi! I would really like to give this a push. I could really use this (or nearest-neighbour upsampling) for U-Net implementations. Have the reviews been addressed? If not, what would I have to do to work on them? In the meantime I have finished porting and testing the core of the PyTorch bilinear upsampling implementation here. It now produces the same output as PyTorch for forward and backward (verified via nyan cat). Fractional upscaling factors are allowed, but still only 4D tensors; in my opinion we don't need more for a start. |
I think the PR is in good shape. Have you been able to benchmark the runtime against PyTorch as well? Also, there is UNet.jl if that helps. |
Well, benchmarking is a science in itself... It seems that timings are on par in many cases, but there is some caching going on in PyTorch. This is how I benchmark:

```julia
using BenchmarkTools, CUDA   # imports assumed for this snippet; upsample_bilinear is the ported function linked above

data = rand(Float32, 256, 256, 16, 32) |> cu
@benchmark upsample_bilinear($data, 2, 2)   # min: 8ms, mean: 44ms
```

```python
# PyTorch comparison (imports assumed)
import numpy as np
import torch
from torch import nn
from time import time

up_op = nn.UpsamplingBilinear2d(scale_factor=2)
data = torch.rand((32, 16, 256, 256)).to("cuda")
times = []
with torch.no_grad():
    for i in range(100):
        t0 = time()
        up_op(data)
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        times.append(time() - t0)
print(min(times) * 1000)      # 19ms with empty_cache() - 7.2ms without
print(np.mean(times) * 1000)  # 37ms with empty_cache() - 7.4ms without
```

CUDA 2.3 |
I just checked the changes in this PR, looks good indeed. Forward and backward pass work. However, for the image size given in the above post the PyTorch GPU kernel is another 3x faster, so maybe it makes sense to add it as a dispatch for CuArrays. The pullback should also be straightforward to implement. The question is where this should happen... merge first, then add the dispatch? Or the other way round? The kernel I ported is, by the way, from Caffe2; there are also the newer ATen kernels. As soon as I have figured out how to port them, I could provide NearestNeighbour2d/3d & trilinear as well. Oh, and I don't know where to put the kernels - CUDA.jl? Here? NNlib? |
I just noted that there is an older, good attempt by @avik-pal here for CPU and here for GPU, but I don't know which method is used. He implemented UpsamplingDims along the lines of ConvDims to produce specialized code. Should we consider that, or is it over the top? @ltjkoomen are you still engaged? Some time early next year might be sufficient 😉 |
@ltjkoomen has not been responsive for some time. If the CPU implementation here is fine I will move it to NNlib. Any performance optimization can come later; the important thing is to agree on the interface. |
This should be fine: `NNlib.bilinear_upsample(x::AbstractArray{<:Real,4}, k::NTuple{2,<:Real})`. CUDA.jl will have to overload it. |
Indeed we need to finish this layer! I'm also writing a UNet (with […]) |
Hi,
I've implemented a bilinear upsampling layer that upsamples the first 2 dimensions of a 4-dimensional array with an integer factor. I have seen the implementation in #1136, but: […]
Some possible TODOs: […]
[…] `using Flux`, which seems like pollution and is unnecessary. However, I am not quite sure what the correct way is in Julia to shield these "private" functions.