Add bilinear upsample layer #1180

ltjkoomen · 2020-05-13T14:49:10Z

Hi,

I've implemented a bilinear upsampling layer that upsamples the first 2 dimensions of a 4-dimensional array with an integer factor. I have seen the implementation in #1136, but:

This one is about 10 times faster on CPU on my laptop
This one uses broadcasting instead of scalar loop indexing, which makes it usable on the GPU
This one is directly compatible with Zygote

Some possible TODOs:

Currently the utility functions in src/layers/upsampling.jl used by the implementation are all added to the Flux namespace when doing using Flux, which seems like pollution and is unnecessary. However, I am not quite sure what the correct way is in Julia to shield these "private" functions.
With this implementation it is no problem to define an output size, instead of the quite limiting integer upscaling factors. Although usually upscaling is used to, for instance, increase the number of samples in each dimension by 2, maybe we should still expose this functionality somehow? This could be implemented through some specialized constructor perhaps.

Added BilinearUpsample2d layer

kczimm · 2020-05-13T14:53:41Z

@ltjkoomen this is awesome. I hope others can help review this PR so it can be merged quickly. Thank you for this contribution. I'm happy to close #1136 in favor of this one.

DhairyaLGandhi · 2020-05-13T15:27:04Z

Could we add GPU tests, I'd like to ensure that it doesn't cause trouble there

src/layers/upsample.jl

ltjkoomen · 2020-05-13T16:16:37Z

I have noticed a problem when using gradient when the input is a CuArray. Although the upsampled output array is correct (equal to the CPU output), the output of gradient is wrong.

using Flux: gradient, BilinearUpsample2d
using CuArrays

x = Float32.([1 2; 3 4])[:,:,:,:]
x_c = CuArray(x)

c = BilinearUpsample2d((2,2))

o = c(x)
o_c = c(x_c)

g = gradient(x -> sum(c(x)), x)
g_c = gradient(x -> sum(c(x)), x_c)

The result is that o and o_c are equal, but g and g_c are not.
Anyone have experience getting Zygote to work with CuArrays?

Also removed a faulty doctest

mcabbott · 2020-05-14T10:24:38Z

I haven't looked hard but I suspect that the problem is aliasing: won't you end up writing to two views of the same object, with shifted indices? When this gets done in parallel, I think that CuArrays is not careful about the order.

ltjkoomen · 2020-05-14T11:00:57Z

Even when I remove the @views and do it like this:

v1 = img[ilow1,ilow2,:,:]
v2 = img[ihigh1,ilow2,:,:]
imgupsampled1 = v1 .* (1 .- wdiff1) .+ v2 .* wdiff1
v3 = imgupsampled1[:,ihigh2_r,:,:]
imgupsampled2 = imgupsampled1 .* (1 .- wdiff2) .+ v3 .* wdiff2

It still doesn't work.

~~Wondering if it happens because the indices themselves are no CuArrays, which I did because it leads to a scalar indexing warning and slowdown. That's mentioned here.~~

I think it's because of the way I implemented the second interpolation step (in the 2nd dimension imgupsampled1[:,ihigh2_r,:,:], this will give the wrong gradients as ihigh2_r is copying a single column of imgupsampled1 multiple times. Will need a re-implementation.

I've tried it like this:

v1 = img[ilow1,:,:,:]
v2 = img[ihigh1,:,:,:]

imgupsampled1 = v1 .* (1 .- wdiff1) .+ v2 .* wdiff1

v3 = imgupsampled1[:,ilow2,:,:]
v4 = imgupsampled1[:,ihigh2,:,:]

imgupsampled2 = v3 .* (1 .- wdiff2) .+ v4 .* wdiff2

And with a CuArray it still fails to find the correct gradient, while the CPU code is doing fine all this time.

ltjkoomen · 2020-05-14T22:35:33Z

I have located the problem, it seems gradient doesn't work with a CuArray generated through indexing with an array like idx=[1, 1, 2], like so: out = x[idx]. I have included a minimal example:

using Flux
using CuArrays

function min_example(x, idx)
    out = x[idx]
    return out
end

x = Float32.([1, 2])
idx = [1, 1, 2]
x_c = CuArray(x)

# CPU version
fex(x) = min_example(x, idx)
f(x) = sum(fex(x))
df(x) = gradient(f, x)[1]

# CuArray version
fex_c(x_c) = min_example(x_c, idx)
f_c(x_c) = sum(fex_c(x_c))
df_c(x_c) = gradient(f, x_c)[1]

@assert fex(x) == cpu(fex_c(x_c)) # pass, forward is the same
@assert df(x) == cpu(df_c(x_c)) # fail, gradient is wrong for the CuArray version

Unsure whether this is a bug or expected behavior. Or is this actually the possible issue brought up by @mcabbott ?

Gradient doesn't currently work when using CuArrays

Currently gradient does not work with BilinearUpsampling2d when using CuArrays

mcabbott · 2020-05-15T11:00:45Z

The issue I should have linked is https://github.com/JuliaGPU/CuArrays.jl/issues/684 . Was a bit confused by views etc, but the core idea is that you want to accumulate at indices like idx=[1, 1, 2], which can easily go wrong. (I think the same happens for @strided, threaded broadcasting on CPU.)

Surely there is some way around this, though. If you don't want to up/down-sample by factors larger than 2, can you divide up the work even/odd & be sure that there are no collisions?

ltjkoomen · 2020-05-15T17:28:05Z

What about defining a custom adjoint function instead? The adjoint of upsampling is a downsampling operation, which could be implemented efficiently using the existing convolutional layer code. The forward part seems to work fine even with this current implementation, both the normal- and CUDA-array versions.

Since Zygote isn't able to properly handle the current implementation, and moving to an iterative approach I believe would mean a very significant performance reduction, I have added a custom adjoint. Since the adjoint of upsampling is a downsampling operation, I have used Flux.Conv in combination with a downsample kernel and some manual edge-effect correction.

Effort to reduce memory footprint slightly

type instability occurred due to a variable dimension number, fixed by hard-coding everything to 4 dimensions

Only type-instability left is the fact that the weights of Flux.Conv are not type stable.

ltjkoomen · 2020-05-21T12:28:31Z

I have fixed some type-instabilities, mostly by hardcoding the fact that there are 4 dimensions to the input array and the first 2 are upsampled. This has fixed all type-instability issues in the forward pass. However, the backward pass is still not type-stable because of the fact that I have used the Flux.Conv layer to implement the downsampling operation in the custom adjoint. The type of the weights of Flux.Conv cannot be inferred, an issue which I have already brought up in #1178.

andevellicus · 2020-05-28T03:51:10Z

Is there any way to make this work for more than 2D? I work with medical imaging so a lot of that is in 3 dimensions....

ltjkoomen · 2020-05-28T12:17:14Z

That would be trilinear upsampling. I guess the same method I use for bilinear upsampling could be used, but I haven't tried it yet.

kczimm · 2020-05-29T14:47:52Z

What is pending for this merge? If it just the type instability due to Flux.Conv can it be merged and then it will presumably be fixed when/if Flux.Conv fixes the type instability?

CarloLucibello · 2020-06-06T17:03:11Z

test/runtests.jl

@@ -30,6 +30,7 @@ Random.seed!(0)
    include("layers/normalisation.jl")
    include("layers/stateless.jl")
    include("layers/conv.jl")
+include("layers/upsample.jl")


indentation

CarloLucibello · 2020-06-06T17:06:28Z

I agree this should be merged and type instabilities issues addressed separately

DhairyaLGandhi

Heh, I had the review there but damn I guess I forgot to put it up

Typically we also want to match the indentation generally across the additions

DhairyaLGandhi · 2020-06-06T17:10:14Z

src/layers/upsample.jl

+    wdiff1 = eltype(img).(wdiff1)
+    wdiff2 = eltype(img).(wdiff2)
+
+    if typeof(img) <: CuArray


Why do we need this? The kernel should handle this case generically.

DhairyaLGandhi · 2020-06-06T17:10:49Z

src/layers/upsample.jl

+"""
+@nograd function construct_xq(n::T, m::T) where T<:Integer
+    typed1 = one(n)
+    typed2 = 2typed1


Use T and broadcasting perhaps.

DhairyaLGandhi · 2020-06-06T17:11:32Z

src/layers/upsample.jl

+    ilow = floor.(Int, xq)
+    ihigh = ceil.(Int, xq)
+
+    wdiff = xq[:,:,:,:] .- ilow[:,:,:,:]


Do we need the colons there if it's just everything?

DhairyaLGandhi · 2020-06-06T17:13:20Z

src/layers/upsample.jl

+Upsamples the first two dimensions of the 4-dimensional array `img` by the two upsample factors stored in `k_upsample`,
+using bilinear interpolation. The interpolation grid is identical to the one used by `imresize` from `Images.jl`.
+"""
+function bilinear_upsample2d(img::AbstractArray{T,4}, k_upsample::NTuple{2,<:Real}) where T


What would be needed to get us to be able to do it with 3d convs as well?

+1 to this....

DhairyaLGandhi · 2020-06-06T17:14:24Z

src/layers/upsample.jl

+The above holds as long as `idx` contains every index in `x`.
+ """
+@nograd function adjoint_of_idx(idx::Vector{T}) where T<:Integer
+    d = trues(size(idx))


It might be type unstable to use bools with floats and that might stop julia from using blas calls more efficiently

DhairyaLGandhi · 2020-06-06T17:15:37Z

src/layers/upsample.jl

+    ihigh2_r = adjoint_of_idx(ilow2)[ihigh2]
+
+    wdiff1 = eltype(img).(wdiff1)
+    wdiff2 = eltype(img).(wdiff2)


Use T from the method signature

MikeInnes

This looks really nice, thanks! I think the API needs a bit of discussion but there are no major blockers.

Should the core kernels for this perhaps be in NNlib? Doesn't matter too much since we can move them later, but just thinking about things like adding specialised CUDA implementations and such down the line; this is what we've done with most other kernels like conv.

MikeInnes · 2020-06-08T14:23:05Z

src/layers/upsample.jl

@@ -0,0 +1,325 @@
+"""
+    BilinearUpsample2d(factors::Tuple{Integer,Integer})


Is it necessary for this to be specific to 2D? Could it infer its dimension from factors, like the Conv layer, and be generic across dimension numbers? Also, could it not be considered BilinearInterpolate in general (e.g. with fractional factors)?

(It is fine if dimensions other than 2, or fractional factors, are not currently implemented and throw an error, just nice if we can add them in future.)

[If we do keep 2D it should be capitalised that way.]

MikeInnes · 2020-06-08T14:28:57Z

src/layers/upsample.jl

+end
+
+@adjoint function (c::T where T<:BilinearUpsample2d)(x::AbstractArray)
+    (c::T where T<:BilinearUpsample2d)(x), c̄ -> (nothing, bilinear_upsample_adjoint(c̄, c.factors))


The adjoint should probably be applied to the bilinear_upsample function, not the layer, so that the function can be used directly where appropriate. As with Conv it would be nice to support a non-layer version of this with a nice API.

The type on the second c::T where ... seems unnecessary.

AlfTetzlaff · 2020-06-17T10:10:18Z

Hi everybody, just recently I blindly translated the pytorch bilinear upsampling to julia & cuda. Unfortunately I didn't have the time to finish it. The array indexing is still messed up and honestly I don't really know how the gradient calculation works. I've put everything in a gist, so that you can just copy it, if interested :)

maxfreu · 2020-12-21T16:38:37Z

Hi! I would really like to give this a push. I could really need this (or nearest neighbor) for U-Net implementations. Have the reviews been addressed? If not, what would I have to do to work on them? In the mean time I have finished porting and testing the pytorch implementation core of bilinear upsampling here. It now produces the same output as pytorch for forward and backward (verified via nyan cat). Fractional upscaling factors are allowed, but still only 4D tensors - but in my opinion we dont need more for a start.

DhairyaLGandhi · 2020-12-21T17:18:44Z

I think the pr is in a good shape. Have you been able to benchmark the run time against pytorch as well? Also, there is UNet.jl if that helps.

maxfreu · 2020-12-22T13:48:40Z

Well, benchmarking is a science for itself... It seems that timings are on par in many cases, but there is some caching going on in pytorch. This is how I benchmark:

data = rand(Float32, 256,256,16,32) |> cu
@benchmark upsample_bilinear($data, 2,2) # min: 8ms, mean: 44ms

up_op = nn.UpsamplingBilinear2d(scale_factor=2)
data = torch.rand((32,16,256,256)).to("cuda")
times = []
with torch.no_grad():
    for i in range(100):
        t0 = time()
        up_op(data)
        torch.cuda.synchronize()
        torch.cuda.empty_cache()
        times.append(time()-t0)
print(min(times)*1000) # 19ms with empty_cache()  -  7.2ms without
print(np.mean(times)*1000) # 37ms with empty_cache()  -  7.4 without

CUDA 2.3
pytorch 1.5
GTX 980

maxfreu · 2020-12-22T21:10:57Z

I just checked the changes in this PR, looks good indeed. Forward and backward pass work. However for the image size given in the above post the pytorch gpu kernel is another 3x faster. So maybe it makes sense to add it as dispatch for CuArrays. The pullback should also be straightforward to implement. The question is where this should happen... merge first, then add dispatch? Or the other way round? The kernel I ported is by the way from Caffee2, there are also the newer ATen kernels. As soon as I have figured out how to port them, I could output NearestNeighbour2d/3d & trilinear as well. Oh, and I dont know where to put the kernels - CUDA.jl? Here? NNlib?

CarloLucibello · 2020-12-23T06:14:11Z

@maxfreu cpu implementations shoud go to NNlib, kernel's should go to CUDA.jl if @maleadt agrees, otherwise, we can add them to Flux. A simple layer wrapper should live in Flux.

maxfreu · 2020-12-28T17:26:32Z

I just noted that there is an older, good attempt by @avik-pal here for CPU and here for GPU, but I don't know which method is used. He implemented UpsamplingDims along the lines of ConvDims to produce specialized code. Should we consider that or is it over the top?

@ltjkoomen are you still enganged? Some time early next year might be sufficient 😉

CarloLucibello · 2020-12-30T10:43:22Z

@ltjkoomen has not been responsive for some time. If the cpu implementation here is fine I will move it to NNlib. Any performance optimization can come later, the important thing is to agree on the interface

CarloLucibello · 2020-12-30T10:47:23Z

This should be fine:

NNlib.bilinear_upsample(x::AbstractArray{<:Real,4}, k::NTuple{2,<:Real})

CUDA.jl will have to overload it.

maleadt · 2021-01-04T14:44:12Z

@maxfreu cpu implementations shoud go to NNlib, kernel's should go to CUDA.jl if @maleadt agrees, otherwise, we can add them to Flux. A simple layer wrapper should live in Flux.

GPU implementations of NNlib interfaces can go in CUDA.jl, we already have a couple.

qin-yu · 2021-01-04T18:32:23Z

Indeed we need to finish this layer! I'm also writing a UNet (with repeat(), see #270) right now 😆 I guess the GPU support for repeat() has not even been done (see JuliaGPU/GPUArrays.jl#126)!

ltjkoomen added 2 commits May 13, 2020 15:37

Added upsample layer

afb4099

Added BilinearUpsample2d layer

Added tests

6172279

kczimm mentioned this pull request May 13, 2020

add bilinear upsample layer #1136

Closed

Fixed test error

a1d78a3

DhairyaLGandhi reviewed May 13, 2020

View reviewed changes

src/layers/upsample.jl Outdated Show resolved Hide resolved

Cast weights to CuArray if needed

a7a54c6

Moved T to type signature, moved constructors

9b5ca6f

Also removed a faulty doctest

ltjkoomen added 2 commits May 15, 2020 12:42

Added CUDA gradient test

4d921c4

Gradient doesn't currently work when using CuArrays

Added GPU tests

52ecfd9

Currently gradient does not work with BilinearUpsampling2d when using CuArrays

ltjkoomen added 7 commits May 16, 2020 16:38

Merge branch 'gpu-gradient-fix' into add-upsample-layer

bdc366b

Added some @view macros

66dd999

Effort to reduce memory footprint slightly

More @view macro use

a265d21

Fixed type-instability

b67d4e8

type instability occurred due to a variable dimension number, fixed by hard-coding everything to 4 dimensions

Fixed some type-instabilities in adjoint

0a65268

Only type-instability left is the fact that the weights of Flux.Conv are not type stable.

Merge branch 'fix-type-instability' into add-upsample-layer

cea188b

CarloLucibello reviewed Jun 6, 2020

View reviewed changes

DhairyaLGandhi requested changes Jun 6, 2020

View reviewed changes

MikeInnes reviewed Jun 8, 2020

View reviewed changes

CarloLucibello mentioned this pull request Dec 19, 2020

PyTorch feature parity #1431

Open

92 tasks

maxfreu mentioned this pull request Dec 28, 2020

Implement (bilinear) upsampling JuliaGPU/CUDA.jl#613

Open

CarloLucibello mentioned this pull request Dec 30, 2020

add bilinear upsampling FluxML/NNlib.jl#262

Merged

6 tasks

CarloLucibello closed this Jan 8, 2021

		@@ -0,0 +1,325 @@
		"""
		BilinearUpsample2d(factors::Tuple{Integer,Integer})

Add bilinear upsample layer #1180

Add bilinear upsample layer #1180

Conversation

ltjkoomen commented May 13, 2020

kczimm commented May 13, 2020

DhairyaLGandhi commented May 13, 2020

ltjkoomen commented May 13, 2020

mcabbott commented May 14, 2020

ltjkoomen commented May 14, 2020 • edited Loading

ltjkoomen commented May 14, 2020

mcabbott commented May 15, 2020

ltjkoomen commented May 15, 2020

ltjkoomen commented May 21, 2020

andevellicus commented May 28, 2020

ltjkoomen commented May 28, 2020

kczimm commented May 29, 2020

Choose a reason for hiding this comment

CarloLucibello commented Jun 6, 2020

DhairyaLGandhi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MikeInnes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlfTetzlaff commented Jun 17, 2020 • edited Loading

maxfreu commented Dec 21, 2020

DhairyaLGandhi commented Dec 21, 2020

maxfreu commented Dec 22, 2020 • edited Loading

maxfreu commented Dec 22, 2020 • edited Loading

CarloLucibello commented Dec 23, 2020 • edited Loading

maxfreu commented Dec 28, 2020

CarloLucibello commented Dec 30, 2020

CarloLucibello commented Dec 30, 2020 • edited Loading

maleadt commented Jan 4, 2021

qin-yu commented Jan 4, 2021

ltjkoomen commented May 14, 2020 •

edited

Loading

AlfTetzlaff commented Jun 17, 2020 •

edited

Loading

maxfreu commented Dec 22, 2020 •

edited

Loading

maxfreu commented Dec 22, 2020 •

edited

Loading

CarloLucibello commented Dec 23, 2020 •

edited

Loading

CarloLucibello commented Dec 30, 2020 •

edited

Loading