This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Upsampling GPU Kernel for Flux #293

Draft
wants to merge 1 commit into master

Conversation

avik-pal
Contributor

@avik-pal avik-pal commented Feb 24, 2019

Current Issues:

  • The addition on line 47 of upsample.jl is not safe without atomic addition; it results in incorrect and inconsistent answers in the backward pass.

@maleadt suggested on Slack to use CUDAatomics for now, but I could not get it working. Concerns regarding that:

  1. It needs a CuDeviceArray, and I am not entirely sure how to pass a CuArray to it (see the sketch after this list).
  2. Should we be adding a dependency on a package that is not recommended?
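
For point 1, a minimal sketch of what I understand should happen (hedged; this assumes the CUDAnative/CuArrays APIs of this era and uses a plain @cuda launch rather than the gpu_call path used elsewhere in this PR): @cuda runs cudaconvert on its arguments, so a CuArray reaches the kernel as a CuDeviceArray without any manual conversion.

    # Hedged sketch, not part of this PR: @cuda converts a host-side CuArray
    # into the CuDeviceArray that device code sees.
    using CuArrays, CUDAnative

    function add_one_kernel(x)               # x is a CuDeviceArray here
        i = threadIdx().x
        @inbounds x[i] += 1f0
        return nothing
    end

    x = CuArrays.zeros(Float32, 32)          # CuArray on the host side
    @cuda threads=length(x) add_one_kernel(x)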

Stuff to handle before it is ready for merging:

  • Open the corresponding pull requests in Flux.jl and NNlib.jl to test the complete pipeline.
  • Add tests to verify the correctness of the code.

I am not entirely sure if this kernel belongs at this location (inside CUDNN). I can shift it to any other location if needed.

cc @MikeInnes

@maleadt maleadt changed the title Upsampling GPU Kernel for Flux WIP: Upsampling GPU Kernel for Flux Apr 15, 2019
@dbadrian

dbadrian commented Sep 2, 2019

Okay, so I had a crack at this today.

Firstly, CUDAnative itself now supports atomics, so we can make a direct call like

    @inbounds CUDAnative.atomic_add!(pointer(x, x_idx), y[y_idx])

For the sake of interest, the call to pointer (I don't know where it's actually defined, I guess in CUDAnative) probably just does

    ptr = convert(CuArrays.CuPtr{T}, CuArrays.buffer(x, x_idx))
    CUDAnative.DevicePtr{T,AS.Global}(ptr)

However, the ∇upsample code itself has two bugs: firstly, after the call to similar we need to zero out the tensor, or we add to its initial random values; secondly, the height/width need to be recalculated, or we end up with shifted indices (unless I fundamentally misunderstood how the gradients will propagate).

tl;dr:

function ∇upsample_kernel(state, y, x, height, width, channels, batch, stride, scale)
    i = @linearidx y state

    # decompose the linear index of y into (h, w, c, b) coordinates (column-major)
    y_idx = i
    y_h = (i - 1) % (height * stride[1]) + 1
    i = (i - 1) ÷ (height * stride[1])
    y_w = i % (width * stride[2]) + 1
    i = i ÷ (width * stride[2])
    y_c = i % channels + 1
    i = i ÷ channels
    y_b = i % batch + 1

    # map each y pixel back to the x pixel it was upsampled from
    x_h = (y_h - 1) ÷ stride[1] + 1
    x_w = (y_w - 1) ÷ stride[2] + 1
    x_c = y_c
    x_idx = (y_b - 1) * width * height * channels + (x_c - 1) * width * height + (x_w - 1) * height + x_h

    # several y pixels write to the same x pixel, hence the atomic accumulation
    @inbounds CUDAnative.atomic_add!(pointer(x, x_idx), y[y_idx])

    return nothing
end

function ∇upsample(dy::CuArray, stride, scale = 1)
    (height, width, channels, batch) = size(dy)
    @assert height % stride[1] == 0
    @assert width % stride[2] == 0
    dx = similar(dy, (height ÷ stride[1], width ÷ stride[2], channels, batch))
    dx .= 0.0   # similar leaves dx uninitialised; zero it before accumulating into it
    (height, width, channels, batch) = size(dx)   # the kernel expects the dimensions of dx
    gpu_call(∇upsample_kernel, dy, (dy, dx, height, width, channels, batch, stride, scale))
    return dx
end

Maybe the call to CUDAnative can be made smarter; that's simply how I got it to work locally quickly. Maybe @maleadt can give some advice. I guess the correct usage would be @atomic? I did see a recent bug, https://github.com/JuliaGPU/CUDAnative.jl/issues/421, so I didn't test it for now.

One problem I can imagine with the current code: if scale, for example, is not of the same element type T as the CuArray, the call to CUDAnative will fail (whether that should be caught beforehand or made impossible by changing the function signature of upsample, I don't know).
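
As a hedged sketch of the signature option (illustration only, not what the PR currently does; the body would be unchanged from above):

    # Hypothetical variant: restrict `scale` to dy's element type T so a
    # mismatched scale can never reach the device call.
    function ∇upsample(dy::CuArray{T}, stride, scale::T = one(T)) where {T}
        # ... same body as above ...
    end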

For tests, I didn't write any yet. I manually verified which x_idx/y_idx are generated for a simple (2,2,1,1) -> (4,4,1,1) upsampling case, and empirically, in my model where I use it (with all dims > 1), it seems to work (comparing against the alternative of abusing a ConvTranspose as an upsampling layer).

However, I am happy to contribute some test cases soon as well. Let's crack down on this stale PR :)
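
A possible starting point for such a test, as a hedged sketch (assuming the ∇upsample above and nearest-neighbour semantics, where each dx entry is the sum of the stride×stride block of dy it maps to):

    using Test, CuArrays

    @testset "∇upsample, stride (2, 2)" begin
        dy = CuArrays.rand(Float32, 4, 4, 1, 1)
        dx = ∇upsample(dy, (2, 2))

        # CPU reference: dx[i, j] is the sum over the 2×2 block of dy it feeds
        dy_h = Array(dy)
        expected = [sum(dy_h[2i-1:2i, 2j-1:2j, 1, 1]) for i in 1:2, j in 1:2]

        @test Array(dx)[:, :, 1, 1] ≈ expected
    end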

@vchuravy
Member

vchuravy commented Sep 2, 2019

Great! Please do open a PR and also take a look at @atomic from CUDAnative.

@dbadrian

dbadrian commented Sep 2, 2019

Okay, will do so soon (currently semi-homeless, so a tad bit difficult to get things done these days).

I also just took a look at the code in FluxML/NNlib.jl#95 by pshask, which seems to work as well, including on CuArrays. A quick test didn't show big speed differences, but I need to do some proper benchmarking. Based on the results, one solution should be picked, although his works on both CuArrays and plain Arrays, which is neat and clean.
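
A hedged sketch of how that comparison could be benchmarked (nnlib_∇upsample is a placeholder name for the NNlib.jl#95 candidate; the sizes are arbitrary):

    using BenchmarkTools, CuArrays

    dy = CuArrays.rand(Float32, 128, 128, 32, 16)

    # CuArrays.@sync forces the GPU work to finish so the timing is meaningful
    @btime CuArrays.@sync ∇upsample($dy, (2, 2))
    @btime CuArrays.@sync nnlib_∇upsample($dy, (2, 2))   # placeholder name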

@maleadt
Member

maleadt commented Sep 3, 2019

> Maybe the call to CUDAnative can be made smarter; that's simply how I got it to work locally quickly. Maybe @maleadt can give some advice. I guess the correct usage would be @atomic? I did see a recent bug JuliaGPU/CUDAnative.jl#421 so I didn't test it for now.

To clarify, @atomic disables bounds checking, so that's not an issue in your case. It's just syntactic sugar though, so it's perfectly fine to just call atomic_add! explicitly.
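
For illustration, a hedged sketch of what the sugar would look like (assuming @atomic accepts this pattern), as a drop-in for the atomic line inside ∇upsample_kernel:

    # equivalent to CUDAnative.atomic_add!(pointer(x, x_idx), y[y_idx])
    CUDAnative.@atomic x[x_idx] += y[y_idx]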

@sdewaele

Thanks for working on this! A fully functional upsampling layer will be a very important addition to Flux.

@maleadt maleadt marked this pull request as draft May 8, 2020 10:35
@maleadt maleadt changed the title WIP: Upsampling GPU Kernel for Flux Upsampling GPU Kernel for Flux May 8, 2020