improve bilinear upsampling #266
Conversation
Well, that's an awesome improvement!
In case you didn't see: I posted the single core benchmark in the original post.
Ufff, the maxpool tests are passing again, it's incredible how unstable they are. Can you change ...
I couldn't find any occurrences of ...
Yes, maybe change `gradtest(x -> maxpool(x, pdims), x; broken=spatial_rank <= 2)` to `gradtest(x -> maxpool(x, pdims), x; broken=spatial_rank <= 0)`, since I have the impression we will have to change it back soon.
Can you add a test with a non-integer scale? It's a bit sad we lose CuArray support until JuliaGPU/CUDA.jl#636 is merged, but I cannot think of a way around it. Now that you have thought a bit more about this, would you be able to extend the code to support the 1d and 3d cases (in a later PR)? Following the discussion in FluxML/Flux.jl#1468, can you add support for integer scale?
Supporting 1D and 3D is easy: it's just one for-loop fewer or more. Nearest neighbour is also easy to implement this way, since only the source-index calculation changes.
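For illustration, here is a rough, self-contained sketch of what such a whcn (width, height, channel, batch) CPU kernel can look like. This is not the PR's actual code; the function name, the align-corners convention, and all details are assumptions. The point is that nearest neighbour would only change how the source indices and weights are computed, and 1d/3d variants add or drop one spatial loop.

```julia
# Hypothetical sketch of a bilinear upsampling kernel in whcn layout (align-corners style).
function upsample_bilinear_whcn_sketch!(y::AbstractArray{T,4}, x::AbstractArray{T,4}) where T
    w_in, h_in, c, n = size(x)
    w_out, h_out = size(y, 1), size(y, 2)
    # source steps per output step (guarding the size-1 case)
    rw = w_out > 1 ? (w_in - 1) / (w_out - 1) : 0.0
    rh = h_out > 1 ? (h_in - 1) / (h_out - 1) : 0.0
    for b in 1:n, ch in 1:c, j in 1:h_out
        sy  = 1 + (j - 1) * rh                          # fractional source row
        iy0 = floor(Int, sy); iy1 = min(iy0 + 1, h_in)
        wy  = sy - iy0                                  # interpolation weight along h
        for i in 1:w_out
            sx  = 1 + (i - 1) * rw                      # fractional source column
            ix0 = floor(Int, sx); ix1 = min(ix0 + 1, w_in)
            wx  = sx - ix0
            # nearest neighbour would instead just round sx, sy to the closest index
            y[i, j, ch, b] =
                (1 - wy) * ((1 - wx) * x[ix0, iy0, ch, b] + wx * x[ix1, iy0, ch, b]) +
                     wy  * ((1 - wx) * x[ix0, iy1, ch, b] + wx * x[ix1, iy1, ch, b])
        end
    end
    return y
end
```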
A PR would be much appreciated.
You can rebase and remove the changes to the pooling tests now.
Sorry, I don't know what a rebase is. Do you mean I should simply undo the changes to the pooling tests? Probably I shouldn't have committed to the master branch of my fork. (?)
You might be able to merge through this website; the button probably says "Resolve conflicts". On master there is now an alternative fix to the problem solved by `broken=spatial_rank == 0) # was == 2 before` etc.
Which other functions do you have in mind? These place the mutated array first: ...
I'm pretty certain that is context dependent? And maybe some of those functions need to be updated accordingly then.
I was only thinking about the overhead of the comparison itself, not the allocation, right.
The arguments to the GPU kernel and this one are a bit different. I could bring both in line, but that would require hoisting some of the logic out of the CPU kernel, which would maybe make things less clear, but it works. My thoughts about the API go like this:

```julia
const NDA = NamedDimsArray

upsample_bilinear!(y::AbstractArray..., x) = upsample_bilinear_whcn_kernel!(y, x)  # backwards compatibility

# these two could be fused into one, yes. The parent() call would have to go into the kernel then.
upsample_bilinear!(y::NDA{(:w,:h,:c,:n)}, x::...) = upsample_bilinear_whcn_kernel!(parent(y), parent(x))
upsample_bilinear!(y::NDA{(:c,:w,:h,:n)}, x) = upsample_bilinear_cwhn_kernel!(parent(y), parent(x))

function upsample_bilinear!(y::NDA{(:w,:h,:c,:n),T,N,A}, x::...) where {T, N, A<:CuArray}
    a, b, c = ...
    threads = ...
    blocks = ...
    @cuda threads blocks upsample_bilinear_whcn_kernel!(a, b, c, parent(x), parent(y))  # <- the GPU kernel args are a bit different
    return y
end

upsample_bilinear!(y::NDA{(:c,:w,:h,:n)}, x) where {T, N, A<:CuArray} = ...

# gradient analogously
```

Edit: I basically don't care about the argument order. Should we vote, or do you have a dictator? 🤣
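As a side note on the mechanics this sketch relies on: NamedDims.jl lets you dispatch on the tuple of dimension names and unwrap the plain array with `parent`. A tiny runnable illustration (the `layout` helper is made up for this example):

```julia
using NamedDims            # provides NamedDimsArray

const NDA = NamedDimsArray

# dispatch on the tuple of dimension names, as in the sketch above
layout(::NDA{L}) where {L} = L

x_whcn = NDA{(:w, :h, :c, :n)}(rand(Float32, 8, 8, 3, 1))
x_cwhn = NDA{(:c, :w, :h, :n)}(rand(Float32, 3, 8, 8, 1))

layout(x_whcn)             # (:w, :h, :c, :n)
layout(x_cwhn)             # (:c, :w, :h, :n)
parent(x_whcn)             # the plain Array the kernels would operate on
```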
OK. My suggestion here would look more like this:

```julia
upsample_bilinear!(y, x) = upsample_bilinear_whcn!(y, x)  # maybe?
# Not sure this function needs to exist, `upsample_bilinear(x)` can call `upsample_bilinear_whcn!(y, x)` directly?

function upsample_bilinear_whcn!(y::AbstractArray, x::AbstractArray)
    # direct implementation as in this PR
end

# This worker has one job, very simple dispatch, will never change:
function upsample_bilinear_whcn!(y::CuArray, x::...)
    a, b, c = ...
    threads = ...
    blocks = ...
    @cuda threads blocks upsample_bilinear_whcn_kernel!(a, b, c, parent(x), parent(y))  # the real GPU kernel
    return y
end

# These two workers can be added later, without breaking anything:
upsample_bilinear_cwhn!(y::AbstractArray, x::AbstractArray)
upsample_bilinear_cwhn!(y::CuArray, x::...) = ...

const NDA = NamedDimsArray
# Only one function dispatches on NDA, and it does not need to load CUDA:
upsample_bilinear(x::NDA{(:w,:h,:c,:n)}, scale) = begin ... upsample_bilinear_whcn!(parent(y), parent(x)) end
upsample_bilinear(x::NDA{(:c,:w,:h,:n)}, scale) = begin ... upsample_bilinear_cwhn!(parent(y), parent(x)) end
```

Re argument order, it looks fine I think; Dhairya got me worried that we were all over the map in this package, but the examples I can find seem pretty consistent. So this PR should match those, and it does.
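To make the last two lines concrete, a hypothetical expansion of one of those `begin ... end` bodies could look like the following; the output-size rounding and the worker name are assumptions for illustration, not necessarily what the PR does:

```julia
# Hypothetical sketch: allocate the output, run the plain-array worker, rewrap the names.
function upsample_bilinear(x::NDA{(:w, :h, :c, :n)}, scale::Real)
    w, h, c, n = size(x)
    ydata = similar(parent(x), floor(Int, w * scale), floor(Int, h * scale), c, n)
    upsample_bilinear_whcn!(ydata, parent(x))      # the worker from the sketch above
    return NDA{(:w, :h, :c, :n)}(ydata)            # keep the dimension names on the result
end
```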
Aah yes, very nice :) I'll massage it tomorrow. The rest should maybe be discussed on Slack or Zulip or so - where are you?
Tried to commit some of the suggestions, but I don't have the rights.
Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
@DhairyaLGandhi I lost write access.
Haven't changed anything, what does it say?
I don't know what's happening. I lost write access yesterday and couldn't see the "Merge pull request" button, but it has reappeared just now.
Please address the comment on the API before merging.
Hi, I'm a bit late to the party: there have been many comments on the API - which changes do you refer to? This one, #266 (comment)?
The comment is in this thread, #266 (comment), but I don't think there is anything to address. I'll merge in 1 day if no objections arise.
Well, since it's an API change, I'd be careful not to merge without proper checks.
Thanks everybody for your time and effort in making this better! :) I'll try to finish the GPU PR next week, depending on Tim's availability.
JuliaGPU/CUDA.jl#636 has been merged. After the next release tag we'll be at warp speed :) A quick test with (32,32,1024,1) on my GTX 980 shows 3.3 µs for bilinear upsampling vs 4.4 µs for nearest, so I recommend the former for now (some day nearest will be faster). On the CPU, single-threaded, they are about the same, but bilinear can take advantage of more cores.
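For anyone wanting to reproduce that kind of comparison, a micro-benchmark along these lines is a reasonable starting point; the exact signatures depend on the NNlib and CUDA.jl versions in use, so treat this as a sketch under those assumptions:

```julia
using BenchmarkTools, NNlib

x = rand(Float32, 32, 32, 1024, 1)

@btime upsample_bilinear($x, (2, 2));   # bilinear on the CPU
@btime upsample_nearest($x, (2, 2));    # nearest on the CPU

# For GPU timings, move the data over first (requires CUDA.jl and a GPU):
# using CUDA
# xg = cu(x)
# @btime CUDA.@sync upsample_bilinear($xg, (2, 2));
```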
That was some great work!
Reading all these issues and PRs about the bilinear upsampling layer is like reading a good book. It was quite exciting to follow the whole discussion and see the final performance and outcome 😆
Sorry to bother you again! But this stuff didn't let me sleep, so here is a CPU implementation. See the GPU PR here.
I reviewed the tests of the current implementation and found them a bit strange, actually, so this one comes with somewhat different tests.
[Benchmark table: mean times in ms, upsampling by a factor of 2, tested on 12 threads @ 3.7 GHz, Julia 1.7, for input sizes 32x32x1024x1 and 196x196x128x1, multi-threaded and single-threaded.]
Single-core performance would shine with the cwhn tensor layout, but with multiple threads the two layouts are more or less the same.
Kind regards! :)