WIP: add lazy finite difference `DiffView` and `fdiv` #19

johnnychen94 · 2021-10-11T17:32:48Z

The purpose is to move Images.div here: JuliaImages/Images.jl#971 (comment)

This PR implements a lazy version of fdiff so that we can compose multiple pipeline together into fdiv without the need to worry about memory allocation.

A small benchmark on its performance shows that we still have some room for optimization:

using ImageFiltering
using ImageBase

laplacian(X) = fdiv(ntuple(i->DiffView(X, dims=i), ndims(X))...)
ref_laplacian(X) = imfilter(X, Kernel.Laplacian(ntuple(x->true, ndims(X))), "circular")

X = rand(Float32, 256, 256);
fdiff(X, dims=1) == DiffView(X, dims=1) # true
@btime fdiff($X, dims=1); # 18.822 μs (2 allocations: 256.05 KiB)
@btime collect(DiffView($X, dims=1)); # 151.683 μs (2 allocations: 256.05 KiB)

laplacian(X) == ref_laplacian(X) # true
@btime laplacian($X); # 612.230 μs (2 allocations: 256.05 KiB)
@btime ref_laplacian($X); # 323.276 μs (16 allocations: 521.02 KiB)

X = rand(Float32, 1024, 1024);
fdiff(X, dims=1) == DiffView(X, dims=1) # true
@btime fdiff($X, dims=1); # 326.182 μs (2 allocations: 4.00 MiB)
@btime collect(DiffView($X, dims=1)); # 2.559 ms (2 allocations: 4.00 MiB)

laplacian(X) == ref_laplacian(X) # true
@btime laplacian($X); # 9.336 ms (2 allocations: 4.00 MiB)
@btime ref_laplacian($X); # 5.720 ms (16 allocations: 8.03 MiB)

closes #20

codecov · 2021-10-11T17:45:58Z

Codecov Report

Merging #19 (39058e6) into master (bd3db5b) will increase coverage by 1.74%.
The diff coverage is 97.22%.

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   90.59%   92.33%   +1.74%     
==========================================
  Files           6        6              
  Lines         202      274      +72     
==========================================
+ Hits          183      253      +70     
- Misses         19       21       +2

Impacted Files	Coverage Δ
src/diff.jl	`97.08% <97.22%> (+0.31%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd3db5b...39058e6. Read the comment docs.

johnnychen94 · 2021-10-11T20:59:31Z

src/diff.jl

@@ -1,3 +1,94 @@
+abstract type BoundaryCondition end
+struct Periodic <: BoundaryCondition end


I picked :periodic when I implemented fdiff in #11, and now I realize that imfilter uses "circular" for the same thing. Should I change this PR to use Circular()?

Interesting! Alternatively, should we switch imfilter to "periodic"? I suspect I may have gotten "circular" via my experience using Matlab, but the me of today seems to think that "periodic" is the more obvious choice. Thoughts?

timholy

This looks fantastic. It does seem to blur the line a bit between ImageFiltering and ImageBase, but I think this is a reasonable choice. We could even consider moving Laplacian here I suppose.

I've just left a couple of comments about potential ways to improve performance; everything else is elegant and well designed.

timholy · 2021-10-12T10:50:47Z

src/diff.jl

+        i == A.dims || return p
+        p == first(r) && return last(r)
+        p - 1


This has two branches in it per dimension, and I would bet this is the origin of the performance problems. (Branches are bad on their own and also block @simd.) If you can write this out in ternary cond ? val1 : (cond2 ? val2 : val3) form I bet you'll get substantially better performance. Ternaries get converted to LLVM select statements which are non-branching and SIMD-compatible. (It's like ifelse, except usually the ternary form generates better code.)

If you've never seen this, it's a fun read.

timholy · 2021-10-12T11:00:04Z

src/diff.jl

+
+The in-place version of [`fdiv`](@ref).
+"""
+function fdiv!(dst::AbstractArray, Vs::AbstractArray...)


For performance, it seems possible you'll want to force specialize this with Vs::Vararg{<:AbstractArray,N}. Only the loop that computes sum is sensitive to this, so alternatively you could split that out and force-specialize just that part. But I'm not sure it's bad (i.e., unnecessarily latency-inducing) to specialize the whole thing (it's pretty short).

johnnychen94 · 2021-10-12T13:36:03Z

I noticed that we can move a large number of if-checks to compilation time, and after applying generated functions, I get ~6x faster on the Laplacian operator.
And I have to admit it's quite surprising that it's even faster than the current imfilter-based version:

X = rand(Float32, 256, 256);
laplacian(X) ≈ ref_laplacian(X) # true
@btime laplacian($X); # 108.335 μs (2 allocations: 256.05 KiB)
@btime ref_laplacian($X); # 319.856 μs (17 allocations: 521.05 KiB)

X = rand(Float32, 1024, 1024);
laplacian(X) ≈ ref_laplacian(X) # true
@btime laplacian($X); # 1.792 ms (2 allocations: 4.00 MiB)
@btime ref_laplacian($X); # 6.483 ms (17 allocations: 8.03 MiB)

I'm not very convinced about the implementation, though...

Edit: Urghh... after I pushed the new commits and start a new Julia session, I no longer get the performance boost.. 😢

Maintaining old compatibility becomes quite troublesome especially when we have 1.6 as the new LTS version now.

Also add `laplacian` and `laplacian!`

johnnychen94 · 2021-10-12T17:16:27Z

This lazy view strategy is perhaps more complicated than I thought it was. IIUC the reason that fdiff is faster than DiffView is mainly due to the fact that fdiff eagerly handles the boundaries so that in the inner region, it doesn't need to do the extra boundary check, and thus calls efficient broadcasting implementation. In the contrast, DiffView checks if given position I is on the boundary, and it makes generating @simd codes more difficult than I expect.

I also tried the cond ? val1 : (cond2 ? val2 : val3) way, but they generate the same llvm codes.

I plan to pause this PR for a while and take the plain solution in https://github.com/JuliaImages/Images.jl/blob/f50f59b68895debc6882eebc70f70ded62befdf3/src/algorithms.jl#L505-L537 in a separate PR.

timholy · 2021-10-16T19:23:23Z

No need to pick this up again, but if/when you or someone else does: one thought I had is whether the "wrap-around" boundary condition was to blame for the performance gap. That seems worth testing specifically, perhaps by temporarily disabling it and seeing if you get much better performance. If that proves true, then I wonder if it would help to add something that might trick LLVM into hoisting it. I wonder if adding a field for v = view(data, _the_slice_at_the_array_edge_along_dim...) and handle all wraparound behavior through v?

johnnychen94 changed the title ~~WIP: add ajoint_fdiff and div~~ WIP: add ajoint_fdiff and fdiv Oct 11, 2021

johnnychen94 force-pushed the jc/fdiv branch from e7b9f12 to 7da6e86 Compare October 11, 2021 17:41

johnnychen94 force-pushed the jc/fdiv branch from 7da6e86 to 4b181ed Compare October 11, 2021 20:39

johnnychen94 changed the base branch from master to jc/fdiff_fix October 11, 2021 20:40

johnnychen94 changed the title ~~WIP: add ajoint_fdiff and fdiv~~ WIP: add lazy finite difference DiffView and fdiv Oct 11, 2021

johnnychen94 force-pushed the jc/fdiv branch from 4b181ed to f49ef3d Compare October 11, 2021 20:56

johnnychen94 commented Oct 11, 2021

View reviewed changes

johnnychen94 force-pushed the jc/fdiv branch from f49ef3d to 1148767 Compare October 11, 2021 21:01

timholy reviewed Oct 12, 2021

View reviewed changes

Base automatically changed from jc/fdiff_fix to master October 12, 2021 12:06

johnnychen94 added 4 commits October 12, 2021 22:59

implement DiffView and fdiv

39d784a

CI compatibility fix for Julia < 1.3

8ca4b6f

disable meta quality test in old Julia versions

e72cb9a

Maintaining old compatibility becomes quite troublesome especially when we have 1.6 as the new LTS version now.

move part of getindex computation to compilation time

39058e6

Also add `laplacian` and `laplacian!`

johnnychen94 force-pushed the jc/fdiv branch from 1148767 to 39058e6 Compare October 12, 2021 15:09

This was referenced Oct 14, 2021

implement FiniteDiffs submodule: gradient, divergence and laplacian operators #22

Merged

rename "circular" to "periodic" when reworking padarray JuliaImages/ImageFiltering.jl#232

Open

johnnychen94 closed this in #22 Oct 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: add lazy finite difference `DiffView` and `fdiv` #19

WIP: add lazy finite difference `DiffView` and `fdiv` #19

johnnychen94 commented Oct 11, 2021 •

edited

Loading

codecov bot commented Oct 11, 2021 •

edited

Loading

johnnychen94 Oct 11, 2021 •

edited

Loading

timholy Oct 12, 2021

timholy left a comment

timholy Oct 12, 2021

timholy Oct 12, 2021

timholy Oct 12, 2021

johnnychen94 commented Oct 12, 2021 •

edited

Loading

johnnychen94 commented Oct 12, 2021 •

edited

Loading

timholy commented Oct 16, 2021

		@@ -1,3 +1,94 @@
		abstract type BoundaryCondition end
		struct Periodic <: BoundaryCondition end

WIP: add lazy finite difference DiffView and fdiv #19

WIP: add lazy finite difference DiffView and fdiv #19

Conversation

johnnychen94 commented Oct 11, 2021 • edited Loading

codecov bot commented Oct 11, 2021 • edited Loading

Codecov Report

johnnychen94 Oct 11, 2021 • edited Loading

Choose a reason for hiding this comment

timholy Oct 12, 2021

Choose a reason for hiding this comment

timholy left a comment

Choose a reason for hiding this comment

timholy Oct 12, 2021

Choose a reason for hiding this comment

timholy Oct 12, 2021

Choose a reason for hiding this comment

timholy Oct 12, 2021

Choose a reason for hiding this comment

johnnychen94 commented Oct 12, 2021 • edited Loading

johnnychen94 commented Oct 12, 2021 • edited Loading

timholy commented Oct 16, 2021

WIP: add lazy finite difference `DiffView` and `fdiv` #19

WIP: add lazy finite difference `DiffView` and `fdiv` #19

johnnychen94 commented Oct 11, 2021 •

edited

Loading

codecov bot commented Oct 11, 2021 •

edited

Loading

johnnychen94 Oct 11, 2021 •

edited

Loading

johnnychen94 commented Oct 12, 2021 •

edited

Loading

johnnychen94 commented Oct 12, 2021 •

edited

Loading