-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add lazy finite difference DiffView
and fdiv
#19
Conversation
ajoint_fdiff
and div
ajoint_fdiff
and fdiv
Codecov Report
@@ Coverage Diff @@
## master #19 +/- ##
==========================================
+ Coverage 90.59% 92.33% +1.74%
==========================================
Files 6 6
Lines 202 274 +72
==========================================
+ Hits 183 253 +70
- Misses 19 21 +2
Continue to review full report at Codecov.
|
ajoint_fdiff
and fdiv
DiffView
and fdiv
@@ -1,3 +1,94 @@ | |||
abstract type BoundaryCondition end | |||
struct Periodic <: BoundaryCondition end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I picked :periodic
when I implemented fdiff
in #11, and now I realize that imfilter
uses "circular"
for the same thing. Should I change this PR to use Circular()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Alternatively, should we switch imfilter to "periodic"
? I suspect I may have gotten "circular"
via my experience using Matlab, but the me of today seems to think that "periodic"
is the more obvious choice. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic. It does seem to blur the line a bit between ImageFiltering and ImageBase, but I think this is a reasonable choice. We could even consider moving Laplacian
here I suppose.
I've just left a couple of comments about potential ways to improve performance; everything else is elegant and well designed.
src/diff.jl
Outdated
i == A.dims || return p | ||
p == first(r) && return last(r) | ||
p - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has two branches in it per dimension, and I would bet this is the origin of the performance problems. (Branches are bad on their own and also block @simd
.) If you can write this out in ternary cond ? val1 : (cond2 ? val2 : val3)
form I bet you'll get substantially better performance. Ternaries get converted to LLVM select
statements which are non-branching and SIMD-compatible. (It's like ifelse
, except usually the ternary form generates better code.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you've never seen this, it's a fun read.
|
||
The in-place version of [`fdiv`](@ref). | ||
""" | ||
function fdiv!(dst::AbstractArray, Vs::AbstractArray...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance, it seems possible you'll want to force specialize this with Vs::Vararg{<:AbstractArray,N}
. Only the loop that computes sum
is sensitive to this, so alternatively you could split that out and force-specialize just that part. But I'm not sure it's bad (i.e., unnecessarily latency-inducing) to specialize the whole thing (it's pretty short).
I noticed that we can move a large number of if-checks to compilation time, and after applying generated functions, I get ~6x faster on the Laplacian operator. X = rand(Float32, 256, 256);
laplacian(X) ≈ ref_laplacian(X) # true
@btime laplacian($X); # 108.335 μs (2 allocations: 256.05 KiB)
@btime ref_laplacian($X); # 319.856 μs (17 allocations: 521.05 KiB)
X = rand(Float32, 1024, 1024);
laplacian(X) ≈ ref_laplacian(X) # true
@btime laplacian($X); # 1.792 ms (2 allocations: 4.00 MiB)
@btime ref_laplacian($X); # 6.483 ms (17 allocations: 8.03 MiB) I'm not very convinced about the implementation, though... Edit: Urghh... after I pushed the new commits and start a new Julia session, I no longer get the performance boost.. 😢 |
Maintaining old compatibility becomes quite troublesome especially when we have 1.6 as the new LTS version now.
Also add `laplacian` and `laplacian!`
This lazy view strategy is perhaps more complicated than I thought it was. IIUC the reason that I also tried the I plan to pause this PR for a while and take the plain solution in https://github.com/JuliaImages/Images.jl/blob/f50f59b68895debc6882eebc70f70ded62befdf3/src/algorithms.jl#L505-L537 in a separate PR. |
No need to pick this up again, but if/when you or someone else does: one thought I had is whether the "wrap-around" boundary condition was to blame for the performance gap. That seems worth testing specifically, perhaps by temporarily disabling it and seeing if you get much better performance. If that proves true, then I wonder if it would help to add something that might trick LLVM into hoisting it. I wonder if adding a field for |
The purpose is to move
Images.div
here: JuliaImages/Images.jl#971 (comment)This PR implements a lazy version of
fdiff
so that we can compose multiple pipeline together intofdiv
without the need to worry about memory allocation.A small benchmark on its performance shows that we still have some room for optimization:
closes #20