Should there be `@fastmath maximum` etc? (#48082)
Oh nice. #47814 is roughly a 3x improvement here:
Oh, that's not ideal. I expected that LLVM could vectorize those instructions well when performing the reduction, but the gain looks too small. Just for reference, with #45581:

```julia
julia> let x = randn(Float32, 4096)
           @btime maximum($x)
           @btime fast_maximum($x; dims = :)
           @btime sum($x)
       end;
  254.787 ns (0 allocations: 0 bytes)
  146.675 ns (0 allocations: 0 bytes)
  268.731 ns (0 allocations: 0 bytes)
```
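For context on why the Base version is slower: `max` must propagate NaN and order signed zeros, which rules out a plain unordered floating-point compare in the reduction loop. A small illustration of those strict semantics (my own example, not from the thread):

```julia
x = [0.5, NaN, 1.0]

# Base's maximum propagates NaN, so the reduction cannot simply
# keep the larger of two values under an unordered compare.
@show maximum(x)               # NaN

# Base's max also distinguishes signed zeros: +0.0 wins over -0.0.
@show max(-0.0, 0.0)           # 0.0
@show signbit(max(-0.0, 0.0))  # false
```

`@fastmath` relaxes exactly these two guarantees, which is what makes the faster reduction possible.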
Not that it's appropriate for Base, but if one's interest is … Of course, autodiff will not work, since it's LoopVectorization-based.

My general thinking on folds is that packing as much optimization into Base as possible is desirable. The reduce generics take care to handle quite a few special cases (…).

In any case, a comparison:

```julia
julia> let x = randn(Float32, 10, 100)
           @btime maximum($x; dims=1)
           @btime fast_maximum($x; dims=1)
           @btime sum($x; dims=1)  # obviously different result!
           @btime vvmaximum($x, dims=1)
           @btime extrema($x, dims=1)
           @btime vvextrema($x, dims=1)
       end;
  3.037 μs (4 allocations: 608 bytes)
  689.692 ns (1 allocation: 496 bytes)
  449.164 ns (1 allocation: 496 bytes)
  193.622 ns (1 allocation: 496 bytes)
  4.709 μs (1 allocation: 896 bytes)
  518.063 ns (3 allocations: 1.84 KiB)
```
```julia
julia> let x = randn(Float32, 1000)
           @btime maximum($x)
           @btime fast_maximum($x; dims = :)
           @btime extrema($x)
           @btime vvmaximum($x)
           @btime vvextrema($x)
       end;
  1.521 μs (0 allocations: 0 bytes)
  1.056 μs (0 allocations: 0 bytes)
  5.003 μs (0 allocations: 0 bytes)
  26.746 ns (0 allocations: 0 bytes)
  39.075 ns (0 allocations: 0 bytes)
```
Because `max` is very careful about signed zeros and NaN, `maximum` is quite slow. Should we make `@fastmath maximum` the way to avoid this, or is there a better way? At present you can do `@fastmath reduce(max, x)`, but you must supply `init`. A minimal improvement would be to provide this method:

Some quick benchmarks:
The same concerns probably apply to `extrema` too.
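For `extrema`, the analogous workaround could be sketched as a left fold (again hypothetical, with the same NaN caveats; `foldl` rather than `reduce` because the tuple accumulator is not interchangeable with the array's elements):

```julia
# Hypothetical sketch of a @fastmath extrema. foldl keeps the (lo, hi)
# tuple strictly on the left of the operator, which plain reduce's
# unspecified association would not guarantee.
fastmath_extrema(x::AbstractArray{T}) where {T<:AbstractFloat} =
    @fastmath foldl(x; init = (typemax(T), typemin(T))) do (lo, hi), v
        (min(lo, v), max(hi, v))
    end
```

As with the `maximum` sketch, `@fastmath` rewrites the `min`/`max` calls in the `do` block to their relaxed variants, so NaN handling and signed-zero ordering differ from `Base.extrema`.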