
Should there be @fastmath maximum etc? #48082

Closed
mcabbott opened this issue Jan 2, 2023 · 4 comments

Comments

@mcabbott
Contributor

mcabbott commented Jan 2, 2023

Because max is very careful about signed zeros & NaN, maximum is quite slow. Should we make @fastmath maximum the way to avoid this, or is there a better way?
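For reference, the two guarantees that make Base max slow, both of which @fastmath is licensed to drop, can be checked directly:

```julia
# Base `max` must propagate NaN and order signed zeros (-0.0 < +0.0),
# which rules out a plain branchless compare-and-select per element.
@assert isnan(max(1.0, NaN))      # NaN propagates through max
@assert max(-0.0, 0.0) === 0.0    # +0.0 is considered the larger zero
@assert max(0.0, -0.0) === 0.0    # ...regardless of argument order
# `@fastmath max` may return either zero and need not propagate NaN,
# which is what allows the reduction to vectorize.
```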

At present you can write @fastmath reduce(max, x; dims), but you must supply init yourself. A minimal improvement would be to define the method whose absence causes this error:

MethodError: no method matching reducedim_init(::typeof(identity), ::typeof(Base.FastMath.max_fast), ::Matrix{Float64}, ::Int64)
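One sketch of what that missing method could look like: forward max_fast/min_fast to the reducedim_init logic Base already has for max/min. Note that reducedim_init is Base-internal, so this is illustrative rather than a supported API:

```julia
# Hypothetical forwarding methods: let @fastmath reductions reuse the
# initialization Base already computes for max/min, so `init` becomes optional.
Base.reducedim_init(f, ::typeof(Base.FastMath.max_fast), A::AbstractArray, region) =
    Base.reducedim_init(f, max, A, region)
Base.reducedim_init(f, ::typeof(Base.FastMath.min_fast), A::AbstractArray, region) =
    Base.reducedim_init(f, min, A, region)
```

With these in place, @fastmath reduce(max, x; dims=1) no longer needs an explicit init.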

Some quick benchmarks:

julia> fast_maximum(x::AbstractArray{T}; dims) where {T} = @fastmath reduce(max, x; dims, init = float(T)(-Inf));

julia> let x = randn(Float32, 10, 100)
         @btime maximum($x; dims=1)
         @btime fast_maximum($x; dims=1)
         @btime sum($x; dims=1)  # obviously different result!
       end;
  min 3.151 μs, mean 4.278 μs (4 allocations, 608 bytes)
  min 348.730 ns, mean 1.234 μs (1 allocation, 496 bytes)
  min 347.599 ns, mean 1.245 μs (1 allocation, 496 bytes)

julia> let x = randn(Float32, 1000)
         @btime maximum($x)
         @btime fast_maximum($x; dims=:)
         @btime extrema($x)
       end;
  min 2.667 μs, mean 2.689 μs (0 allocations)
  min 94.243 ns, mean 94.790 ns (0 allocations)
  min 8.111 μs, mean 8.225 μs (0 allocations)

The same concerns probably apply to extrema as well.
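The same pattern can be sketched for extrema. The name fast_extrema here is hypothetical, and the fastmath min/max inside the reducer carry the same NaN and signed-zero caveats as fast_maximum above:

```julia
# Hypothetical fastmath analogue of `extrema`: reduce over (lo, hi) pairs
# with an explicit init, so no reducedim_init method is required.
function fast_extrema(x::AbstractArray{T}; dims=:) where {T<:AbstractFloat}
    lo, hi = T(Inf), T(-Inf)
    @fastmath mapreduce(v -> (v, v),
                        ((a1, b1), (a2, b2)) -> (min(a1, a2), max(b1, b2)),
                        x; dims, init=(lo, hi))
end
```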

@N5N3
Member

N5N3 commented Jan 2, 2023

Seems reasonable, although the gap should be much smaller once we land #45581 (for x86) or #47814 (for M1).

@mcabbott
Contributor Author

mcabbott commented Jan 2, 2023

Oh nice. #47814 is roughly a 3x improvement here:

  min 1.146 μs, mean 2.041 μs (4 allocations, 608 bytes)  # dims=1 case above
  min 1.000 μs, mean 1.020 μs (0 allocations)  # dims=: case above

@N5N3
Member

N5N3 commented Jan 2, 2023

Oh, that's not ideal. I expected LLVM could vectorize those instructions well when performing the reduction, but the gain looks smaller than that.

Just for reference, with #45581:

julia> let x = randn(Float32, 4096)
           @btime maximum($x)
           @btime fast_maximum($x;dims = :)
           @btime sum($x)
       end;
  254.787 ns (0 allocations: 0 bytes)
  146.675 ns (0 allocations: 0 bytes)
  268.731 ns (0 allocations: 0 bytes)

@andrewjradcliffe
Contributor

Not that it's appropriate for Base, but if one's interest is mapreduce on dense numeric arrays, then my package is more or less a drop-in replacement for anything that can be written as a fold (including varargs, multidimensional reductions, etc.).

Of course, autodiff will not work since it's LoopVectorization-based.

My general thinking on folds is that packing as much optimization into Base as possible is desirable. The reduce generics take care to handle quite a few special cases (missing, NaN, etc.), an approach I agree with, since the machinery must handle everything thrown at it. However, where should the line be drawn?

In any case, a comparison:

julia> let x = randn(Float32, 10, 100)
           @btime maximum($x; dims=1)
           @btime fast_maximum($x; dims=1)
           @btime sum($x; dims=1)  # obviously different result!
           @btime vvmaximum($x, dims=1)
           @btime extrema($x, dims=1)
           @btime vvextrema($x, dims=1)
       end;

  3.037 μs (4 allocations: 608 bytes)
  689.692 ns (1 allocation: 496 bytes)
  449.164 ns (1 allocation: 496 bytes)
  193.622 ns (1 allocation: 496 bytes)
  4.709 μs (1 allocation: 896 bytes)
  518.063 ns (3 allocations: 1.84 KiB)

julia> let x = randn(Float32, 1000)
           @btime maximum($x)
           @btime fast_maximum($x;dims = :)
           @btime extrema($x)
           @btime vvmaximum($x)
           @btime vvextrema($x)
       end;
  1.521 μs (0 allocations: 0 bytes)
  1.056 μs (0 allocations: 0 bytes)
  5.003 μs (0 allocations: 0 bytes)
  26.746 ns (0 allocations: 0 bytes)
  39.075 ns (0 allocations: 0 bytes)
