
Should there be @fastmath maximum etc? #48082

Closed
mcabbott opened this issue Jan 2, 2023 · 4 comments

Comments

@mcabbott
Contributor

mcabbott commented Jan 2, 2023

Because max is very careful about signed zeros & NaN, maximum is quite slow. Should we make @fastmath maximum the way to avoid this, or is there a better way?
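For reference, the two guarantees that make Base max slow, both of which @fastmath is licensed to drop, can be checked directly:

```julia
# Base `max` must propagate NaN and order signed zeros (-0.0 < +0.0),
# which rules out a plain branchless compare-and-select per element.
@assert isnan(max(1.0, NaN))      # NaN propagates through max
@assert max(-0.0, 0.0) === 0.0    # +0.0 is considered the larger zero
@assert max(0.0, -0.0) === 0.0    # ...regardless of argument order
# `@fastmath max` may return either zero and need not propagate NaN,
# which is what allows the reduction to vectorize.
```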

At present you can write @fastmath reduce(max, x; dims), but you must supply init yourself. A minimal improvement would be to define the method whose absence causes this error:

MethodError: no method matching reducedim_init(::typeof(identity), ::typeof(Base.FastMath.max_fast), ::Matrix{Float64}, ::Int64)
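One sketch of what that missing method could look like: forward max_fast/min_fast to the reducedim_init logic Base already has for max/min. Note that reducedim_init is Base-internal, so this is illustrative rather than a supported API:

```julia
# Hypothetical forwarding methods: let @fastmath reductions reuse the
# initialization Base already computes for max/min, so `init` becomes optional.
Base.reducedim_init(f, ::typeof(Base.FastMath.max_fast), A::AbstractArray, region) =
    Base.reducedim_init(f, max, A, region)
Base.reducedim_init(f, ::typeof(Base.FastMath.min_fast), A::AbstractArray, region) =
    Base.reducedim_init(f, min, A, region)
```

With these in place, @fastmath reduce(max, x; dims=1) no longer needs an explicit init.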

Some quick benchmarks:

julia> fast_maximum(x::AbstractArray{T}; dims) where {T} = @fastmath reduce(max, x; dims, init = float(T)(-Inf));

julia> let x = randn(Float32, 10, 100)
         @btime maximum($x; dims=1)
         @btime fast_maximum($x; dims=1)
         @btime sum($x; dims=1)  # obviously different result!
       end;
  min 3.151 μs, mean 4.278 μs (4 allocations, 608 bytes)
  min 348.730 ns, mean 1.234 μs (1 allocation, 496 bytes)
  min 347.599 ns, mean 1.245 μs (1 allocation, 496 bytes)

julia> let x = randn(Float32, 1000)
         @btime maximum($x)
         @btime fast_maximum($x; dims=:)
         @btime extrema($x)
       end;
  min 2.667 μs, mean 2.689 μs (0 allocations)
  min 94.243 ns, mean 94.790 ns (0 allocations)
  min 8.111 μs, mean 8.225 μs (0 allocations)

The same concerns probably apply to extrema as well.
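The same pattern can be sketched for extrema. The name fast_extrema here is hypothetical, and the fastmath min/max inside the reducer carry the same NaN and signed-zero caveats as fast_maximum above:

```julia
# Hypothetical fastmath analogue of `extrema`: reduce over (lo, hi) pairs
# with an explicit init, so no reducedim_init method is required.
function fast_extrema(x::AbstractArray{T}; dims=:) where {T<:AbstractFloat}
    lo, hi = T(Inf), T(-Inf)
    @fastmath mapreduce(v -> (v, v),
                        ((a1, b1), (a2, b2)) -> (min(a1, a2), max(b1, b2)),
                        x; dims, init=(lo, hi))
end
```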

@N5N3
Member

N5N3 commented Jan 2, 2023

Seems reasonable, although the gap should be much smaller once we land #45581 (for x86) or #47814 (for M1).

@mcabbott
Contributor Author

mcabbott commented Jan 2, 2023

Oh nice. #47814 is roughly a 3x improvement here:

  min 1.146 μs, mean 2.041 μs (4 allocations, 608 bytes)  # dims=1 case above
  min 1.000 μs, mean 1.020 μs (0 allocations)  # dims=: case above

@N5N3
Member

N5N3 commented Jan 2, 2023

Oh, that's not ideal. I expected LLVM could vectorize those instructions well when performing the reduction, but the gain looks smaller than that.

Just for reference, with #45581:

julia> let x = randn(Float32, 4096)
           @btime maximum($x)
           @btime fast_maximum($x;dims = :)
           @btime sum($x)
       end;
  254.787 ns (0 allocations: 0 bytes)
  146.675 ns (0 allocations: 0 bytes)
  268.731 ns (0 allocations: 0 bytes)

@andrewjradcliffe
Contributor

Not that it's appropriate for Base, but if one's interest is mapreduce on dense numeric arrays, then my package is more or less a drop-in replacement for anything that can be written as a fold (including varargs, multidimensional reductions, etc.).

Of course, autodiff will not work since it's LoopVectorization-based.

My general thinking on folds is that packing as much optimization into Base as possible is desirable. The reduce generics take care to handle quite a few special cases (missing, NaN, etc.), an approach I agree with, since the machinery must handle everything thrown at it. However, where should the line be drawn?

In any case, a comparison:

julia> let x = randn(Float32, 10, 100)
           @btime maximum($x; dims=1)
           @btime fast_maximum($x; dims=1)
           @btime sum($x; dims=1)  # obviously different result!
           @btime vvmaximum($x, dims=1)
           @btime extrema($x, dims=1)
           @btime vvextrema($x, dims=1)
       end;

  3.037 μs (4 allocations: 608 bytes)
  689.692 ns (1 allocation: 496 bytes)
  449.164 ns (1 allocation: 496 bytes)
  193.622 ns (1 allocation: 496 bytes)
  4.709 μs (1 allocation: 896 bytes)
  518.063 ns (3 allocations: 1.84 KiB)

julia> let x = randn(Float32, 1000)
           @btime maximum($x)
           @btime fast_maximum($x;dims = :)
           @btime extrema($x)
           @btime vvmaximum($x)
           @btime vvextrema($x)
       end;
  1.521 μs (0 allocations: 0 bytes)
  1.056 μs (0 allocations: 0 bytes)
  5.003 μs (0 allocations: 0 bytes)
  26.746 ns (0 allocations: 0 bytes)
  39.075 ns (0 allocations: 0 bytes)
