
Using AcuteBenchmark system #5

Open
aminya opened this issue Jan 1, 2020 · 7 comments

aminya commented Jan 1, 2020

I created a package for benchmarking functions called AcuteBenchmark.
https://github.com/aminya/AcuteBenchmark.jl

I made it particularly for IntelVectorMath (VML).

If you want, we can switch the benchmarking system to AcuteBenchmark. It is very easy to use: it automatically generates random vectors based on the given limits and sizes, then benchmarks the functions and plots the results.

[Figure: IntelVectorMath performance comparison]

Other than the AcuteBenchmark docs, there is a fully working example available here: https://github.com/JuliaMath/VML.jl/blob/AcuteBenchmark/benchmark/benchmark.jl

chriselrod (Member) commented:

It would be nice to clean up the code and set up automated performance testing so I don't accidentally cause regressions.

Any chance you can support a way of defining GFLOPS as a function of size?

I'd like to be able to present performance in terms of GFLOPS (billions of floating-point operations per second).
Primarily, that has the advantage of making comparisons across sizes clearer.

It can also give you a rough idea of how well the CPU is being utilized. For example, suppose a CPU with AVX-512 runs at 3.6 GHz while performing AVX-512 operations (configurable in the BIOS):

julia> GHz = 3.6;

julia> fma_per_clock = 2;

julia> dflop_per_fma = 16;

julia> GHz * fma_per_clock * dflop_per_fma
115.2

which tells you that anything short of 115.2 GFLOPS is under-utilizing the CPU, and the question becomes just how far short the code is falling.
Aside from being (IMO) more informative across sizes, it can also be more informative across functions.
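To make that concrete, here is a minimal sketch of turning a measured runtime into an achieved-GFLOPS number that can be compared against the 115.2 peak above. This is not AcuteBenchmark's API; `measure_gflops` is an illustrative name, and the key observation is just that flops divided by nanoseconds is Gflop/s (both carry a factor of 1e9):

```julia
using LinearAlgebra

# Sketch: flops / nanoseconds == Gflop/s, since both carry a factor of 1e9.
function measure_gflops(f, flop_count; evals = 1_000)
    t0 = time_ns()
    for _ in 1:evals
        f()
    end
    t_ns = (time_ns() - t0) / evals   # average nanoseconds per call
    return flop_count / t_ns
end

n = 10_000
x, y = rand(n), rand(n)
# A dot product does one multiply and one add per element: ~2n flops.
println(measure_gflops(() -> dot(x, y), 2n), " GFLOPS")
```

A real harness would use BenchmarkTools for the timing loop; the division by nanoseconds stays the same.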


aminya commented Jan 5, 2020

@chriselrod There is the https://github.com/triscale-innov/GFlops.jl package. Does it do what you want? If so, I can integrate it into AcuteBenchmark.

If not, do you have an example code of what you want?


chriselrod commented Jan 6, 2020

Here is an example, although that function should be called gflop_gemm.
I defined the GFLOPS manually for each benchmark, so doing what I want would just require either passing an anonymous function that computes GFLOPS as a function of size, or a vector of flop counts you can divide by nanoseconds to get GFLOPS.
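That "function of size" interface could look like the following sketch. The names are purely hypothetical, not AcuteBenchmark's actual API: the caller supplies a closure from size to flop count, and the harness divides by the measured nanoseconds, which yields Gflop/s directly:

```julia
# Hypothetical sketch: divide a size-dependent flop count by nanoseconds.
gflops_from_time(flops_of_size, n, t_ns) = flops_of_size(n) / t_ns

gemm_flops = n -> 2n^3   # classic n×n matmul: n^3 multiplies + n^3 adds
gflops_from_time(gemm_flops, 200, 1_000_000)   # 200×200 gemm in 1 ms → 16.0
```

The vector-of-counts variant is the same arithmetic with the closure replaced by a precomputed `flops_of_size(n)` per benchmarked size.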

That said, your idea of using GFlops.jl would be much cooler and more convenient.
It (currently) won't work with @avx, and definitely not with the C or Fortran code. But it should really only be run for one function per size anyway -- the number of floating-point operations shouldn't change between implementations.
Perhaps the routine could be: (1) use GFlops.jl to count the flops of the first function, and then benchmark all functions?


aminya commented Apr 17, 2020

GFlops.jl returns an incorrect result for the broadcasted functions.

What should we do? Can we compute the flop count for a scalar call and then multiply it by the array length?

I did that, for example, in https://github.com/aminya/AcuteBenchmark.jl/blob/a6b5c0a4591513d06af764bd38fd1558c946f797/src/benchmarks.jl#L197
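For reference, that scalar-count workaround can be sketched as follows. This assumes GFlops.jl's `@count_ops` macro and its `GFlops.flop` counter-summing function; check the package's current API before relying on either:

```julia
using GFlops

# Scale the flop count of a single scalar call to the whole array,
# instead of instrumenting the broadcast (which miscounts here).
scaled_flops(scalar_flops, len) = scalar_flops * len

x = rand(1_000)
cnt = @count_ops exp(1.0)                         # ops for one scalar call
total = scaled_flops(GFlops.flop(cnt), length(x))
```

This stays correct as long as the function does the same work per element, which holds for the element-wise IntelVectorMath kernels.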

ffevotte commented:
> Perhaps the routine could be: (1) use GFlops.jl to count the flops of the first function, and then benchmark all functions?

That's actually what @gflops does: it first counts all the ops, then benchmarks using BenchmarkTools. But you're right: in your case it would make much more sense to count the ops for one function (and one data size) and reuse this count for all subsequent benchmarks.
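That count-once-then-reuse routine might look like this sketch. All helper names are illustrative; no real AcuteBenchmark or GFlops API is assumed beyond having obtained a single flop count up front:

```julia
# Sketch: reuse one flop count across every implementation being compared.
function time_ns_per_call(f, x; evals = 1_000)
    t0 = time_ns()
    for _ in 1:evals
        f(x)
    end
    return (time_ns() - t0) / evals
end

benchmark_all(funs, x, flop_count) =
    [(name = string(f), gflops = flop_count / time_ns_per_call(f, x)) for f in funs]

x = rand(1_000)
# Summing 1000 numbers performs 999 additions; both implementations share that count.
results = benchmark_all([sum, v -> foldl(+, v)], x, 999.0)
```

The C and Fortran implementations would slot into the same loop, since the shared flop count sidesteps instrumenting them at all.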

I'm not sure what to do about vector operations. Would you like me to add explicit support for counting SIMD.jl and/or SIMDPirates.jl vector operations?

Until now, I did not do anything, mostly for lack of use cases that would guide the design of the API. But I'm open to any suggestion if you know what kind of features you'd like to see in GFlops' API.

ffevotte commented:
> GFlops.jl returns an incorrect result for the broadcasted functions.
>
> What should we do? Can we calculate the gflops for a scalar call and then multiply it by the dimension?

Hopefully this should be fixed as soon as Cassette releases a new version (see JuliaLabs/Cassette.jl#171).

In the meantime, you could either use Julia 1.3 or locally Pkg.dev the current master of Cassette. In either case, this would allow you to develop new features in AcuteBenchmark without hitting this issue. (But you won't be able to release the feature until Cassette releases a new patch and GFlops bumps its compatibility requirements.)

chriselrod (Member) commented:

> That's actually what @gflops does: it first counts all ops, then benchmarks using BenchmarkTools. But you're right, in your case it would make much more sense to count ops for one function (and one data size), and reuse this count for all subsequent benchmarks.

Okay, that sounds great.
This would also be required for the C and Fortran benchmarks.

> I'm not sure what to do about vector operations. Would you like me to add explicit support for counting SIMD.jl and/or SIMDPirates.jl vector operations?

I would personally find SIMDPirates and LoopVectorization support very convenient!
How do you handle special functions? I'm not sure what the standard practice is with respect to them, or what meaning "floating point operations" has there.
