Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure of BenchmarkTools #19

Open
ArrogantGao opened this issue Sep 28, 2023 · 3 comments
Open

Failure of BenchmarkTools #19

ArrogantGao opened this issue Sep 28, 2023 · 3 comments

Comments

@ArrogantGao
Copy link
Collaborator

ArrogantGao commented Sep 28, 2023

BenchmarkTools are not working correctly:

julia> using TropicalNumbers, CUDA, BenchmarkTools, LinearAlgebra, CuTropicalGEMM

julia> a = Tropical.(CUDA.randn(4096, 4096));

julia> @btime $a * $a;
  3.375 μs (7 allocations: 256 bytes)

julia> @benchmark $a * $a
BenchmarkTools.Trial: 158 samples with 8 evaluations.
 Range (min  max):   3.554 μs     1.733 s  ┊ GC (min  max): 0.00%  0.07%
 Time  (median):      3.976 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.475 ms ± 137.779 ms  ┊ GC (mean ± σ):  0.06% ± 0.01%

  █                                                          ▄
  █▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▄
  3.55 μs       Histogram: log(frequency) by time      13.5 ms <

 Memory estimate: 256 bytes, allocs estimate: 7.

Comparing to results directly from the C-CUDA tests, the result of @ benchmark is correct.

@GiggleLiu
Copy link
Member

GiggleLiu commented Sep 28, 2023

Is it possible that the issue is related to CuStream? Because CUDA.synchronize() function takes CuStream as the input argument.
Maybe you need to pass CUDA.jl CuStream to the C code.

In your current code, the stream id seems to be the default value 0.

@GiggleLiu
Copy link
Member

julia> @time (mul!(b, a, a); CUDA.synchronize(CUDA.context()))
  0.093036 seconds (2 allocations: 48 bytes)

This one works. Using the CuContext as the input will synchronize all streams in the context, which is a heavy API.
It would be great if the stream id can be an input.

@GiggleLiu
Copy link
Member

GiggleLiu commented Oct 14, 2023

The current benchmark in the RAEDME does not look good. When we read benchmark, we always read the min-time, because it reflects the true performance.

julia> using CuTropicalGEMM

julia> @benchmark CUDA.@sync $a * $a
BenchmarkTools.Trial: 93 samples with 4 evaluations.
 Range (min  max):   6.653 μs  158.961 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     13.535 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.499 ms ±  15.867 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                             █  
  ▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  6.65 μs         Histogram: frequency by time         13.5 ms <

 Memory estimate: 256 bytes, allocs estimate: 7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants