perf(CuBLAS): explore reduction in launch overhead via CUDA graphs #1192

jon-chuang · 2023-04-26T17:19:37Z

See https://developer.nvidia.com/blog/cuda-graphs/ for reference.

One can take one of two approaches:

1. Within a single operator.
2. Spanning multiple operators (operator fusion).

jon-chuang · 2023-04-26T17:32:00Z

Any thoughts @slaren ?

jon-chuang · 2023-04-26T17:45:37Z

Looking at #1129 (comment), it seems that inter-operator fusion is required.

This means we need a concept of a device tensor. Looks like we are slowly reimplementing PyTorch...

slaren · 2023-04-26T18:06:51Z

I don't think that we launch enough kernels for this to make a meaningful difference.

dfyz · 2023-04-27T03:24:38Z

Using CUDA graphs would make sense if the duration of our kernels were comparable with the launch overhead (a couple of microseconds). As far as I understand, we intentionally use GPU only for large GEMMs that take at least a couple of milliseconds.
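To put rough numbers on this, here is a back-of-the-envelope sketch. The figures are the illustrative ones from this comment (about 2 µs per launch, about 2 ms per large GEMM), not measurements, and the helper name `launch_overhead_fraction` is made up for this example. It also assumes launches are not overlapped with kernel execution, which overstates the overhead if anything, so the result is an upper bound on what CUDA graphs could save.

```python
def launch_overhead_fraction(kernel_s: float, launch_s: float) -> float:
    """Share of per-kernel wall time spent on the launch itself.

    Assumes launch and execution do not overlap (a pessimistic model).
    """
    return launch_s / (kernel_s + launch_s)


# Illustrative figures only, taken from the wording of the comment above.
LAUNCH_S = 2e-6        # "a couple of microseconds" per launch
LARGE_GEMM_S = 2e-3    # "a couple of milliseconds" per large GEMM
SMALL_KERNEL_S = 2e-6  # hypothetical kernel as short as the launch itself

for label, kernel_s in (("large GEMM", LARGE_GEMM_S),
                        ("small kernel", SMALL_KERNEL_S)):
    frac = launch_overhead_fraction(kernel_s, LAUNCH_S)
    print(f"{label}: launch overhead is {frac:.2%} of wall time")
```

Under these assumptions the launch accounts for roughly 0.1% of the time for a large GEMM, versus 50% for a kernel as short as the launch itself, which is the regime where graphs would start to pay off.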

github-actions · 2024-04-09T01:09:49Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

jon-chuang mentioned this issue Apr 26, 2023

Improve cuBLAS performance by dequantizing on the GPU #1065

Merged

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 9, 2024

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
