Fix KV chunking for POD. #1054
Conversation
For some reason cudaOccupancyMaxActiveBlocksPerMultiprocessor returns 0, so manually calculate the value instead.
```cpp
    cudaDeviceGetAttribute(&num_sm, cudaDevAttrMultiProcessorCount, dev_id));
    FLASHINFER_CUDA_CALL(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &num_blocks_per_sm, kernel, num_threads_p, smem_size_p));
    // FLASHINFER_CUDA_CALL(cudaOccupancyMaxActiveBlocksPerMultiprocessor(
```
It's interesting to me, and likely a bug in cudaOccupancyMaxActiveBlocksPerMultiprocessor.
Let's merge this first. Thanks for the contribution!
cudaOccupancyMaxActiveBlocksPerMultiprocessor is buggy on both A100 40G and H100 for me.
@Edenzzzz there might be some problem with the bandwidth measurement, because the reported numbers exceed the hardware limit (for H100, the maximum memory bandwidth is 3352 GB/s).

Yeah, it's not obvious to me why. Will try NCU.

@yzh119 The kernel timing is correct according to nsys.

Huh, this is leading me to believe there's still some bug in POD. Looking into it.

There was a bug: #1059