53 changes: 48 additions & 5 deletions .github/workflows/pr-perfbench-bot.yml
@@ -6,7 +6,7 @@ on:
- created

permissions:
-  contents: read
+  contents: write

concurrency:
group: "${{ github.workflow }}-${{ github.ref }}"
@@ -16,7 +16,9 @@ env:
PYTHONDEVMODE: "1"
PYTHONUNBUFFERED: "1"
PYTHONPATH: "" # explicit cleanup
-  PIP_USER: "" # explicit cleanup
+  PIP_USER: "0"
+  PIP_NO_USER: "1"
+  PIP_DISABLE_PIP_VERSION_CHECK: "1"
COLUMNS: "100"
FORCE_COLOR: "1"
CLICOLOR_FORCE: "1"
@@ -72,17 +74,58 @@ jobs:
        run: |
          source tl/bin/activate
          python maint/scripts/ci_performance.py
      - name: Read markdown table
        id: read_md
        run: |
          echo "content<<EOF" >> $GITHUB_OUTPUT
          cat bench.md >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
      - name: Upload PNG to GitHub and get URL
Comment on lines +77 to +83
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Image upload step will not produce a valid embeddable URL

There are three correctness issues in the new PNG upload/comment flow:

  1. Filename mismatch between tree entry and URL (definite bug).

    • Tree entry path: perf_plot_${context.runId}.png (Line 101).
    • Raw URL path: bench_${context.runId}.png (Line 108).
      The URL points to a file that does not exist in the created tree, so the image link in the PR comment will 404.
  2. Insufficient token permissions for git APIs.

    • Top-level workflow permissions set contents: read only, but git.createBlob / git.createTree need write-level repo contents permissions. As written, the upload step will likely fail with a 403.
  3. ci_performance.py is executed from main, not from the PR merge commit.

    • You checkout refs/pull/.../merge, then install the merged version, but later git checkout main and never switch back before running python maint/scripts/ci_performance.py. That means the script version used at runtime is the one from main, not the PR. The new bench.md/bench.png generation introduced in this PR won’t be exercised for open-PR /perf runs.

Suggested fixes

  • Make the tree path and URL consistent (choose one naming scheme). For example:

```diff
             const tree = await github.rest.git.createTree({
               owner: context.repo.owner,
               repo: context.repo.repo,
               tree: [{
-                path: `perf_plot_${context.runId}.png`,
+                path: `bench_${context.runId}.png`,
                 mode: '100644',
                 type: 'blob',
                 sha: blob.data.sha
               }]
             });
             const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
```

  • Relax or adjust the workflow permissions so the job can create blobs/trees, e.g.:

```diff
 permissions:
-  contents: read
+  contents: write
```

  • Ensure ci_performance.py runs from the PR merge commit so the new bench.md/bench.png logic is used. For instance:
    • Move git checkout main earlier and then re-checkout refs/pull/.../merge before the "Run performance test" step, or
    • Run the script with ./tl/bin/python and import it from the installed package instead of the working tree, or
    • After "Install original version", explicitly git checkout refs/pull/${{ github.event.issue.number }}/merge before the "Run performance test" step.

Also applies to: 81-109, 116-128


🏁 Script executed:

```shell
#!/bin/bash
# Locate and display the workflow file with line numbers
if [ -f ".github/workflows/pr-perfbench-bot.yml" ]; then
  echo "=== Workflow file found ==="
  cat -n ".github/workflows/pr-perfbench-bot.yml" | head -150
else
  echo "Workflow file not found at expected path"
  find . -name "pr-perfbench-bot.yml" -type f 2>/dev/null
fi
```

Length of output: 5204


🏁 Script executed:

```shell
#!/bin/bash
# Search for git.createBlob and git.createTree calls to understand the upload step
rg -n "createBlob|createTree|perf_plot|bench_\$\{context" ".github/workflows/" -A 2 -B 2
```

Length of output: 1189


🏁 Script executed:

```shell
#!/bin/bash
# Check for permissions block and git checkout sequences
rg -n "permissions:|git checkout|ci_performance.py" ".github/workflows/pr-perfbench-bot.yml" -B 1 -A 1
```

Length of output: 273


Image upload step will not produce a valid embeddable URL and will fail with permissions error

All three correctness issues in the new PNG upload/comment flow are confirmed and must be fixed:

  1. Filename mismatch between tree entry and URL (confirmed bug).

    • Line 101: Tree entry path uses perf_plot_${context.runId}.png
    • Line 108: Raw URL references bench_${context.runId}.png
    • The URL points to a file that does not exist in the created tree, causing a 404 on the embedded image.
  2. Insufficient token permissions for git APIs (confirmed bug).

    • Line 8-9: Workflow permissions set contents: read only
    • Lines 90 and 97: Calls to github.rest.git.createBlob and github.rest.git.createTree require write-level permissions
    • The upload step will fail with a 403 error.
  3. ci_performance.py executes from main, not the PR merge commit (confirmed bug).

    • Line 64: git checkout main switches to main branch
    • Lines 73-74: Script runs from the main branch working directory via the tl venv
    • The new bench.md/bench.png generation logic introduced in this PR will not be exercised for /perf runs.

Required fixes:

  • Make the tree path and URL consistent (both should use the bench_ prefix):

```diff
-                path: `perf_plot_${context.runId}.png`,
+                path: `bench_${context.runId}.png`,
```

  • Upgrade workflow permissions to allow blob/tree creation:

```diff
 permissions:
-  contents: read
+  contents: write
```
  • Ensure ci_performance.py runs from the PR merge commit: after line 68, add git checkout refs/pull/${{ github.event.issue.number }}/merge before the "Run performance test" step.
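The third fix can be sketched as an extra workflow step; the step name and placement are assumptions based on the diff above, not taken from the actual file:

```yaml
      # Hypothetical step: re-checkout the PR merge commit immediately before
      # "Run performance test", so ci_performance.py comes from the PR, not main.
      - name: Re-checkout PR merge commit
        run: |
          git fetch origin "refs/pull/${{ github.event.issue.number }}/merge"
          git checkout FETCH_HEAD
```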

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.github/workflows/pr-perfbench-bot.yml lines 75-81 and surrounding steps: the
image upload flow has three confirmed issues — the tree entry filename and the
constructed raw URL are inconsistent (use perf_plot_ vs bench_), the workflow
permissions only grant contents: read while createBlob/createTree need write,
and the perf script is run from main instead of the PR merge commit; to fix,
make the filename used when creating the blob/tree and the URL construction use
the same bench_${{ github.run_id }} (or context.runId) prefix, update workflow
permissions to grant contents: write for the job (or at least for the step that
calls git APIs), and before the "Run performance test" step add a checkout to
refs/pull/${{ github.event.issue.number }}/merge so ci_performance.py runs
against the PR merge commit.

        id: upload_png
        uses: actions/github-script@v8
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const fs = require('fs');
            const content = fs.readFileSync('bench.png').toString('base64');
            // Create blob in the repo
            const blob = await github.rest.git.createBlob({
              owner: context.repo.owner,
              repo: context.repo.repo,
              content: content,
              encoding: "base64",
            });
            // Attach blob as a tree item
            const tree = await github.rest.git.createTree({
              owner: context.repo.owner,
              repo: context.repo.repo,
              tree: [{
                path: `bench_${context.runId}.png`,
                mode: '100644',
                type: 'blob',
                sha: blob.data.sha
              }]
            });
            // Raw file URL (works for embedding image)
            const url = `https://raw.githubusercontent.com/${context.repo.owner}/${context.repo.repo}/${tree.data.sha}/bench_${context.runId}.png`
            core.setOutput("url", url);

      - name: Post test results as PR comment
        uses: actions/github-script@v8
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const md = `${{ steps.read_md.outputs.content }}`;
            const img = `${{ steps.upload_png.outputs.url }}`;
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
-             body: '📊 **Performance Test Results** (triggered by @' + context.payload.comment.user.login + '):\n\n' +
-               'Run listed here: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\n\n' +
-               "${{ steps.perfbench.outputs.stdout }}"
+             body:
+               '📊 **Performance Test Results** (triggered by @' +
+               context.payload.comment.user.login + ')\n\n' +
+               'Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}\n\n' +
+               md +
+               '\n\n📈 **Speedup Plot:**\n\n' +
+               `![Speedup Plot](${img})`
            })
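The "Read markdown table" step above relies on the heredoc-delimiter syntax GitHub Actions uses for multi-line values in $GITHUB_OUTPUT. A minimal local sketch of the same mechanism (temp files stand in for the real output file and bench.md, whose table content here is invented):

```shell
# Simulate the multi-line GITHUB_OUTPUT write from the workflow step.
GITHUB_OUTPUT="$(mktemp)"
BENCH_MD="$(mktemp)"
printf '| kernel | speedup |\n| --- | --- |\n' > "$BENCH_MD"
{
  echo "content<<EOF"   # open the delimited multi-line value
  cat "$BENCH_MD"       # the value itself (the markdown table)
  echo "EOF"            # close the delimiter
} >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

GitHub Actions parses everything between `content<<EOF` and the closing `EOF` line as the value of `steps.read_md.outputs.content`.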
15 changes: 15 additions & 0 deletions examples/analyze/bench_example_analyze.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_conv_analyze
import example_gemm_analyze


def bench_example_gemm_analyze():
tilelang.tools.bench.process_func(example_gemm_analyze.main)


def bench_example_conv_analyze():
tilelang.tools.bench.process_func(example_conv_analyze.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
52 changes: 52 additions & 0 deletions examples/attention_sink/bench_example_attention_sink.py
@@ -0,0 +1,52 @@
import tilelang.tools.bench
import example_gqa_sink_bwd_bhsd
import example_gqa_sink_fwd_bhsd_wgmma_pipelined
import example_mha_sink_bwd_bhsd
import example_mha_sink_fwd_bhsd
import example_mha_sink_fwd_bhsd_wgmma_pipelined


def bench_example_mha_sink_fwd_bhsd():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd.main)


def bench_example_mha_sink_fwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd.main, window_size=128)


def bench_example_mha_sink_fwd_bhsd_wgmma_pipelined():
tilelang.tools.bench.process_func(example_mha_sink_fwd_bhsd_wgmma_pipelined.main)


def bench_example_mha_sink_fwd_bhsd_wgmma_pipelined_sliding_window():
tilelang.tools.bench.process_func(
example_mha_sink_fwd_bhsd_wgmma_pipelined.main, window_size=128)


def bench_example_gqa_sink_fwd_bhsd_wgmma_pipelined():
tilelang.tools.bench.process_func(example_gqa_sink_fwd_bhsd_wgmma_pipelined.main)


def bench_example_gqa_sink_fwd_bhsd_wgmma_pipelined_sliding_window():
tilelang.tools.bench.process_func(
example_gqa_sink_fwd_bhsd_wgmma_pipelined.main, window_size=128)


def bench_example_mha_sink_bwd_bhsd():
tilelang.tools.bench.process_func(example_mha_sink_bwd_bhsd.main)


def bench_example_mha_sink_bwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_mha_sink_bwd_bhsd.main, window_size=128)


def bench_example_gqa_sink_bwd_bhsd():
tilelang.tools.bench.process_func(example_gqa_sink_bwd_bhsd.main)


def bench_example_gqa_sink_bwd_bhsd_sliding_window():
tilelang.tools.bench.process_func(example_gqa_sink_bwd_bhsd.main, window_size=128)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
@@ -0,0 +1,55 @@
import tilelang.tools.bench
import block_sparse_attn_triton
import example_tilelang_block_sparse_attn
import example_tilelang_sparse_gqa_decode_varlen_indice
import example_tilelang_sparse_gqa_decode_varlen_mask
import example_triton_sparse_gqa_decode_varlen_indice
import example_triton_sparse_gqa_decode_varlen_mask


def bench_block_sparse_attn_triton():
tilelang.tools.bench.process_func(block_sparse_attn_triton.main)


def bench_example_tilelang_block_sparse_attn():
tilelang.tools.bench.process_func(example_tilelang_block_sparse_attn.main)


def bench_example_tilelang_sparse_gqa_decode_varlen_indice():
tilelang.tools.bench.process_func(
example_tilelang_sparse_gqa_decode_varlen_indice.main, batch=1, max_cache_seqlen=2048)


def bench_example_tilelang_sparse_gqa_decode_varlen_mask():
tilelang.tools.bench.process_func(
example_tilelang_sparse_gqa_decode_varlen_mask.main, batch=1, max_cache_seqlen=2048)


def bench_example_triton_sparse_gqa_decode_varlen_indice():
tilelang.tools.bench.process_func(
example_triton_sparse_gqa_decode_varlen_indice.main,
batch=8,
heads=8,
heads_kv=4,
max_cache_seqlen=2048,
dim=128,
dim_v=128,
sparse_ratio=0.8,
block_size=32)


def bench_example_triton_sparse_gqa_decode_varlen_mask():
tilelang.tools.bench.process_func(
example_triton_sparse_gqa_decode_varlen_mask.main,
batch=8,
heads=8,
heads_kv=4,
max_cache_seqlen=2048,
dim=128,
dim_v=128,
sparse_ratio=0.8,
block_size=32)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/blocksparse_gemm/bench_example_blocksparse_gemm.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_blocksparse_gemm


def bench_example_blocksparse_gemm():
tilelang.tools.bench.process_func(example_blocksparse_gemm.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
21 changes: 21 additions & 0 deletions examples/cast/bench_example_cast.py
@@ -0,0 +1,21 @@
import tilelang.tools.bench
import example_group_per_split_token_cast_to_fp8
import example_per_token_cast_to_fp8


def bench_example_group_per_split_token_cast_to_fp8():
tilelang.tools.bench.process_func(
example_group_per_split_token_cast_to_fp8.main,
M=1024,
N=1024,
BG=2,
blk_m=4,
batch_sizes=[128, 896])


def bench_example_per_token_cast_to_fp8():
tilelang.tools.bench.process_func(example_per_token_cast_to_fp8.main, M=2048, N=512, blk_m=8)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
15 changes: 15 additions & 0 deletions examples/convolution/bench_example_convolution.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_convolution
import example_convolution_autotune


def bench_example_convolution():
tilelang.tools.bench.process_func(example_convolution.main)


def bench_example_convolution_autotune():
tilelang.tools.bench.process_func(example_convolution_autotune.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_deepgemm_fp8_2xAcc
Comment on lines +1 to +2
Contributor

⚠️ Potential issue | 🔴 Critical

Missing import for decorator module.

Lines 5-6 use @tilelang.testing.requires_cuda and @tilelang.testing.requires_cuda_compute_version_eq decorators, but tilelang.testing is not imported.

Add the missing import:

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import example_deepgemm_fp8_2xAcc
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import example_deepgemm_fp8_2xAcc
```
🤖 Prompt for AI Agents
In examples/deepseek_deepgemm/bench_example_deepgemm_fp8_2xAcc.py around lines 1
to 6, the decorators @tilelang.testing.requires_cuda and
@tilelang.testing.requires_cuda_compute_version_eq are used but tilelang.testing
is not imported; add an import for the testing module (e.g., import
tilelang.testing or from tilelang import testing) near the top of the file
before the decorator usage so the decorators resolve correctly.



def bench_example_deepgemm_fp8_2xAcc():
tilelang.tools.bench.process_func(example_deepgemm_fp8_2xAcc.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
10 changes: 10 additions & 0 deletions examples/deepseek_mla/bench_example_mla_decode.py
@@ -0,0 +1,10 @@
import tilelang.tools.bench
import example_mla_decode
Comment on lines +1 to +2
Contributor

⚠️ Potential issue | 🔴 Critical

Critical: Missing import for decorators.

The decorators on lines 5-6 reference tilelang.testing but the module is not imported, which will cause a NameError when the module loads.

Apply this diff to add the missing import:

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import example_mla_decode
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import example_mla_decode
```
🤖 Prompt for AI Agents
In examples/deepseek_mla/bench_example_mla_decode.py around lines 1 to 2, the
decorators later reference tilelang.testing but the module isn't imported,
causing a NameError; add an import for the testing module (e.g., import
tilelang.testing) at the top of the file so the decorators can resolve
correctly.



def bench_example_mla_decode():
tilelang.tools.bench.process_func(example_mla_decode.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
15 changes: 15 additions & 0 deletions examples/deepseek_nsa/bench_example_tilelang_nsa.py
@@ -0,0 +1,15 @@
import tilelang.tools.bench
import example_tilelang_nsa_fwd
import example_tilelang_nsa_decode
Comment on lines +2 to +3
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that the example modules are importable.

The relative imports assume the example modules are in the same directory or on PYTHONPATH. Ensure that these modules exist and can be imported when the script is run.

Run the following script to verify the modules exist:


🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify the example modules exist in the expected location

# Check if the example modules exist
fd -t f "example_tilelang_nsa_fwd.py" examples/deepseek_nsa/
fd -t f "example_tilelang_nsa_decode.py" examples/deepseek_nsa/
```

Length of output: 102



---

🏁 Script executed:

```shell
#!/bin/bash
# Check package structure and verify imports
fd -t f "__init__.py" examples/deepseek_nsa/
fd -t f "__init__.py" examples/

# Check the actual bench_example file structure
head -20 examples/deepseek_nsa/bench_example_tilelang_nsa.py
```

Length of output: 400



---

🏁 Script executed:

```shell
#!/bin/bash
# Check for setup files or documentation about running examples
fd -t f "setup.py|pyproject.toml|README|run" examples/deepseek_nsa/ -o examples/

# Check if there's any sys.path manipulation in the script or parent modules
rg -n "sys.path" examples/deepseek_nsa/
```

Length of output: 137



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for setup/documentation files
find examples/deepseek_nsa -type f -name "*.py" | head -20
find examples -maxdepth 2 -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "README*" \)

# Check for sys.path manipulation
rg -n "sys.path" examples/
```

Length of output: 2643



---

🏁 Script executed:

```shell
#!/bin/bash
# Check how test_example_tilelang_nsa.py handles imports
head -10 examples/deepseek_nsa/test_example_tilelang_nsa.py

# Also check if any other benchmark files exist and how they handle imports
find examples/deepseek_nsa -name "bench_*.py" -type f
head -10 examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py 2>/dev/null || echo "File not found or error"
```

Length of output: 464


Fix bare imports to use relative syntax or add sys.path setup.

Lines 2-3 use bare absolute imports that will fail at runtime unless examples/deepseek_nsa/ is on sys.path. Since this directory is not a package (no __init__.py), use relative imports instead:

```python
from . import example_tilelang_nsa_fwd
from . import example_tilelang_nsa_decode
```

Alternatively, add to the top of the file:

```python
import sys
import os
sys.path.insert(0, os.path.dirname(__file__))
```

The test file (test_example_tilelang_nsa.py) in the same directory has the same issue.
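To illustrate why the sys.path variant works, a minimal self-contained sketch (the module name example_mod is hypothetical, standing in for a sibling example file):

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory with a module in it, mimicking a sibling
# example file that is not on sys.path.
d = tempfile.mkdtemp()
with open(os.path.join(d, "example_mod.py"), "w") as f:
    f.write("def main():\n    return 'ok'\n")

# A bare `import example_mod` would fail here; inserting the directory
# at the front of sys.path is what makes the import resolvable.
sys.path.insert(0, d)
example_mod = importlib.import_module("example_mod")
print(example_mod.main())  # → ok
```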

🤖 Prompt for AI Agents
In examples/deepseek_nsa/bench_example_tilelang_nsa.py around lines 2-3 the bare
absolute imports will fail because the directory is not on sys.path and isn’t a
package; change to relative imports (from . import example_tilelang_nsa_fwd and
from . import example_tilelang_nsa_decode) or, if you prefer module-style
imports, add at the top: import sys, os and sys.path.insert(0,
os.path.dirname(__file__)) so the local modules can be found (apply the same fix
to test_example_tilelang_nsa.py).



def bench_example_tilelang_nsa_fwd():
tilelang.tools.bench.process_func(example_tilelang_nsa_fwd.main)


def bench_example_tilelang_nsa_fwd_decode():
tilelang.tools.bench.process_func(example_tilelang_nsa_decode.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
64 changes: 64 additions & 0 deletions examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py
@@ -0,0 +1,64 @@
import tilelang.tools.bench
import fp8_lighting_indexer
import sparse_mla_bwd
import sparse_mla_fwd
import sparse_mla_fwd_pipelined
import topk_selector
Comment on lines +1 to +6
Contributor

⚠️ Potential issue | 🔴 Critical

Missing import for decorator module.

Lines 17-18, 23-24, and 29-30 use @tilelang.testing decorators, but tilelang.testing is not imported.

Add the missing import:

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import fp8_lighting_indexer
 import sparse_mla_bwd
 import sparse_mla_fwd
 import sparse_mla_fwd_pipelined
 import topk_selector
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 import tilelang.tools.bench
+import tilelang.testing
 import fp8_lighting_indexer
 import sparse_mla_bwd
 import sparse_mla_fwd
 import sparse_mla_fwd_pipelined
 import topk_selector
```
🤖 Prompt for AI Agents
In examples/deepseek_v32/bench_tilelang_example_deepseek_v32.py around lines 1-6
(decorators used at lines ~17-30), the module providing the @tilelang.testing
decorator isn't imported; add an import for the testing module (e.g., import
tilelang.testing) near the other top-level imports so the @tilelang.testing
decorators resolve.



def bench_topk_selector():
tilelang.tools.bench.process_func(topk_selector.test_topk_selector)


def bench_fp8_lighting_indexer():
tilelang.tools.bench.process_func(
fp8_lighting_indexer.test_fp8_lighting_indexer,
S=512,
SKV=1024,
H=32,
HKV=1,
D=64,
kv_stride=1)


def bench_sparse_mla_fwd():
tilelang.tools.bench.process_func(
sparse_mla_fwd.test_sparse_mla_fwd,
S=256,
SKV=1024,
H=64,
HKV=1,
DQK=576,
DV=512,
topk=256,
check_correctness=False)


def bench_sparse_mla_fwd_pipelined():
tilelang.tools.bench.process_func(
sparse_mla_fwd_pipelined.test_sparse_mla_fwd_pipelined,
S=256,
SKV=512,
H=64,
HKV=1,
DQK=576,
DV=512,
topk=256,
check_correctness=False)


def bench_sparse_mla_bwd():
tilelang.tools.bench.process_func(
sparse_mla_bwd.test_sparse_mla_bwd,
S=256,
SKV=512,
H=64,
HKV=1,
DQKV=576,
DV=512,
topk=256,
check_correctness=False)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()
35 changes: 35 additions & 0 deletions examples/dequantize_gemm/bench_example_dequantize_gemm.py
@@ -0,0 +1,35 @@
import tilelang.tools.bench
import example_dequant_gemm_bf16_mxfp4_hopper
import example_dequant_gemm_bf16_mxfp4_hopper_tma
import example_dequant_gemm_fp4_hopper
import example_dequant_gemm_w4a8
import example_dequant_gemv_fp16xint4
import example_dequant_groupedgemm_bf16_mxfp4_hopper


def bench_example_dequant_gemv_fp16xint4():
tilelang.tools.bench.process_func(example_dequant_gemv_fp16xint4.main)


def bench_example_dequant_gemm_fp4_hopper():
tilelang.tools.bench.process_func(example_dequant_gemm_fp4_hopper.main)


def bench_example_dequant_gemm_bf16_mxfp4_hopper():
tilelang.tools.bench.process_func(example_dequant_gemm_bf16_mxfp4_hopper.main)


def bench_example_dequant_gemm_bf16_mxfp4_hopper_tma():
tilelang.tools.bench.process_func(example_dequant_gemm_bf16_mxfp4_hopper_tma.main)


def bench_example_dequant_groupedgemm_bf16_mxfp4_hopper():
tilelang.tools.bench.process_func(example_dequant_groupedgemm_bf16_mxfp4_hopper.main)


def bench_example_dequant_gemm_w4a8():
tilelang.tools.bench.process_func(example_dequant_gemm_w4a8.main)


if globals().get("__name__") == "__main__":
tilelang.tools.bench.main()