Commit d18a342

Add Sparsify overhead benchmark (#3021)

Summary: This PR adds the sparsify overhead benchmark that was omitted from the ICLR workshop paper (https://arxiv.org/abs/2503.16672). The paper's benchmark has two parts: 1) sparsify operation overhead, and 2) sparse-GEMM kernel performance. Part 1) was missing from the original benchmark script, so this PR adds the sparsify-only benchmark comparing `torchao.sparse24_sm90_sparsify` against the `torch._cslt_compress` (cuSPARSELt) baseline.

Test plan: CI

* remove lambda, scale for fair comparison
* rename attributes to prevent duplicate naming
1 parent 6c1f503 commit d18a342
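For context, `sparse24_sm90_sparsify` produces 2:4 semi-structured sparsity, and the `"largest"` selection keeps the two highest-magnitude values in each group of four. Below is a minimal NumPy sketch of that pruning rule, illustrative only: the real op is a fused SM90 CUDA kernel that also packs metadata and can cast the output to FP8, and the function name here is invented for the example.

```python
import numpy as np

def sparsify24_largest(x):
    # Illustrative 2:4 sparsification: in each contiguous group of 4
    # elements, keep the 2 with the largest magnitude and zero the rest.
    x = np.asarray(x, dtype=np.float32)
    groups = x.reshape(-1, 4)
    # Indices of the 2 smallest-magnitude entries in each group of 4.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    out = groups.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(x.shape)

# For [1.0, -3.0, 0.5, 2.0], the two largest magnitudes are -3.0 and 2.0.
print(sparsify24_largest([1.0, -3.0, 0.5, 2.0]))  # → [0. -3. 0. 2.]
```

The structured 50% sparsity is what lets the subsequent GEMM use the sparse tensor cores; the benchmark in this PR measures only the cost of producing that pattern.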

File tree

1 file changed: +15 −0 lines changed

benchmarks/benchmark_e2e_fp8_sparse_linear.py

Lines changed: 15 additions & 0 deletions
```diff
@@ -40,6 +40,18 @@ def benchmark(num_tokens, hidden_size=8192, intermediate_size=8192):
     input_tensor = torch.randn(num_tokens, hidden_size).to(torch.bfloat16).cuda()
     fp16_time = benchmark_microseconds(ffn_ref, input_tensor)
 
+    # Sparsify-only benchmarks
+    ao_fast_sparsification_time = benchmark_microseconds(
+        torch.ops.torchao.sparse24_sm90_sparsify(
+            input_tensor,
+            "cutlass",
+            "identity",
+            "largest",
+            dtype=torch.float8_e4m3fn,
+        )
+    )
+    cusparselt_time = benchmark_microseconds(torch._cslt_compress, input_tensor)
+
     # bf16
     ffn_clone = (
         nn.Sequential(
@@ -117,7 +129,10 @@ def benchmark(num_tokens, hidden_size=8192, intermediate_size=8192):
         "fp8_c_time (us)": fp8_c_time,
         "fp8_c_sparse_time (us)": fp8_c_sparse_time,
         "fp8_c_activation_sparse_time (us)": fp8_c_activation_sparse_time,
+        "ao_fast_sparsification_time (us)": ao_fast_sparsification_time,
+        "cusparselt_compress_time (us)": cusparselt_time,
         "speedup": fp8_c_time / fp8_c_activation_sparse_time,
+        "sparsify_speedup": cusparselt_time / ao_fast_sparsification_time,
     }
```
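The `benchmark_microseconds` helper itself is not part of this hunk. As a rough idea of what such a helper does, here is a hypothetical CPU-only stand-in, assuming it takes a callable plus its arguments and returns a median latency in microseconds; the real helper must additionally synchronize the CUDA device so kernel launches are actually timed.

```python
import statistics
import time

def benchmark_microseconds(fn, *args, iters=100, warmup=10, **kwargs):
    # Hypothetical stand-in: warm up, then report the median
    # wall-clock time of fn(*args, **kwargs) in microseconds.
    for _ in range(warmup):
        fn(*args, **kwargs)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)

# Usage mirrors the diff's speedup column: baseline time over candidate time.
cusparselt_time = benchmark_microseconds(sorted, list(range(1000)))
ao_time = benchmark_microseconds(sorted, list(range(100)))
sparsify_speedup = cusparselt_time / ao_time
```

A median is the usual choice here because microbenchmark samples are skewed by occasional scheduler or cache noise.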
