[Lint] Enable pyupgrade linter in ruff #963
Conversation
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run the project's formatting checks. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Walkthrough
Comprehensive type annotation modernization across the entire tilelang codebase, replacing legacy typing constructs with Python 3.10+ syntax (PEP 604 unions, built-in generics), adding postponed evaluation of annotations, and removing redundant explicit constructs such as `object` bases and `"r"` open modes.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant code as Python Code
    participant parser as Type Parser<br/>(Future Annotations)
    participant runtime as Runtime
    note over code: Before<br/>from typing import List, Optional<br/>def foo(x: Optional[List[int]]) -> List[str]:
    code->>parser: Parse with future annotations enabled
    parser->>parser: Postpone evaluation
    note over parser: PEP 563 deferment<br/>Annotations stored as strings initially
    code->>runtime: __annotations__ resolved on access
    runtime->>runtime: Modern syntax: int | None<br/>list[str] recognized natively
    note over runtime: Python 3.10+ built-ins<br/>leverage __class_getitem__
    note over code: After<br/>from __future__ import annotations<br/>def foo(x: int | list[int] | None) -> list[str]:
```
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
The changes are largely homogeneous—systematic application of the same type-modernization pattern across 80+ files. However, the scale, combined with a handful of behavioral modifications scattered throughout (such as runtime gating and added input validation), warrants a careful pass.
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
5b9312f to 0973611 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/jit/adapter/utils.py (1)
78-81: Guard `attrs["global_symbol"]` access on PrimFunc
Accessing `func_or_mod.attrs["global_symbol"]` may raise a KeyError if the attribute isn't set (e.g. a PrimFunc created in the parser). Use `attrs.get("global_symbol")` with a clear fallback, or assert its presence before indexing.
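For illustration, a minimal sketch of the suggested guard. The helper name and fallback value are hypothetical, not the actual code in tilelang/jit/adapter/utils.py:

```python
from tvm import tir


def kernel_global_symbol(func: tir.PrimFunc, fallback: str = "main") -> str:
    """Illustrative guard: prefer attrs.get() over direct indexing."""
    attrs = func.attrs
    # attrs can be None, and "global_symbol" may be absent on freshly
    # parsed PrimFuncs, so avoid indexing the attribute directly.
    if attrs is None:
        return fallback
    symbol = attrs.get("global_symbol")
    return str(symbol) if symbol is not None else fallback
```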
🧹 Nitpick comments (2)
tilelang/carver/roller/shape_inference/tir.py (1)
354-354: LGTM: Modernized string formatting. The f-string is clearer and more readable than the old `%` formatting style. For completeness, the static analysis tool suggests considering `TypeError` for invalid type errors and potentially creating a custom exception class, but these are minor style improvements that can be deferred.
tilelang/carver/roller/node.py (1)
304-304: LGTM: Simplified decorator syntax. The simplified `@functools.lru_cache` (without parentheses) is valid and cleaner in Python 3.8+.
Note: Static analysis warns that using `lru_cache` on instance methods can lead to memory leaks because the cache holds references to `self`, preventing garbage collection. This is an existing pattern in the codebase and not introduced by this change, but consider whether these methods truly need caching on instance methods or whether the cache should be cleared when instances are no longer needed.
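For context, a small sketch of the B019 concern and one common per-instance workaround. The class names are stand-ins, not the actual roller Node code:

```python
import functools


class Roller:
    @functools.lru_cache  # B019: the decorator-level cache keeps `self` alive
    def expensive(self, x: int) -> int:
        return x * x


class RollerPerInstance:
    def __init__(self) -> None:
        # The cache lives on the instance, so it is collected with the instance.
        self.expensive = functools.lru_cache(maxsize=None)(self._expensive)

    def _expensive(self, x: int) -> int:
        return x * x
```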
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (72)
docs/conf.py(1 hunks)examples/amd/example_amd_flash_attn_bwd.py(2 hunks)examples/attention_sink/example_gqa_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/attention_sink/example_mha_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd.py(2 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/bitnet-1.58b/configuration_bitnet.py(0 hunks)examples/bitnet-1.58b/eval_ppl.py(1 hunks)examples/bitnet-1.58b/maint/create_bitblas_ckpt.py(1 hunks)examples/bitnet-1.58b/modeling_bitnet.py(1 hunks)examples/bitnet-1.58b/tokenization_bitnet.py(0 hunks)examples/bitnet-1.58b/utils_quant.py(1 hunks)examples/bitnet-1.58b/vllm_workspace/conftest.py(1 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py(2 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py(2 hunks)examples/cast/example_group_per_split_token_cast_to_fp8.py(1 hunks)examples/cast/example_per_token_cast_to_fp8.py(2 hunks)examples/deepseek_mla/example_mla_decode_paged.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py(1 hunks)examples/flash_attention/example_gqa_bwd.py(2 hunks)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_gqa_fwd_bshd.py(1 hunks)examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_bwd.py(2 hunks)examples/flash_attention/example_mha_bwd_bhsd.py(2 hunks)examples/flash_attention/example_mha_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_mha_fwd_bhsd.py(1 hunks)examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_decoding/example_gqa_decode.py(1 hunks)examples/flash_decoding/example_mha_inference.py(1 hunks)examples/grouped_gemm/example_grouped_gemm_bwd.py(1 hunks)examples/hadamard_transform/example_hadamard.py(1 hunks)examples/linear_attention/example_mamba_chunk_scan.py(1 hunks)examples/linear_attention/example_mamba_chunk_state.py(2 hunks)examples/minference/example_vertical_slash_sparse_attn.py(1 hunks)examples/norm/rms_norm.py(1 hunks)pyproject.toml(1 hunks)setup.py(6 hunks)testing/python/kernel/test_tilelang_kernel_gemm.py(1 hunks)testing/python/kernel/test_tilelang_kernel_gemm_simt.py(1 hunks)testing/python/language/test_tilelang_language_pipeline.py(1 hunks)tilelang/autotuner/param.py(3 hunks)tilelang/cache/kernel_cache.py(1 hunks)tilelang/carver/arch/cuda.py(1 hunks)tilelang/carver/arch/metal.py(1 hunks)tilelang/carver/roller/bestfit.py(1 hunks)tilelang/carver/roller/hint.py(1 hunks)tilelang/carver/roller/node.py(4 hunks)tilelang/carver/roller/rasterization.py(1 hunks)tilelang/carver/roller/shape_inference/common.py(2 hunks)tilelang/carver/roller/shape_inference/tir.py(3 hunks)tilelang/contrib/hipcc.py(1 hunks)tilelang/intrinsics/mfma_macro_generator.py(1 hunks)tilelang/intrinsics/mma_macro_generator.py(2 hunks)tilelang/intrinsics/wgmma_macro_generator.py(1 hunks)tilelang/jit/adapter/cython/adapter.py(3 
hunks)tilelang/jit/adapter/libgen.py(1 hunks)tilelang/jit/adapter/utils.py(3 hunks)tilelang/jit/adapter/wrapper.py(6 hunks)tilelang/jit/kernel.py(1 hunks)tilelang/language/proxy.py(4 hunks)tilelang/quantize/lop3.py(1 hunks)tilelang/quantize/quantization.py(2 hunks)tilelang/tileop/gemm/gemm_base.py(1 hunks)tilelang/version.py(1 hunks)
💤 Files with no reviewable changes (2)
- examples/bitnet-1.58b/tokenization_bitnet.py
- examples/bitnet-1.58b/configuration_bitnet.py
🧰 Additional context used
🧬 Code graph analysis (25)
examples/minference/example_vertical_slash_sparse_attn.py (1)
tilelang/language/builtin.py (1)
mbarrier_wait_parity(172-219)
examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/carver/roller/shape_inference/common.py (1)
tilelang/carver/roller/shape_inference/tir.py (2)
Statement(7-43)InputShapeInference(169-318)
examples/flash_decoding/example_mha_inference.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/intrinsics/wgmma_macro_generator.py (1)
tilelang/utils/language.py (1)
is_fragment(68-78)
examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/attention_sink/example_mha_sink_bwd_bhsd.py (2)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (2)
maybe_contiguous(356-359)tl_bwd(484-485)
examples/linear_attention/example_mamba_chunk_state.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (4)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)examples/flash_attention/example_gqa_bwd.py (1)
maybe_contiguous(377-380)examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
maybe_contiguous(134-137)
examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_decoding/example_gqa_decode.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/linear_attention/example_mamba_chunk_scan.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/intrinsics/mma_macro_generator.py (2)
tilelang/intrinsics/wgmma_macro_generator.py (1)
TensorCoreIntrinEmitter(64-519)tilelang/utils/language.py (1)
is_fragment(68-78)
examples/flash_attention/example_mha_bwd_bhsd.py (1)
examples/flash_attention/example_mha_bwd.py (1)
maybe_contiguous(258-261)
examples/flash_attention/example_mha_fwd_bhsd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (3)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)
maybe_contiguous(364-367)examples/flash_attention/example_gqa_bwd.py (1)
maybe_contiguous(377-380)
examples/flash_attention/example_mha_bwd.py (9)
examples/amd/example_amd_flash_attn_bwd.py (2)
maybe_contiguous(242-245)run1(340-341)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)
maybe_contiguous(364-367)examples/flash_attention/example_gqa_bwd.py (2)
maybe_contiguous(377-380)run1(514-515)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (2)
maybe_contiguous(399-402)run1(536-537)examples/flash_attention/example_mha_bwd_bhsd.py (2)
maybe_contiguous(259-262)run1(336-337)examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (2)
maybe_contiguous(268-271)run1(343-344)examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
maybe_contiguous(134-137)tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/amd/example_amd_flash_attn_bwd.py (1)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)
examples/flash_attention/example_gqa_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/jit/adapter/utils.py (1)
tilelang/language/ast/ir.py (1)
target(1682-1713)
examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (1)
examples/flash_attention/example_gqa_bwd.py (2)
maybe_contiguous(377-380)run1(514-515)
examples/flash_attention/example_gqa_bwd.py (1)
examples/amd/example_amd_flash_attn_bwd.py (2)
maybe_contiguous(242-245)run1(340-341)
🪛 GitHub Actions: CI
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: Reformatted by formatter. Please review and stage the changes.
docs/conf.py
[error] 1-1: Reformatted by formatter. Please review and stage the changes.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: Reformatted by formatter. Please review and stage the changes.
🪛 GitHub Actions: CI Test on AMD
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: Reformatted files. Please review and stage the changes.
docs/conf.py
[error] 1-1: Reformatted files. Please review and stage the changes.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: Reformatted files. Please review and stage the changes.
🪛 GitHub Actions: CI Test on Metal
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: Reformatted by formatting script. Changes not staged for commit.
docs/conf.py
[error] 1-1: Reformatted by formatting script. Changes not staged for commit.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: Reformatted by formatting script. Changes not staged for commit.
🪛 Ruff (0.13.3)
tilelang/carver/roller/node.py
304-304: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
421-421: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
tilelang/quantize/lop3.py
1189-1189: Avoid specifying long messages outside the exception class
(TRY003)
setup.py
312-312: Unused noqa directive (non-enabled: SIM115)
Remove unused noqa directive
(RUF100)
tilelang/carver/roller/shape_inference/tir.py
354-354: Prefer TypeError exception for invalid type
(TRY004)
354-354: Create your own exception
(TRY002)
354-354: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (96)
examples/linear_attention/example_mamba_chunk_state.py (1)
171-175: Print modernisation looks good. Nice switch to f-strings—same formatting semantics, cleaner syntax.
examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
198-202: Nice f-string modernization. Formatting stays identical while matching the new lint rule scope. 👍
examples/flash_attention/example_gqa_fwd_bshd.py (1)
251-255: F-string migration looks good. Formatting precision and computed values are unchanged; the output stays consistent while modernizing the style.
examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
211-215: F-string conversion looks good. Formatting specifiers are preserved, so output remains identical while complying with pyupgrade.
examples/bitnet-1.58b/eval_ppl.py (1)
57-57: LGTM! Clean f-string conversion. The modernization from legacy string formatting to f-strings improves readability and aligns with Python 3.6+ best practices. This change is consistent with the PR's goal of enabling pyupgrade linter rules.
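As a generic illustration of the pattern applied throughout the PR (the values and labels are made up):

```python
latency, tflops = 1.234, 56.789

# Before: legacy .format() and % styles
print("Latency: {:.2f} ms, {:.2f} TFlops".format(latency, tflops))
print("Latency: %.2f ms, %.2f TFlops" % (latency, tflops))

# After: f-string with the same format specifiers, identical output
print(f"Latency: {latency:.2f} ms, {tflops:.2f} TFlops")
```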
examples/hadamard_transform/example_hadamard.py (1)
154-154: LGTM! Clean modernization to f-string. The conversion from `.format()` to f-string improves readability and aligns with modern Python best practices while maintaining identical functionality.
tilelang/carver/roller/bestfit.py (1)
20-20: LGTM! Modernization to f-string improves readability. The conversion from `str.format` to an f-string is correct and aligns with the PR objective to enable the pyupgrade linter. F-strings are more readable and typically faster than older formatting methods.
examples/bitnet-1.58b/utils_quant.py (1)
219-219: LGTM! Modernized super() call syntax. The change from `super(BitLinear, self).__init__(*kargs, **kwargs)` to `super().__init__(*kargs, **kwargs)` correctly modernizes the code to use Python 3 idiomatic syntax. The behavior remains identical.
tilelang/carver/arch/metal.py (1)
1-1: LGTM! Adding `from __future__ import annotations` enables postponed evaluation of type annotations (PEP 563), which is a modern Python practice and aligns with the PR's objective to modernize the codebase. This works well with the existing PEP 604 union syntax (`Target | str`) on line 12.
examples/deepseek_mla/example_mla_decode_paged.py (1)
403-404: LGTM! Clean modernization to f-strings.The conversion from
format()to f-strings is correct and improves readability while maintaining identical output semantics.examples/flash_decoding/example_gqa_decode.py (1)
476-480: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings is correct and follows Python best practices. F-strings are more readable, concise, and performant.examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (2)
225-226: LGTM! Clean conversion to f-strings.The migration from
.format()to f-strings is correct and follows modern Python conventions. The formatting specifiers and output remain identical.
228-229: LGTM! Consistent f-string conversion.The conversion maintains identical output while improving code readability.
examples/linear_attention/example_mamba_chunk_scan.py (1)
232-236: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings improves readability while preserving the exact formatting (.2f). This aligns with the PR objective to enable pyupgrade linting and follows Python 3.6+ best practices.examples/flash_attention/example_mha_fwd_bhsd.py (1)
206-210: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings is correct, maintains identical output formatting (.2fprecision), and improves readability. These changes align with the PR objective of enabling pyupgrade linter rules.examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py (1)
537-538: LGTM! Clean f-string conversion.The formatting is preserved (
.2ffor 2 decimal places) and the f-string syntax is more readable and Pythonic.examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py (1)
553-554: LGTM! Consistent f-string conversion.The change maintains identical formatting and improves code readability.
examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py (1)
436-437: LGTM! Consistent f-string conversion.The formatting is identical and the f-string syntax is cleaner and more maintainable.
examples/flash_attention/example_mha_fwd_bshd.py (1)
193-197: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings is correct and improves readability. The formatting specifications and variable references are properly preserved.examples/cast/example_per_token_cast_to_fp8.py (1)
103-117: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings is correct and follows modern Python best practices. The format specifications are preserved, and the output remains identical.examples/cast/example_group_per_split_token_cast_to_fp8.py (1)
202-205: LGTM! Consistent f-string modernization.The f-string conversions are correct and consistent with the broader codebase modernization in this PR.
tilelang/intrinsics/mfma_macro_generator.py (1)
14-14: LGTM! Modern Python 3 class declaration. Removing the explicit `object` base class aligns with Python 3 best practices, as all classes implicitly inherit from `object`.
tilelang/intrinsics/wgmma_macro_generator.py (1)
408-408: LGTM! Cleaner assertion message with f-string.The f-string format improves readability and aligns with modern Python conventions.
tilelang/intrinsics/mma_macro_generator.py (2)
28-28: LGTM! Modern Python 3 class declaration.Removing the explicit
objectbase class aligns with Python 3 best practices, consistent with the modernization inmfma_macro_generator.py.
521-521: LGTM! Cleaner assertion message with f-string.The f-string format improves readability and is consistent with the same change in
wgmma_macro_generator.py.examples/amd/example_amd_flash_attn_bwd.py (2)
247-247: LGTM! Generator expression improves efficiency. The change from list comprehension to generator expression is a good modernization. Since the generator is consumed immediately during unpacking, behavior is unchanged while memory efficiency is slightly improved.
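A small illustration of the unpacking pattern being discussed, with a stand-in helper rather than the example's real `maybe_contiguous`:

```python
import torch


def maybe_contiguous(x: torch.Tensor) -> torch.Tensor:
    # stand-in for the helper used in the examples
    return x if x.is_contiguous() else x.contiguous()


q, k, v = (torch.randn(4, 8) for _ in range(3))

# A list comprehension builds an intermediate list before unpacking...
q, k, v = [maybe_contiguous(x) for x in (q, k, v)]
# ...while a generator expression is consumed directly during unpacking.
q, k, v = (maybe_contiguous(x) for x in (q, k, v))
```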
346-350: LGTM! F-string conversions improve readability.The conversion from
.format()to f-strings is correct, with all formatting specifications (.2f) properly preserved. This modernization improves code readability without changing behavior.examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (2)
273-273: LGTM! Modern generator expression for unpacking.The change from a list comprehension to a generator expression is a good modernization. The generator is more memory-efficient since the values are immediately consumed during unpacking, and this aligns with pyupgrade recommendations.
349-353: LGTM! F-string conversions are correct.The conversion to f-strings is a modern Python best practice that improves readability. All formatting specifications (
.2f) are correctly preserved, and the logic remains unchanged.examples/flash_attention/example_mha_bwd_bhsd.py (2)
264-264: LGTM! Generator expression for unpacking is correct.The change from list comprehension to generator expression is a valid modernization. While the memory benefit is minimal for 5 items, this aligns with pyupgrade's recommendations and works correctly.
342-346: LGTM! F-string conversions are correct.The print statements have been properly converted from
.format()to f-strings with correct formatting specifiers preserved.tilelang/quantize/quantization.py (2)
226-226: LGTM! Redundant parentheses removed.The removal of the outer parentheses is a safe cosmetic improvement. The expression remains functionally identical, and the remaining parentheses correctly ensure the bitwise AND operation is evaluated before the shift.
235-235: LGTM! Consistent style improvement.The redundant outer parentheses have been removed, making the expression cleaner while maintaining the correct evaluation order. This change is consistent with the improvement on line 226.
examples/flash_attention/example_mha_bwd.py (2)
263-263: LGTM! Generator expression modernization.The change from list comprehension to generator expression for unpacking is a safe, standard Python modernization. Both are functionally equivalent when unpacking, but the generator expression is more memory-efficient.
339-343: LGTM! F-string conversion.The conversion from
.format()to f-strings is a standard Python modernization that improves readability. All format specifiers and expressions are correctly preserved.examples/flash_attention/example_gqa_bwd.py (2)
382-382: LGTM! Generator expression is more memory-efficient.The change from list comprehension to generator expression for unpacking is a valid modernization. Both are functionally equivalent, and the generator expression avoids creating an intermediate list.
520-524: LGTM! F-strings improve readability.The conversion to f-strings modernizes the code and improves readability while correctly preserving the format specifiers for floating-point precision.
examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
139-139: LGTM! Standard pyupgrade optimization.Replacing the list comprehension with a generator expression is correct and avoids allocating an intermediate list. This is a standard pyupgrade rule (UP015) for immediate unpacking.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (4)
183-183: LGTM! Modernized super() call. The simplified `super().__init__()` syntax is the Python 3 standard and is preferred over the explicit class/self parameters.
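For reference, the modernization being approved, shown on a hypothetical module rather than the actual SparseFlashAttn code:

```python
import torch.nn as nn


class SparseAttnExample(nn.Module):  # hypothetical stand-in class
    def __init__(self) -> None:
        # Python 2-style call the lint replaces:
        #   super(SparseAttnExample, self).__init__()
        super().__init__()  # Python 3 zero-argument form, same behavior
```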
357-357: LGTM! F-string conversion.The f-string syntax is preferred over
.format()and improves readability.
362-362: LGTM! F-string conversion.The f-string syntax with multiple interpolations is more readable than the equivalent
.format()call.
1-1: Stage formatting changes. CI is failing due to unstaged formatter updates. Run `ruff --fix` (or `black .` if used) and commit all modified files.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (3)
196-196: LGTM! Modernized super() call.The simplified
super().__init__()syntax is the Python 3 standard and is preferred over the explicit class/self parameters.
369-369: LGTM! F-string conversion.The f-string syntax is preferred over
.format()and improves readability.
372-372: LGTM! F-string conversion.The f-string syntax with multiple interpolations is more readable than the equivalent
.format()call.tilelang/contrib/hipcc.py (1)
57-57: F-string update preserves behaviorSwitching to the f-string keeps the target path identical while satisfying pyupgrade.
testing/python/language/test_tilelang_language_pipeline.py (1)
106-107: Redundant parentheses removal is safeDropping the extra parentheses leaves the casting logic untouched; nice stylistic cleanup.
tilelang/carver/roller/shape_inference/common.py (1)
7-22: Modern class declarations look goodRemoving the explicit
(object)base aligns with Python 3 style and has no runtime impact.examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1)
3-427: Formatting modernization retains semanticsThe added annotations import supports future typing tweaks, and the f-strings mirror the prior formatting exactly.
tilelang/version.py (1)
24-25: Default text-modeopenis equivalentDropping the explicit
'r'keeps text reading behavior unchanged while satisfying the linter.examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
37-39: Simplified file open stays readableUsing the default read mode is fine here and keeps the helper neat.
testing/python/kernel/test_tilelang_kernel_gemm.py (1)
98-99: Parentheses trim keeps conversion intactThe tensor bitcast still executes exactly the same, so this satisfies the formatter without behavioral change.
tilelang/carver/roller/rasterization.py (1)
91-93: F-string emission is straightforwardInjecting
panel_widthvia an f-string preserves the generated code verbatim.pyproject.toml (1)
34-34: LGTM! Pyupgrade linter enabled successfully. The addition of `"UP"` and `"FA102"` to the ruff lint selection enables pyupgrade rules, which modernize Python syntax. This change aligns with the PR objectives and the style updates throughout the codebase.
tilelang/jit/adapter/cython/adapter.py (1)
105-105: LGTM! Simplified file open calls. Removing the explicit `"r"` mode is consistent with Python 3 conventions, where text read mode is the default. The behavior remains identical.
Also applies to: 118-118, 135-135
tilelang/autotuner/param.py (1)
249-249: LGTM! Simplified file open calls.Removing the explicit
"r"mode follows Python 3 conventions. The default text read mode preserves the intended behavior.Also applies to: 318-318, 330-330
setup.py (1)
140-140: LGTM! Simplified file open calls.Removing the explicit
"r"mode follows Python 3 conventions and is consistent with pyupgrade recommendations.Also applies to: 171-171, 711-711, 723-723, 740-740
tilelang/carver/arch/cuda.py (1)
94-94: LGTM! Removed redundant base class.In Python 3, all classes implicitly inherit from
object, making the explicit base class declaration unnecessary. This modernization aligns with PEP 8 and pyupgrade recommendations.examples/bitnet-1.58b/modeling_bitnet.py (1)
1515-1515: LGTM! Simplified file open call.Removing the explicit
"r"mode is consistent with Python 3 conventions and the modernization pattern applied throughout the codebase.examples/bitnet-1.58b/maint/create_bitblas_ckpt.py (1)
88-88: LGTM! Simplified file open call.The change follows Python 3 conventions and is consistent with the modernization applied across the codebase.
examples/norm/rms_norm.py (1)
74-74: LGTM! Modernized string formatting.Replacing
.format()with f-strings improves readability and performance. This is a recommended Python 3.6+ practice and aligns with pyupgrade recommendations.Also applies to: 76-76
tilelang/cache/kernel_cache.py (1)
366-368: No behavior change from dropping"r"
open()defaults to text-read mode, so this remains a safe, no-op cleanup.tilelang/quantize/lop3.py (1)
1189-1193: F-string conversion looks goodSwitching to an f-string keeps the message identical while matching the new lint expectations.
docs/conf.py (1)
5-9: Docs tweaks are soundBoth the f-string and the implicit read mode are clean modernizations with no functional impact.
tilelang/carver/roller/hint.py (1)
150-152: Class declaration cleanup is correctRemoving
(object)is idiomatic in Python 3 and keeps semantics identical.tilelang/jit/kernel.py (1)
20-32: Python 3 class style looks good. Dropping the explicit `object` base keeps behavior the same while modernizing the declaration.
examples/attention_sink/example_mha_sink_fwd_bhsd.py (2)
2-2: LGTM: Future annotations import added. Enables PEP 563 postponed evaluation of annotations, consistent with modern Python 3 practices.
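A minimal sketch of what the future import changes at runtime (the function is illustrative, not code from the example file):

```python
from __future__ import annotations  # PEP 563: annotations stored as strings

import typing


def attach(buffer: list[int] | None = None) -> list[int]:
    # The annotation above is never evaluated at definition time, so this
    # definition also works on interpreters older than 3.10.
    return buffer or []


print(attach.__annotations__)         # {'buffer': 'list[int] | None', 'return': 'list[int]'}
print(typing.get_type_hints(attach))  # resolved to real types on demand (3.10+ for the | syntax)
```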
289-293: LGTM: Format strings modernized to f-strings.The conversion from
.format()to f-strings is correct and improves readability. The numeric formatting and calculations remain unchanged.examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (1)
3-3: LGTM: Modern Python formatting applied.The addition of future annotations and conversion to f-strings are standard modernizations that improve code readability without changing functionality.
Also applies to: 438-446
testing/python/kernel/test_tilelang_kernel_gemm_simt.py (1)
109-109: LGTM: Removed redundant parentheses.The extra parentheses around the integer expression were unnecessary and have been correctly removed.
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (2)
2-2: LGTM: Modern Python patterns applied.The future annotations import and generator expression (instead of list comprehension) are appropriate modernizations. The generator expression is safe here since values are immediately unpacked into separate variables.
Also applies to: 361-361
488-492: LGTM: Format strings modernized to f-strings.The conversion maintains the same output formatting while improving readability.
examples/flash_decoding/example_mha_inference.py (1)
321-325: LGTM: Format strings modernized to f-strings.The conversion correctly maintains the formatting precision (
.2fand.4f) while improving code readability.examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (2)
404-404: LGTM: Generator expression applied.The switch from list comprehension to generator expression is more memory efficient and safe here since values are immediately unpacked.
542-546: LGTM: Format strings modernized to f-strings.The conversion maintains the same output formatting while improving readability.
examples/attention_sink/example_mha_sink_bwd_bhsd.py (3)
2-2: LGTM: Future annotations import added.Enables PEP 563 forward reference support, consistent with modern Python typing practices.
369-369: LGTM: Generator expression unpacking.More memory-efficient than creating an intermediate list for unpacking, while maintaining the same functionality.
492-496: LGTM: Modernized to f-strings.Cleaner and more readable than
.format()calls, consistent with modern Python style.tilelang/carver/roller/shape_inference/tir.py (2)
50-50: LGTM: Removed redundant explicit base class. In Python 3, all classes implicitly inherit from `object`, so the explicit base is unnecessary.
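The pattern being removed, shown with throwaway class names:

```python
# Python 2-era spelling that pyupgrade removes:
class LegacyNode(object):
    pass


# Python 3 equivalent; every class already inherits from object:
class ModernNode:
    pass


assert ModernNode.__mro__ == (ModernNode, object)
```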
79-79: LGTM: Removed redundant explicit base class.Consistent with modern Python 3 style where
objectinheritance is implicit.tilelang/carver/roller/node.py (3)
32-32: LGTM: Removed redundant explicit base class.Modern Python 3 style where
objectinheritance is implicit.
93-93: LGTM: Removed redundant explicit base class.Consistent with Python 3 conventions.
421-421: LGTM: Simplified decorator syntax.Same modernization as line 304, consistent with Python 3.8+ conventions.
tilelang/jit/adapter/wrapper.py (6)
179-179: LGTM: Removed redundant explicit base class.Modern Python 3 style.
325-326: LGTM: Modernized to f-strings.Cleaner and more readable for constructing the
dim3strings.
350-351: LGTM: Modernized to f-strings.Consistent string formatting improvements for kernel launch and error checking.
895-895: LGTM: Removed redundant explicit base class.Consistent with Python 3 conventions.
993-993: LGTM: Modernized to f-string.Cleaner string formatting for the function call construction.
1062-1062: LGTM: Removed redundant explicit base class.Consistent Python 3 style.
tilelang/jit/adapter/utils.py (2)
4-4: LGTM: Modernized type hints. Updated to use PEP 604 union syntax (`|`) and the built-in `tuple` instead of `Union` and `Tuple`, consistent with Python 3.10+ conventions.
Also applies to: 68-72, 110-110
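A generic before/after of the annotation style; the signature is illustrative, not the real utils.py function:

```python
from __future__ import annotations

# Before (typing-module constructs):
#   from typing import Dict, Optional, Tuple, Union
#   def split_target(spec: Optional[str]) -> Tuple[str, Dict[str, Union[int, str]]]: ...

# After (PEP 585 built-in generics + PEP 604 unions):
def split_target(spec: str | None) -> tuple[str, dict[str, int | str]]:
    name, _, rest = (spec or "cuda").partition(" ")
    return name, {"raw": rest}
```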
75-76: LGTM: Added input validation.Early validation ensures
model_typeis one of the expected values, providing clear error messages.tilelang/language/proxy.py (5)
4-4: LGTM: Updated imports.Removed unnecessary imports as the built-in
tupleis now used instead ofTuple.
146-146: LGTM: Modernized type hint.Using built-in
tupleinstead ofTuplefrom typing module, consistent with Python 3.9+ style.
154-154: LGTM: Modernized type hint.Using PEP 604 union syntax (
|) and built-intuple, consistent with modern Python conventions.
175-176: LGTM: Modernized type hints.Consistent use of built-in
tupletype.
276-276: LGTM: Modernized type hint.Using PEP 604 union syntax (
str | None) instead ofOptional[str].
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (1)
1-1: Fix formatting issues before merging. Pipeline failures indicate that this file does not match the project's formatting standards:
- CI: "Reformatted by formatter. Changes not staged for commit."
- CI Test on Metal: "Reformatted files. Please review and stage the changes."
- CI Test on AMD: "clang-format reformatted file. Please review and stage the changes."
Please run the project's formatter (likely `yapf`, based on pyproject.toml) on this file and commit the changes.
🧹 Nitpick comments (1)
examples/flash_attention/example_mha_bwd_bhsd.py (1)
264-264: Consider reverting to list comprehension for clarity. While unpacking a generator expression is syntactically valid, it's less common and arguably less clear than the list comprehension. For 5 elements, the memory/performance benefit is negligible.
If you prefer the more conventional pattern, apply this diff:
```diff
- do, q, k, v, o = (maybe_contiguous(x) for x in (do, q, k, v, o))
+ do, q, k, v, o = [maybe_contiguous(x) for x in (do, q, k, v, o)]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (72)
docs/conf.py(1 hunks)examples/amd/example_amd_flash_attn_bwd.py(2 hunks)examples/attention_sink/example_gqa_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/attention_sink/example_mha_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd.py(2 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/bitnet-1.58b/configuration_bitnet.py(0 hunks)examples/bitnet-1.58b/eval_ppl.py(1 hunks)examples/bitnet-1.58b/maint/create_bitblas_ckpt.py(1 hunks)examples/bitnet-1.58b/modeling_bitnet.py(1 hunks)examples/bitnet-1.58b/tokenization_bitnet.py(0 hunks)examples/bitnet-1.58b/utils_quant.py(1 hunks)examples/bitnet-1.58b/vllm_workspace/conftest.py(1 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py(2 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py(2 hunks)examples/cast/example_group_per_split_token_cast_to_fp8.py(1 hunks)examples/cast/example_per_token_cast_to_fp8.py(2 hunks)examples/deepseek_mla/example_mla_decode_paged.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py(1 hunks)examples/flash_attention/example_gqa_bwd.py(2 hunks)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_gqa_fwd_bshd.py(1 hunks)examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_bwd.py(2 hunks)examples/flash_attention/example_mha_bwd_bhsd.py(2 hunks)examples/flash_attention/example_mha_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_mha_fwd_bhsd.py(1 hunks)examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_decoding/example_gqa_decode.py(1 hunks)examples/flash_decoding/example_mha_inference.py(1 hunks)examples/grouped_gemm/example_grouped_gemm_bwd.py(1 hunks)examples/hadamard_transform/example_hadamard.py(1 hunks)examples/linear_attention/example_mamba_chunk_scan.py(1 hunks)examples/linear_attention/example_mamba_chunk_state.py(2 hunks)examples/minference/example_vertical_slash_sparse_attn.py(1 hunks)examples/norm/rms_norm.py(1 hunks)pyproject.toml(1 hunks)setup.py(6 hunks)testing/python/kernel/test_tilelang_kernel_gemm.py(1 hunks)testing/python/kernel/test_tilelang_kernel_gemm_simt.py(1 hunks)testing/python/language/test_tilelang_language_pipeline.py(1 hunks)tilelang/autotuner/param.py(3 hunks)tilelang/cache/kernel_cache.py(1 hunks)tilelang/carver/arch/cuda.py(1 hunks)tilelang/carver/arch/metal.py(1 hunks)tilelang/carver/roller/bestfit.py(1 hunks)tilelang/carver/roller/hint.py(1 hunks)tilelang/carver/roller/node.py(4 hunks)tilelang/carver/roller/rasterization.py(1 hunks)tilelang/carver/roller/shape_inference/common.py(2 hunks)tilelang/carver/roller/shape_inference/tir.py(3 hunks)tilelang/contrib/hipcc.py(1 hunks)tilelang/intrinsics/mfma_macro_generator.py(1 hunks)tilelang/intrinsics/mma_macro_generator.py(2 hunks)tilelang/intrinsics/wgmma_macro_generator.py(1 hunks)tilelang/jit/adapter/cython/adapter.py(3 
hunks)tilelang/jit/adapter/libgen.py(1 hunks)tilelang/jit/adapter/utils.py(3 hunks)tilelang/jit/adapter/wrapper.py(6 hunks)tilelang/jit/kernel.py(1 hunks)tilelang/language/proxy.py(4 hunks)tilelang/quantize/lop3.py(1 hunks)tilelang/quantize/quantization.py(2 hunks)tilelang/tileop/gemm/gemm_base.py(1 hunks)tilelang/version.py(1 hunks)
💤 Files with no reviewable changes (2)
- examples/bitnet-1.58b/tokenization_bitnet.py
- examples/bitnet-1.58b/configuration_bitnet.py
🧰 Additional context used
🧬 Code graph analysis (23)
examples/flash_attention/example_mha_fwd_bhsd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/intrinsics/mma_macro_generator.py (2)
tilelang/intrinsics/wgmma_macro_generator.py (1)
TensorCoreIntrinEmitter(64-519)tilelang/utils/language.py (1)
is_fragment(68-78)
examples/flash_decoding/example_gqa_decode.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/intrinsics/wgmma_macro_generator.py (1)
tilelang/utils/language.py (1)
is_fragment(68-78)
examples/flash_decoding/example_mha_inference.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_bwd.py (2)
examples/amd/example_amd_flash_attn_bwd.py (2)
maybe_contiguous(242-245)run1(340-341)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (2)
maybe_contiguous(399-402)run1(536-537)
examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/attention_sink/example_mha_sink_bwd_bhsd.py (3)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)examples/flash_attention/example_mha_bwd.py (1)
maybe_contiguous(258-261)
examples/minference/example_vertical_slash_sparse_attn.py (1)
tilelang/language/builtin.py (1)
mbarrier_wait_parity(172-219)
examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/linear_attention/example_mamba_chunk_scan.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/jit/adapter/utils.py (1)
tilelang/language/ast/ir.py (1)
target(1682-1713)
examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (1)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
examples/amd/example_amd_flash_attn_bwd.py (1)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)
examples/flash_attention/example_mha_bwd_bhsd.py (3)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)examples/flash_attention/example_mha_bwd.py (1)
maybe_contiguous(258-261)
examples/linear_attention/example_mamba_chunk_state.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (2)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)examples/attention_sink/example_mha_sink_bwd_bhsd.py (2)
maybe_contiguous(364-367)tl_bwd(488-489)
examples/flash_attention/example_mha_bwd.py (2)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (1)
examples/flash_attention/example_gqa_bwd.py (2)
maybe_contiguous(377-380)run1(514-515)
🪛 GitHub Actions: CI
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: Reformatted by formatter. Changes not staged for commit.
docs/conf.py
[error] 1-1: Reformatted by formatter. Changes not staged for commit.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: Reformatted by formatter. Changes not staged for commit.
🪛 GitHub Actions: CI Test on AMD
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: clang-format reformatted file. Please review and stage the changes.
docs/conf.py
[error] 1-1: clang-format reformatted file. Please review and stage the changes.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: clang-format reformatted file. Please review and stage the changes.
🪛 GitHub Actions: CI Test on Metal
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
[error] 1-1: Reformatted files. Please review and stage the changes. Changes not staged for commit.
docs/conf.py
[error] 1-1: Reformatted files. Please review and stage the changes. Changes not staged for commit.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
[error] 1-1: Reformatted files. Please review and stage the changes. Changes not staged for commit.
🪛 Ruff (0.13.3)
tilelang/quantize/lop3.py
1189-1189: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/carver/roller/shape_inference/tir.py
354-354: Prefer TypeError exception for invalid type
(TRY004)
354-354: Create your own exception
(TRY002)
354-354: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/carver/roller/node.py
304-304: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
421-421: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
setup.py
312-312: Unused noqa directive (non-enabled: SIM115)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (70)
examples/flash_decoding/example_mha_inference.py (1)
321-325: F-string conversion looks goodFormatting and precision stay the same, readability improves.
testing/python/kernel/test_tilelang_kernel_gemm.py (1)
98-99: Redundant parentheses cleanup looks good.The simplified expression preserves behavior and keeps the intent clear.
examples/cast/example_group_per_split_token_cast_to_fp8.py (1)
202-205: LGTM on the f-string swapStraightforward modernization—f-strings keep the same output while matching the new Ruff style expectations.
examples/hadamard_transform/example_hadamard.py (1)
154-154: LGTM! Modern f-string formatting applied correctly.The conversion from
.format()to f-string is correct and improves readability while maintaining identical output.examples/cast/example_per_token_cast_to_fp8.py (1)
103-103: LGTM! Clean modernization to f-strings.The migration from
.format()to f-strings is correct and aligns with the PR objective of enabling pyupgrade linter. The formatting specifications are preserved accurately.Also applies to: 105-105, 117-117
examples/norm/rms_norm.py (1)
74-76: LGTM! Clean modernization to f-strings.The conversion from format strings to f-strings is correct and improves readability. The formatting specifiers are preserved accurately.
examples/bitnet-1.58b/eval_ppl.py (1)
57-57: LGTM! F-string modernization improves readability.The conversion to an f-string is correct and aligns with modern Python best practices. It also maintains consistency with the existing f-string usage at line 53.
testing/python/language/test_tilelang_language_pipeline.py (1)
106-107: LGTM! Clean formatting improvement.Removing the redundant outer parentheses improves readability without affecting functionality. The tfloat32 conversion logic remains correct.
tilelang/carver/roller/rasterization.py (1)
92-92: LGTM! Clean f-string conversion.The modernization from
.format()to an f-string is correct and aligns with the PR's objective to enable pyupgrade linting.examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py (1)
493-494: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings is correct and improves readability. The formatting specifiers are preserved accurately, and the changes align with the PR's objective of modernizing Python style across the codebase.testing/python/kernel/test_tilelang_kernel_gemm_simt.py (1)
109-109: LGTM! Redundant parentheses removed.The removal of redundant parentheses around
block_K // micro_size_kimproves code clarity and aligns with the usage ofT.serialelsewhere in the file (lines 99, 110, 115, 121, 126). This change is part of the pyupgrade linter enforcement and has no functional impact.examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
198-202: LGTM! Clean f-string conversion.The print statements have been correctly modernized to use f-strings, improving readability while maintaining identical output formatting. The conversions follow Python best practices and align with the PR's objective to enable pyupgrade linting.
examples/bitnet-1.58b/utils_quant.py (1)
219-219: LGTM! Modern Python 3 super() call. The modernization from `super(BitLinear, self).__init__(...)` to `super().__init__(...)` is correct and follows Python 3 best practices. This change is consistent with the existing code in `BitLinearBitBLAS.__init__` (line 44) and aligns with the PR's objective to modernize Python style.
examples/linear_attention/example_mamba_chunk_state.py (2)
43-43: LGTM! Redundant parentheses removed.The extra parentheses around the subtraction expression are unnecessary and have been correctly removed without affecting functionality.
171-175: LGTM! Print statements modernized to f-strings.The conversion from
.format()to f-strings improves readability and aligns with modern Python style. The formatting specifiers and output values remain identical.examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (2)
225-226: LGTM! F-string conversion is correct.The conversion from
.format()to f-strings is properly done and aligns with the PR objective of modernizing Python style.
228-229: LGTM! Consistent f-string modernization.The f-string conversion matches the pattern used for the reference benchmark output above, maintaining consistency throughout the file.
examples/flash_attention/example_mha_bwd_bhsd.py (1)
342-346: LGTM! Good modernization to f-strings.The conversion from
.format()to f-strings improves readability and aligns with modern Python style.examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
37-37: LGTM! Clean modernization.Removing the explicit
"r"mode is appropriate since it's Python's default foropen(). This change aligns with the pyupgrade linter recommendations.tilelang/carver/arch/metal.py (1)
1-1: LGTM! Enables modern type hint syntax.Adding
from __future__ import annotationsis appropriate for this module. It enables the modern union syntax (Target | stron line 12) and improves forward reference handling during type checking.examples/bitnet-1.58b/modeling_bitnet.py (1)
1515-1515: LGTM! Consistent with file I/O modernization.Removing the explicit
"r"mode follows the same pattern as other file I/O updates in this PR and aligns with pyupgrade recommendations.tilelang/intrinsics/wgmma_macro_generator.py (1)
408-408: LGTM! F-string improves readability.The conversion from
.format()to an f-string is correct and aligns with modern Python style guidelines enforced by pyupgrade.examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py (1)
553-554: LGTM! Print statements modernized.The f-string conversions for the latency and TFlops output are correct and improve code readability while maintaining identical output formatting.
tilelang/carver/roller/bestfit.py (1)
20-20: LGTM! Cleaner repr implementation.The f-string conversion in the
__repr__method is correct and makes the code more readable while preserving the exact string representation.examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
211-215: LGTM! Benchmark output formatting improved.All four print statements have been correctly converted to f-strings, maintaining the same output format while improving code clarity and consistency with the rest of the PR.
examples/bitnet-1.58b/maint/create_bitblas_ckpt.py (1)
88-88: LGTM! Consistent file I/O modernization.Removing the explicit
"r"mode is appropriate and aligns with the broader PR pattern of modernizing file I/O operations across the codebase.pyproject.toml (1)
34-34: LGTM! Pyupgrade rules enabled.Enabling the UP (pyupgrade) and FA102 rules aligns with the PR objective to prevent issues by modernizing Python syntax across the codebase.
examples/flash_attention/example_mha_fwd_bhsd.py (1)
206-210: LGTM! F-string conversion improves readability.The conversion from
.format()to f-strings is a standard Python modernization that improves code readability while maintaining identical functionality.tilelang/carver/roller/hint.py (1)
150-150: LGTM! Python 3 class declaration modernization.Removing explicit
objectinheritance is correct for Python 3, where all classes implicitly inherit fromobject. This is a standard pyupgrade modernization with no behavioral change.tilelang/autotuner/param.py (3)
249-249: LGTM! Default file mode simplification.Omitting the explicit
"r"mode is correct since text-read mode is the default foropen(). This is a standard pyupgrade modernization.
318-318: LGTM! Default file mode simplification.Omitting the explicit
"r"mode is correct since text-read mode is the default foropen().
330-330: LGTM! Default file mode simplification.Omitting the explicit
"r"mode is correct since text-read mode is the default foropen().examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (2)
183-183: LGTM! Modern super() call.The modern
super()syntax without explicit class and self arguments is the recommended Python 3 pattern and is functionally equivalent.
357-362: LGTM! F-string conversions improve readability.The conversion to f-strings modernizes the code while maintaining identical output.
tilelang/jit/kernel.py (1)
20-20: LGTM! Python 3 class declaration modernization.Removing explicit
objectinheritance is correct for Python 3. This is a standard pyupgrade modernization with no behavioral change.tilelang/jit/adapter/libgen.py (1)
32-32: LGTM! Python 3 class declaration modernization.Removing explicit
objectinheritance is correct for Python 3. This is a standard pyupgrade modernization with no behavioral change.examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (2)
404-404: LGTM! Generator expression for unpacking is more efficient.Using a generator expression instead of a list comprehension for tuple unpacking is more memory-efficient. Python will consume the generator during unpacking to assign all values correctly.
542-546: LGTM! F-string conversions improve readability.The conversion from
.format()to f-strings is a standard Python modernization that improves code readability.tilelang/language/proxy.py (1)
4-4: LGTM! Type hints modernized to Python 3.10+ syntax. The changes correctly update type hints to use:
- Built-in `tuple` instead of `typing.Tuple`
- Union syntax `X | Y` instead of `Union[X, Y]`
- Union syntax `X | None` instead of `Optional[X]`
These align with PEP 604 and PEP 585 best practices.
Also applies to: 146-146, 154-154, 175-176, 276-276
tilelang/jit/adapter/utils.py (1)
4-4: LGTM! Type hints modernized and validation improved. The changes correctly:
- Replace `Dict` with the built-in `dict[...]`
- Use union syntax `X | Y` instead of `Union[X, Y]`
- Use `X | None` instead of `Optional[X]`
The early validation for `model_type` at lines 74-76 is a good addition that provides clearer error messages.
Also applies to: 67-72, 110-110
examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (2)
3-3: LGTM! Future annotations import added.This enables postponed evaluation of annotations (PEP 563), which is required for the modern type hints used elsewhere in the codebase.
423-427: LGTM! Print statements modernized to f-strings.The f-string conversions preserve the exact formatting (
.2f) and improve readability. No functional changes.examples/flash_decoding/example_gqa_decode.py (1)
476-480: LGTM! Print statements modernized to f-strings.The f-string conversions preserve the exact formatting (
.2f) and improve code readability.tilelang/version.py (1)
24-24: LGTM! Explicit read mode removed.The explicit
"r"mode is redundant as it's the default foropen(). This modernization aligns with Python best practices.docs/conf.py (1)
5-5: LGTM! String formatting and file opening modernized. The changes correctly:
- Convert `%` formatting to an f-string for better readability
- Remove the redundant explicit `"r"` mode (the default for `open()`)
However, note the pipeline failures indicating formatting issues that need to be staged.
Please address the pipeline failures mentioned in the CI:
- "Reformatted by formatter. Changes not staged for commit."
- "clang-format reformatted file. Please review and stage the changes."
These appear to be formatting issues that need to be staged for commit.
Also applies to: 8-8
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (3)
2-2: LGTM! Future annotations import added.This enables postponed evaluation of annotations (PEP 563), supporting the modern type hints used throughout the codebase.
361-361: LGTM! List comprehension changed to generator expression.The generator expression is more memory-efficient and is safe here because the result is immediately unpacked into individual variables. This is a good optimization for transforming multiple items.
488-492: LGTM! Print statements modernized to f-strings.The f-string conversions preserve exact formatting (
.2f) and improve readability.examples/attention_sink/example_mha_sink_fwd_bhsd.py (2)
2-2: LGTM! Future annotations import added.This enables postponed evaluation of annotations (PEP 563), which is necessary for modern type hint syntax.
289-293: LGTM! Print statements modernized to f-strings.The f-string conversions preserve the exact formatting (
.2f) and are more Pythonic.tilelang/cache/kernel_cache.py (1)
366-366: LGTM! Safe modernization.Removing the explicit
"r"mode is a safe Python 3 idiom since"r"is the default foropen().examples/deepseek_mla/example_mla_decode_paged.py (1)
403-404: LGTM! Modern f-string syntax. The conversion from `.format()` to f-strings is a safe, idiomatic modernization with identical output.
examples/flash_attention/example_mha_bwd.py (2)
263-263: LGTM! Memory-efficient generator unpacking. Replacing the list comprehension with a generator expression is safe here since the unpacking consumes exactly five elements. This reduces memory overhead by avoiding an intermediate list.
339-343: LGTM! Modern f-string syntax. The conversion from `.format()` to f-strings is a safe, idiomatic modernization with identical output.
tilelang/tileop/gemm/gemm_base.py (1)
12-12: LGTM! Implicit object inheritance. In Python 3, all classes implicitly inherit from `object`, so removing the explicit base is a safe modernization. The `@dataclass` decorator and class behavior remain unchanged.
examples/amd/example_amd_flash_attn_bwd.py (2)
247-247: LGTM! Memory-efficient generator unpacking. Replacing the list comprehension with a generator expression is safe here since the unpacking consumes exactly five elements. This reduces memory overhead by avoiding an intermediate list.
346-350: LGTM! Modern f-string syntax. The conversion from `.format()` to f-strings is a safe, idiomatic modernization with identical output.
setup.py (3)
140-140: LGTM! Safe modernization. Removing the explicit `"r"` mode is a safe Python 3 idiom since `"r"` is the default for `open()`.
171-171: LGTM! Safe modernization. Removing the explicit `"r"` mode is a safe Python 3 idiom since `"r"` is the default for `open()`.
711-740: LGTM! Safe modernization. Removing the explicit `"r"` mode from text file opens is a safe Python 3 idiom since `"r"` is the default. The behavior remains identical.
examples/attention_sink/example_mha_sink_bwd_bhsd.py (3)
2-2: LGTM! Forward-compatible annotations. Adding `from __future__ import annotations` enables PEP 563 (postponed evaluation of annotations), which is forward-compatible and allows using newer annotation syntax. This is a safe, future-proof change.
369-369: LGTM! Memory-efficient generator unpacking. Replacing the list comprehension with a generator expression is safe here since the unpacking consumes exactly six elements. This reduces memory overhead by avoiding an intermediate list.
492-496: LGTM! Modern f-string syntax. The conversion from `.format()` to f-strings is a safe, idiomatic modernization with identical output.
examples/flash_attention/example_gqa_bwd.py (2)
382-382: LGTM! Memory-efficient generator unpacking. Replacing the list comprehension with a generator expression is safe here since the unpacking consumes exactly five elements. This reduces memory overhead by avoiding an intermediate list.
520-524: LGTM! Modern f-string syntax. The conversion from `.format()` to f-strings is a safe, idiomatic modernization with identical output.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (3)
196-196: LGTM! Modern super() syntax. The change from `super(SparseFlashAttn, self).__init__()` to `super().__init__()` follows Python 3 best practices and is the idiomatic way to call parent class constructors.
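A tiny stand-alone sketch of the zero-argument `super()` idiom; the class names are illustrative, not the real ones from the example.

```python
class Base:
    def __init__(self) -> None:
        self.initialized = True


class SparseAttention(Base):
    def __init__(self, heads: int) -> None:
        # Before: super(SparseAttention, self).__init__()
        # After: the zero-argument form resolves class and instance automatically.
        super().__init__()
        self.heads = heads


attn = SparseAttention(heads=8)
print(attn.initialized, attn.heads)  # True 8
```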
369-369: LGTM! f-string conversion. The change from `.format()` to f-string improves readability and aligns with modern Python style.
372-372: LGTM! f-string conversion with multiple expressions. The conversion correctly maps all format arguments to f-string expressions, improving readability.
tilelang/jit/adapter/cython/adapter.py (1)
105-105: LGTM! Default file mode usage. Removing the explicit `"r"` mode is correct since Python defaults to text read mode. This modernization aligns with pyupgrade linter rules and makes the code cleaner without changing behavior.
Also applies to: 118-118, 135-135
examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
139-139: Confirm generator-expression change
The switch from `[maybe_contiguous(x) for x in …]` to `(maybe_contiguous(x) for x in …)` for immediate unpacking isn't a known pyupgrade rule and offers no benefit. Verify this refactor is intentional and actually produced by your pyupgrade/ruff setup (e.g., share your lint output or rule config).
Actionable comments posted: 1
🧹 Nitpick comments (5)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (1)
357-357: Good f-string conversion with minor optimization opportunity. The f-string conversion improves readability. Consider simplifying further by eliminating the concatenation:
- print(name + f" all_close={all_close}")
+ print(f"{name} all_close={all_close}")
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (1)
369-369: Good f-string conversion with minor optimization opportunity. The f-string conversion improves readability. Consider simplifying further by eliminating the concatenation:
- print(name + f" all_close={all_close}")
+ print(f"{name} all_close={all_close}")
examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
139-139: Consider using `map()` for better idiomaticity. The generator expression works correctly but is unconventional for immediate unpacking. The `map()` builtin is the standard Python pattern for applying a function to multiple values. Apply this diff to use the more idiomatic `map()`:
- A, B, batch_sizes = (maybe_contiguous(x) for x in (A, B, batch_sizes))
+ A, B, batch_sizes = map(maybe_contiguous, (A, B, batch_sizes))
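A quick, hypothetical check that the two spellings behave identically for this unpack-immediately pattern; `normalize` stands in for `maybe_contiguous`.

```python
def normalize(x: float) -> float:
    # Stand-in for maybe_contiguous(); any unary function behaves the same way here.
    return round(x, 2)


a, b, c = (normalize(x) for x in (1.2345, 2.3456, 3.4567))
x, y, z = map(normalize, (1.2345, 2.3456, 3.4567))
print((a, b, c) == (x, y, z))  # True
```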
setup.py (1)
312-312: Remove the unused `noqa` directive. The `# noqa: SIM115` comment is no longer needed since the code has been updated to address the linting issue. Apply this diff:
- return open(get_path("README.md"), encoding="utf-8").read()  # noqa: SIM115
+ return open(get_path("README.md"), encoding="utf-8").read()
tilelang/carver/roller/shape_inference/tir.py (1)
354-354: F-string migration looks good; consider using `TypeError` for type errors. The f-string conversion is correct. As an optional improvement, consider raising `TypeError` instead of generic `Exception` when encountering unexpected types, as this provides clearer intent. Apply this diff if desired:
- raise Exception(f"Unhandled node type in walk_indice(): {expr}")
+ raise TypeError(f"Unhandled node type in walk_indice(): {type(expr).__name__}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (72)
docs/conf.py(1 hunks)examples/amd/example_amd_flash_attn_bwd.py(2 hunks)examples/attention_sink/example_gqa_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/attention_sink/example_mha_sink_bwd_bhsd.py(3 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd.py(2 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py(2 hunks)examples/bitnet-1.58b/configuration_bitnet.py(0 hunks)examples/bitnet-1.58b/eval_ppl.py(1 hunks)examples/bitnet-1.58b/maint/create_bitblas_ckpt.py(1 hunks)examples/bitnet-1.58b/modeling_bitnet.py(1 hunks)examples/bitnet-1.58b/tokenization_bitnet.py(0 hunks)examples/bitnet-1.58b/utils_quant.py(1 hunks)examples/bitnet-1.58b/vllm_workspace/conftest.py(1 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py(2 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py(2 hunks)examples/cast/example_group_per_split_token_cast_to_fp8.py(1 hunks)examples/cast/example_per_token_cast_to_fp8.py(2 hunks)examples/deepseek_mla/example_mla_decode_paged.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py(1 hunks)examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py(1 hunks)examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py(1 hunks)examples/flash_attention/example_gqa_bwd.py(2 hunks)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_gqa_fwd_bshd.py(1 hunks)examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_bwd.py(2 hunks)examples/flash_attention/example_mha_bwd_bhsd.py(2 hunks)examples/flash_attention/example_mha_bwd_wgmma_pipelined.py(2 hunks)examples/flash_attention/example_mha_fwd_bhsd.py(1 hunks)examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd.py(1 hunks)examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py(1 hunks)examples/flash_decoding/example_gqa_decode.py(1 hunks)examples/flash_decoding/example_mha_inference.py(1 hunks)examples/grouped_gemm/example_grouped_gemm_bwd.py(1 hunks)examples/hadamard_transform/example_hadamard.py(1 hunks)examples/linear_attention/example_mamba_chunk_scan.py(1 hunks)examples/linear_attention/example_mamba_chunk_state.py(2 hunks)examples/minference/example_vertical_slash_sparse_attn.py(1 hunks)examples/norm/rms_norm.py(1 hunks)pyproject.toml(1 hunks)setup.py(6 hunks)testing/python/kernel/test_tilelang_kernel_gemm.py(1 hunks)testing/python/kernel/test_tilelang_kernel_gemm_simt.py(1 hunks)testing/python/language/test_tilelang_language_pipeline.py(1 hunks)tilelang/autotuner/param.py(3 hunks)tilelang/cache/kernel_cache.py(1 hunks)tilelang/carver/arch/cuda.py(1 hunks)tilelang/carver/arch/metal.py(1 hunks)tilelang/carver/roller/bestfit.py(1 hunks)tilelang/carver/roller/hint.py(1 hunks)tilelang/carver/roller/node.py(4 hunks)tilelang/carver/roller/rasterization.py(1 hunks)tilelang/carver/roller/shape_inference/common.py(2 hunks)tilelang/carver/roller/shape_inference/tir.py(3 hunks)tilelang/contrib/hipcc.py(1 hunks)tilelang/intrinsics/mfma_macro_generator.py(1 hunks)tilelang/intrinsics/mma_macro_generator.py(2 hunks)tilelang/intrinsics/wgmma_macro_generator.py(1 hunks)tilelang/jit/adapter/cython/adapter.py(3 
hunks)tilelang/jit/adapter/libgen.py(1 hunks)tilelang/jit/adapter/utils.py(3 hunks)tilelang/jit/adapter/wrapper.py(6 hunks)tilelang/jit/kernel.py(1 hunks)tilelang/language/proxy.py(4 hunks)tilelang/quantize/lop3.py(1 hunks)tilelang/quantize/quantization.py(2 hunks)tilelang/tileop/gemm/gemm_base.py(1 hunks)tilelang/version.py(1 hunks)
💤 Files with no reviewable changes (2)
- examples/bitnet-1.58b/configuration_bitnet.py
- examples/bitnet-1.58b/tokenization_bitnet.py
🧰 Additional context used
🧬 Code graph analysis (25)
examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/minference/example_vertical_slash_sparse_attn.py (1)
tilelang/language/builtin.py (1)
mbarrier_wait_parity(172-219)
examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/linear_attention/example_mamba_chunk_scan.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/jit/adapter/utils.py (1)
tilelang/language/ast/ir.py (1)
target(1682-1713)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (2)
examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)
maybe_contiguous(364-367)
examples/flash_attention/example_gqa_bwd.py (1)
maybe_contiguous(377-380)
examples/flash_decoding/example_gqa_decode.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_bwd.py (2)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/carver/roller/shape_inference/common.py (1)
tilelang/carver/roller/shape_inference/tir.py (2)
Statement(7-43)
InputShapeInference(169-318)
examples/linear_attention/example_mamba_chunk_state.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
tilelang/intrinsics/mma_macro_generator.py (1)
tilelang/utils/language.py (1)
is_fragment(68-78)
examples/flash_attention/example_mha_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/grouped_gemm/example_grouped_gemm_bwd.py (1)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
examples/flash_decoding/example_mha_inference.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_fwd_bshd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_bwd.py (1)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
examples/flash_attention/example_mha_fwd_bhsd.py (1)
tilelang/profiler/__init__.py (1)
do_bench(218-281)
examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (1)
examples/flash_attention/example_gqa_bwd.py (2)
maybe_contiguous(377-380)
run1(514-515)
examples/amd/example_amd_flash_attn_bwd.py (1)
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
maybe_contiguous(356-359)
examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (1)
examples/amd/example_amd_flash_attn_bwd.py (1)
maybe_contiguous(242-245)
examples/attention_sink/example_mha_sink_bwd_bhsd.py (1)
examples/flash_attention/example_mha_bwd.py (1)
maybe_contiguous(258-261)
examples/flash_attention/example_mha_bwd_bhsd.py (1)
examples/amd/example_amd_flash_attn_bwd.py (2)
maybe_contiguous(242-245)
run1(340-341)
tilelang/intrinsics/wgmma_macro_generator.py (1)
tilelang/utils/language.py (1)
is_fragment(68-78)
🪛 Ruff (0.13.3)
tilelang/carver/roller/shape_inference/tir.py
354-354: Prefer TypeError exception for invalid type
(TRY004)
354-354: Create your own exception
(TRY002)
354-354: Avoid specifying long messages outside the exception class
(TRY003)
setup.py
312-312: Unused noqa directive (non-enabled: SIM115)
Remove unused noqa directive
(RUF100)
tilelang/quantize/lop3.py
1189-1189: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/carver/roller/node.py
304-304: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
421-421: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-test-amd
- GitHub Check: format-check
🔇 Additional comments (98)
testing/python/language/test_tilelang_language_pipeline.py (1)
106-107: No-op cleanup looks good.Dropping the redundant parentheses keeps the expression readable with identical semantics while satisfying the linter. ✅
examples/minference/example_vertical_slash_sparse_attn.py (1)
209-209: Redundant parentheses removal looks good.The simplified parity expression keeps identical behavior while matching the linter’s style expectations.
examples/bitnet-1.58b/eval_ppl.py (1)
57-57: LGTM! Clean modernization to f-string. The conversion from `.format()` to f-string is correct and improves readability. This change also enhances consistency, as line 53 already uses f-strings for similar output formatting.
tilelang/contrib/hipcc.py (1)
57-57: LGTM! Clean string formatting modernization. The conversion from `%` formatting to f-string is correct and improves readability. Since `target_format` is validated to be `"hsaco"` on line 54-55, both the old and new formatting produce identical output.
examples/cast/example_per_token_cast_to_fp8.py (1)
103-103: LGTM! Clean modernization to f-strings. The conversion from `.format()` to f-strings is correct and aligns with Python 3.6+ best practices. F-strings are more readable and slightly more performant than the older string formatting method.
Also applies to: 105-105, 117-117
examples/flash_decoding/example_gqa_decode.py (1)
476-480: F-string conversions are equivalentSwitchover keeps the formatting semantics intact while aligning with the pyupgrade style target.
tilelang/quantize/quantization.py (2)
219-227: No behavior change in prefix assembly.Parentheses cleanup keeps the same precedence, so the packed bf16 exponent path remains identical. Looks good.
231-236: Bitwise OR chaining still correct.Associativity of
|means the expanded form matches the prior grouped expression; the e4m3 conversion stays stable.examples/cast/example_group_per_split_token_cast_to_fp8.py (2)
202-202: LGTM! Clean f-string migration.The conversion from
.format()to f-string is correct and improves readability, aligning with the PR objective to enable pyupgrade linter rules.
205-205: LGTM! Clean f-string migration.The conversion from
.format()to f-string is correct and improves readability, consistent with the modernization effort.examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py (1)
493-494: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings is correct and preserves the formatting specification (.2f). This change improves code consistency with the existing f-string usage elsewhere in the file (e.g., lines 436-437) and aligns with modern Python best practices.examples/flash_decoding/example_mha_inference.py (1)
321-325: LGTM! String formatting modernization improves readability.The conversion from
.format()to f-strings is correct and aligns with modern Python best practices (PEP 498) and the PR's objective to enable pyupgrade linting. The formatting specifiers and calculations remain identical, ensuring no behavioral changes.examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py (1)
436-437: LGTM: Clean f-string migration.The conversion from
.format()to f-strings improves readability and aligns with modern Python best practices.examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py (1)
553-554: LGTM: Consistent f-string migration.The formatting update is correct and consistent with the broader modernization effort across the codebase.
examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py (1)
537-538: LGTM: F-string conversion applied correctly.The formatting change maintains the same output while improving code quality.
examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py (1)
280-284: LGTM: F-string updates for all benchmark outputs.All four print statements have been correctly migrated to f-strings, maintaining consistent formatting across reference and kernel benchmarks.
examples/deepseek_mla/example_mla_decode_paged.py (1)
403-404: LGTM! Clean f-string conversion.The conversion from
.format()to f-strings is correct and maintains identical output. This modernization improves readability and aligns with Python 3.6+ best practices.examples/bitnet-1.58b/utils_quant.py (1)
219-219: LGTM! Modern super() syntax applied correctly. The change from `super(BitLinear, self).__init__(*kargs, **kwargs)` to `super().__init__(*kargs, **kwargs)` is correct and aligns with Python 3+ best practices. The behavior remains equivalent.
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (2)
183-183: LGTM! Modern super() syntax.The simplified
super().__init__()call is the recommended Python 3 idiom, reducing boilerplate and potential errors.
362-364: LGTM! Clean multi-line f-string formatting.The conversion from
.format()to f-strings improves readability and performance while maintaining the multi-line structure.examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (2)
196-196: LGTM! Modern super() syntax.The simplified
super().__init__()call is the recommended Python 3 idiom, reducing boilerplate and potential errors.
372-374: LGTM! Clean multi-line f-string formatting.The conversion from
.format()to f-strings improves readability and performance while maintaining the multi-line structure.examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (2)
3-3: LGTM! Essential for modern type hints. The `from __future__ import annotations` import enables the `int | None` syntax used throughout the file (lines 209, 338, 383) without requiring `typing.Union`. This is the recommended approach for Python 3.10+ style type hints in earlier Python versions.
438-446: LGTM! Clean f-string migration. The conversion to f-strings improves readability and follows modern Python best practices. All format specifiers are correctly preserved (`.2f` for floating-point precision).
examples/flash_attention/example_mha_bwd.py (2)
263-263: LGTM! Generator expression for unpacking is idiomatic.The change from list comprehension to generator expression aligns with pyupgrade's UP027 rule. For unpacking into a fixed number of variables, generators are preferred as they avoid creating an intermediate list.
339-343: LGTM! F-strings improve readability.The conversion from
.format()to f-strings follows pyupgrade's UP031/UP032 rules. The formatting specifications are correctly preserved, and f-strings offer better readability and performance.examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (2)
3-3: LGTM!The
from __future__ import annotationsimport is correctly placed and enables the use of modern type hint syntax (e.g.,int | None,tuple[...]at lines 202, 331, 358) while maintaining compatibility with Python 3.7-3.9.
423-427: LGTM!The conversion from
.format()to f-strings is correct and maintains identical formatting. The changes improve readability and align with modern Python best practices.Note: The AI summary mentions a speedup calculation line that is not present in the annotated code changes. This may indicate the summary is inconsistent or referencing changes outside the shown diff.
examples/linear_attention/example_mamba_chunk_scan.py (1)
232-236: LGTM! Clean modernization to f-strings.The conversion from
.format()to f-strings improves readability while preserving identical formatting behavior. This aligns well with the PR's objective to enable pyupgrade linting.examples/attention_sink/example_mha_sink_bwd_bhsd.py (3)
2-2: LGTM: Modern annotation handling enabled. The `from __future__ import annotations` import enables postponed evaluation of type hints (PEP 563), which allows the modern union syntax `int | None` used in function signatures (lines 395, 442) to work correctly across Python versions.
369-369: LGTM: Efficient unpacking with generator expression. The conversion from list comprehension to generator expression is a minor memory optimization. During unpacking, the generator is fully consumed, so the behavior remains identical.
492-496: LGTM: Modern f-string formatting. The conversion to f-strings improves readability while maintaining identical output formatting. All format specifiers (`.2f`) are correctly preserved.
examples/attention_sink/example_mha_sink_fwd_bhsd.py (2)
2-2: LGTM! Essential import for modern type hints.The
from __future__ import annotationsimport enables PEP 563 postponed evaluation, which is necessary for the modern union syntax (int | None) used throughout this file (lines 190, 246). This is a standard modernization pattern recommended by pyupgrade.
289-293: LGTM! Clean f-string conversions.The migration from
.format()to f-strings preserves the exact formatting semantics (2 decimal places) while improving readability and performance. The expressiontotal_flops / latency * 1e-9is correctly embedded in the f-strings without any functional changes.tilelang/carver/roller/rasterization.py (1)
92-92: LGTM! Clean modernization to f-string.The conversion from
.format()to f-string is correct and aligns with the PR's pyupgrade objective. The variablepanel_widthis guaranteed to be defined at this point (handled by the None-check on lines 88-89), and the generated CUDA code maintains proper semicolon syntax.tilelang/language/proxy.py (5)
4-4: LGTM! Clean import modernization. Correctly removed `Optional`, `Tuple`, and `Union` from typing imports since they're replaced with Python 3.10+ built-in syntax (`tuple`, `X | Y`) throughout the file.
146-146: LGTM! Type hint modernization. Correctly updated from `Tuple[Any]` to `tuple[Any]` per Python 3.10+ standards.
154-154: LGTM! Enhanced type safety. The added type annotation `tuple[Any] | PrimExpr | int` correctly captures the method's overloaded parameter using modern union syntax, aligning with the runtime check at lines 158-159.
175-176: LGTM! Consistent type hint modernization. Both `shape` and `strides` parameters correctly updated to use `tuple[Any]`.
276-276: LGTM! Modern union syntax. Correctly updated from `Optional[str]` to `str | None`, using the modern union operator. The `from __future__ import annotations` at line 3 ensures runtime compatibility while enabling this modern syntax.
examples/flash_attention/example_mha_fwd_bhsd.py (1)
206-210: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions. The output formatting remains identical.examples/flash_attention/example_mha_bwd_bhsd.py (2)
264-264: LGTM! Valid generator expression unpacking.Replacing the list comprehension with a generator expression is valid—Python consumes the generator during tuple unpacking. This is slightly more memory efficient while maintaining identical behavior.
342-346: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_mha_fwd_bshd.py (1)
193-197: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py (2)
404-404: LGTM! Valid generator expression unpacking.Replacing the list comprehension with a generator expression is valid and slightly more memory efficient.
542-546: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (2)
273-273: LGTM! Valid generator expression unpacking.Replacing the list comprehension with a generator expression is valid and slightly more memory efficient.
349-353: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_gqa_fwd_bshd.py (1)
251-255: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (1)
198-202: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions.examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py (1)
211-215: LGTM! Clean f-string modernization.The conversion from
.format()to f-strings improves readability and follows modern Python conventions. All changes in this PR consistently apply Python 3 modernization best practices.examples/flash_attention/example_gqa_bwd.py (2)
382-382: LGTM: Memory-efficient unpacking.The generator expression is more efficient than a list comprehension for unpacking, eliminating the intermediate list object.
520-524: LGTM: Modern string formatting.The f-string conversions improve readability and performance compared to
.format()style.examples/linear_attention/example_mamba_chunk_state.py (2)
171-175: LGTM! Clean f-string conversion.The migration from
.format()to f-strings is correct and improves readability. These changes align with the PR's objective of enabling the pyupgrade linter in ruff.
43-43: Approve parentheses removal in decay_states
This change only removes redundant parentheses around the subtraction; functionality is unchanged.tilelang/jit/adapter/utils.py (2)
1-4: LGTM! Import cleanup aligns with modern Python.The removal of
Dictfrom typing imports is correct, as the built-indictgeneric can be used directly with Python 3.9+.
110-110: LGTM! Type annotation modernization is correct.The signature update correctly uses modern Python type hint syntax. The Python version verification requested for
get_annotated_mod(lines 67-72) also applies to this change.tilelang/jit/kernel.py (1)
20-20: LGTM! Modernized class declaration. Removed explicit `object` base class, which is implicit in Python 3+. This aligns with the pyupgrade rule UP004.
tilelang/intrinsics/wgmma_macro_generator.py (1)
408-408: LGTM! Modernized string formatting.Converted to f-string for better readability and performance. This aligns with the pyupgrade rule UP032.
tilelang/carver/arch/metal.py (1)
1-1: LGTM! Enabled postponed annotation evaluation.Added
from __future__ import annotationsto enable PEP 563, which supports the modern union type syntax (Target | stron line 12) and improves performance by deferring annotation evaluation.examples/bitnet-1.58b/modeling_bitnet.py (1)
1515-1515: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.tilelang/jit/adapter/cython/adapter.py (3)
105-105: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.
118-118: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.
135-135: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.tilelang/carver/roller/bestfit.py (1)
20-20: LGTM! Modernized string formatting.Converted to f-string for better readability and performance. This aligns with the pyupgrade rule UP032.
examples/bitnet-1.58b/maint/create_bitblas_ckpt.py (1)
88-88: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.tilelang/version.py (1)
24-24: LGTM! Removed redundant file mode.The default mode for
open()is'r'(read text), so explicitly specifying it is unnecessary. This aligns with the pyupgrade rule UP015.tilelang/carver/arch/cuda.py (1)
94-94: LGTM! Python 3 modernization.Removing the explicit
objectbase class is correct. In Python 3, all classes implicitly inherit fromobject, making this explicit declaration redundant.tilelang/carver/roller/hint.py (1)
150-150: LGTM! Python 3 modernization.The removal of the explicit
objectbase class is appropriate for Python 3, where all classes implicitly inherit fromobject.examples/hadamard_transform/example_hadamard.py (1)
154-154: LGTM! F-string modernization.The conversion from
.format()to f-string is a good modernization. F-strings are more readable and slightly more performant in Python 3.6+.examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
37-37: LGTM! Simplified file opening.Removing the explicit
"r"mode is appropriate since it's the default foropen(). This makes the code cleaner without changing behavior.tilelang/quantize/lop3.py (1)
1189-1191: LGTM! F-string modernization.The conversion to f-strings improves readability. The error message and function name construction are now clearer and more idiomatic.
tilelang/jit/adapter/libgen.py (1)
32-32: LGTM! Python 3 modernization.Removing the explicit
objectbase class aligns with Python 3 best practices, where all classes implicitly inherit fromobject.tilelang/cache/kernel_cache.py (1)
366-366: LGTM! Simplified file opening.Omitting the explicit
"r"mode is appropriate since it's the default foropen(). This simplifies the code without affecting behavior.tilelang/autotuner/param.py (3)
249-249: LGTM! Simplified file opening.Removing the explicit
"r"mode is correct since it's the default foropen(). This makes the code cleaner.
318-318: LGTM! Simplified file opening.Omitting the explicit
"r"mode is appropriate and aligns with Python best practices.
330-330: LGTM! Simplified file opening.The removal of the explicit
"r"mode is correct and makes the code more concise.tilelang/intrinsics/mma_macro_generator.py (2)
28-28: LGTM! Python 3 modernization.Removing the explicit
objectbase class is the recommended style for Python 3, as all classes implicitly inherit fromobject.
521-521: LGTM! Improved string formatting.The f-string provides the same functionality as the previous
.format()call with clearer syntax.docs/conf.py (2)
4-4: LGTM! Modernized string formatting.The f-string provides the same result as the previous
%formatting with improved readability.
7-7: LGTM! Simplified file opening.Removing the explicit
"r"mode is appropriate since text read mode is the default foropen().examples/norm/rms_norm.py (1)
74-76: LGTM! Improved print formatting.The f-string conversion maintains the same output format while improving code readability.
examples/attention_sink/example_gqa_sink_bwd_bhsd.py (3)
2-2: LGTM! Enables modern type hint syntax.Adding
from __future__ import annotationsenables PEP 563 postponed evaluation of annotations, allowing the use of modern type hint syntax likeint | Noneon Line 389.
361-361: LGTM! Memory-efficient unpacking.The generator expression is more memory-efficient than the list comprehension since values are consumed immediately during unpacking. The behavior is identical for this use case.
488-492: LGTM! Improved string formatting.The f-string conversions maintain the same output format with better readability.
tilelang/intrinsics/mfma_macro_generator.py (1)
14-14: LGTM! Python 3 modernization.Removing the explicit
objectbase class aligns with Python 3 conventions, as all classes implicitly inherit fromobject.tilelang/carver/roller/shape_inference/common.py (1)
7-7: LGTM! Python 3 modernization.Removing explicit
objectbase classes aligns with Python 3 conventions, as all classes implicitly inherit fromobject.Also applies to: 21-21
examples/amd/example_amd_flash_attn_bwd.py (2)
247-247: LGTM! Memory-efficient unpacking.The generator expression is more memory-efficient than a list comprehension since values are consumed immediately during unpacking. The behavior is identical for this use case.
346-350: LGTM! Improved string formatting.The f-string conversions maintain the same output format with better readability and performance.
tilelang/tileop/gemm/gemm_base.py (1)
11-12: Ensure subclasses of `GemmBase` remain instantiable after removing `@dataclass`. Removing `@dataclass` drops the generated `__init__`, so any subclass without its own constructor will inherit `object.__init__` and break when passed `gemm_node`. We found no direct instantiations of `GemmBase` or its subclasses (tilelang/tileop/gemm/gemm_wgmma.py, tilelang/tileop/gemm/gemm_mma.py, tilelang/primitives/gemm/gemm_mma.py), but none define `__init__`. Confirm these classes aren't instantiated at runtime or add explicit `__init__` methods.
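To make the concern concrete, here is a stand-alone sketch with illustrative names (not the real GemmBase hierarchy) showing what is lost when the decorator is removed.

```python
from dataclasses import dataclass


@dataclass
class WithDataclass:
    gemm_node: object  # @dataclass generates __init__(self, gemm_node)


class WithoutDataclass:
    gemm_node: object  # only a class-level annotation; no __init__ is generated


print(WithDataclass(gemm_node="node").gemm_node)  # works

try:
    WithoutDataclass("node")  # falls back to object.__init__, which takes no arguments
except TypeError as err:
    print(err)
```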
setup.py (1)
140-140: LGTM! The removal of explicit `"r"` mode from `open()` calls is a safe modernization since text mode is the default in Python 3. These changes align with enabling the pyupgrade linter.
Also applies to: 171-171, 711-712, 723-724, 740-741
tilelang/carver/roller/node.py (2)
32-32: LGTM!Removing explicit
objectbase classes is correct for Python 3, where all classes implicitly inherit fromobject.Also applies to: 93-93
304-304: Verify that instance lifetime justifies method-level caching. The static analysis tool (B019) flags that using `@functools.lru_cache` on methods can prevent garbage collection of instances. The cache holds references to `self`, which can lead to memory leaks if instances are meant to be short-lived. If `PrimFuncNode` instances have long lifetimes or are reused frequently, this is acceptable. Otherwise, consider using a separate cache or `@lru_cache` with `maxsize` to limit memory growth.
Also applies to: 421-421
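One hedged way to keep the memoization without the B019 caveat is to cache at module level on hashable arguments instead of on `self`. This is a sketch only; the real `PrimFuncNode` computation is more involved and its nodes are not keyed by a plain integer id.

```python
from __future__ import annotations

import functools


@functools.lru_cache(maxsize=128)
def _propagate_inputs_cached(node_id: int, shape: tuple[int, ...]) -> tuple[int, ...]:
    # Module-level cache keyed by immutable arguments: no reference to the
    # instance is retained, so nodes can still be garbage collected.
    return tuple(dim * 2 for dim in shape)


class Node:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id

    def propagate_inputs(self, shape: tuple[int, ...]) -> tuple[int, ...]:
        return _propagate_inputs_cached(self.node_id, shape)


print(Node(1).propagate_inputs((4, 8)))  # (8, 16)
```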
tilelang/carver/roller/shape_inference/tir.py (1)
50-50: LGTM!Removing explicit
objectbase classes is correct for Python 3.Also applies to: 79-79
tilelang/jit/adapter/wrapper.py (4)
179-179: LGTM!Removing explicit
objectbase classes is correct for Python 3.Also applies to: 895-895, 1062-1062
325-326: LGTM!The f-string conversion for CUDA kernel launch configuration is correct and more readable.
350-351: LGTM!The f-string conversion for kernel launch code is correct and improves readability.
993-993: LGTM!The f-string conversion is correct.
@XuehaiPan Would you mind helping review this PR?
Overall LGTM.
Some small nits:
- Add `__future__` imports only when needed.
- We'd best add the `tool.ruff.target-version` to `pyproject.toml` when the `UP` code is enabled.

[tool.ruff]
target-version = "py38"
Thanks! @XuehaiPan
This should be covered by the added
I stole a commit from your PR #955 to resolve that.
7273746 to
391ddc2
Compare
Actionable comments posted: 22
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (19)
tilelang/language/persistent.py (1)
18-18: Update docstring to match the modernized type annotation.The docstring still references
List[tir.PrimExpr], but the function signature now useslist[tir.PrimExpr]. Update the docstring for consistency.Apply this diff:
- domain : List[tir.PrimExpr] + domain : list[tir.PrimExpr]tilelang/primitives/gemm/base.py (1)
229-239: Update the docstring to match the new type annotation syntax.The method parameter type was correctly updated to
int | None, but the docstring at line 239 still referencesOptional[int]. Please update the docstring to maintain consistency.Apply this diff to update the docstring:
Parameters ---------- - threads : Optional[int] + threads : int | None The total number of threads in a block. Must be provided if any block partition parameter is not already set.tilelang/language/frame.py (1)
201-210: Update docstring to match the new type annotation syntax.The return type annotation on line 201 uses the modern
PrimExpr | Nonesyntax, but the docstring on line 208 still referencesOptional[PrimExpr]. For consistency, update the docstring to use the new union syntax.Apply this diff to update the docstring:
"""Get the value bound to a variable in the current let frame stack. Args: var (Var): The variable to look up Returns: - Optional[PrimExpr]: The bound value if found, None otherwise + PrimExpr | None: The bound value if found, None otherwise """examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
220-232: Restore the explicitNonecheck forimages.Switching the guard to
if images:makes an empty list skip the length assertion, yet the code still doesimages[i]later whenimages is not None, raising anIndexError. Revert to the explicitNonecheck to keep the old, safe behavior.- if images: + if images is not None: assert len(prompts) == len(images)tilelang/language/tir/ir.py (2)
10-32: Fix implicit Optional violations and update docstring.Three issues need attention:
- Line 11:
stop: PrimExpr = Noneviolates PEP 484 (implicit Optional). Should bePrimExpr | None = None.- Line 13:
annotations: dict[str, Any] = Noneviolates PEP 484. Should bedict[str, Any] | None = None.- Line 24: Docstring still references
Dict[str, Any]instead ofdict[str, Any].Apply this diff:
-def serial(start: PrimExpr, - stop: PrimExpr = None, +def serial(start: PrimExpr, + stop: PrimExpr | None = None, *, - annotations: dict[str, Any] = None) -> frame.ForFrame: + annotations: dict[str, Any] | None = None) -> frame.ForFrame: """The serial For statement. Parameters ---------- start : PrimExpr The minimum value of iteration. stop : PrimExpr The maximum value of iteration. - annotations : Dict[str, Any] + annotations : dict[str, Any] | None The optional annotations of the For statement.
35-57: Apply the same fixes to remaining functions.The
parallel,vectorized,unroll, andthread_bindingfunctions have the same three issues asserial:
stop: PrimExpr = None→stop: PrimExpr | None = Noneannotations: dict[str, Any] = None→annotations: dict[str, Any] | None = None- Docstrings reference
Dict[str, Any]→ update todict[str, Any] | NoneFor
parallel(lines 35-57):-def parallel(start: PrimExpr, - stop: PrimExpr = None, +def parallel(start: PrimExpr, + stop: PrimExpr | None = None, *, - annotations: dict[str, Any] = None) -> frame.ForFrame: + annotations: dict[str, Any] | None = None) -> frame.ForFrame: """The parallel For statement. Parameters ---------- start : PrimExpr The minimum value of iteration. stop : PrimExpr The maximum value of iteration. - annotations : Dict[str, Any] + annotations : dict[str, Any] | None The optional annotations of the For statement.For
vectorized(lines 60-82):-def vectorized(start: PrimExpr, - stop: PrimExpr = None, +def vectorized(start: PrimExpr, + stop: PrimExpr | None = None, *, - annotations: dict[str, Any] = None) -> frame.ForFrame: + annotations: dict[str, Any] | None = None) -> frame.ForFrame: """The vectorized For statement. Parameters ---------- start : PrimExpr The minimum value of iteration. stop : PrimExpr The maximum value of iteration. - annotations : Dict[str, Any] + annotations : dict[str, Any] | None The optional annotations of the For statement.For
unroll(lines 85-107):-def unroll(start: PrimExpr, - stop: PrimExpr = None, +def unroll(start: PrimExpr, + stop: PrimExpr | None = None, *, - annotations: dict[str, Any] = None) -> frame.ForFrame: + annotations: dict[str, Any] | None = None) -> frame.ForFrame: """The unrolled For statement. Parameters ---------- start : PrimExpr The minimum value of iteration. stop : PrimExpr The maximum value of iteration. - annotations : Dict[str, Any] + annotations : dict[str, Any] | None The optional annotations of the For statement.For
thread_binding(lines 110-138):def thread_binding( start: PrimExpr, - stop: PrimExpr = None, - thread: str = None, + stop: PrimExpr | None = None, + thread: str | None = None, *, - annotations: dict[str, Any] = None, + annotations: dict[str, Any] | None = None, ) -> frame.ForFrame: """The thread-binding For statement. Parameters ---------- start : PrimExpr The minimum value of iteration. stop : PrimExpr The maximum value of iteration. thread : str The thread for loop variable to bind. - annotations : Dict[str, Any] + annotations : dict[str, Any] | None The optional annotations of the For statement.Note:
thread_bindingalso hasthread: str = Nonethat needs fixing.Also applies to: 60-82, 85-107, 110-138
examples/fusedmoe/example_fusedmoe_torch.py (1)
10-16: Don't use PEP 604/585 syntax under a Python 3.8 target.The new annotations (
int | None,tuple[...], baredict) rely on PEP 604/585 features that aren’t available on Python 3.8. Even withfrom __future__ import annotations, any runtime evaluation of these hints (e.g.typing.get_type_hints) will raise, breaking the stated goal of keeping 3.8 compatibility. Please revert to thetyping.Optional,typing.Tuple, andtyping.Dictforms (or equivalent) in this module before enabling the lint.Apply this diff:
-from __future__ import annotations +from __future__ import annotations +from typing import Dict, Optional, Tuple @@ - def __init__(self, config: dict, d_expert: int | None = None): + def __init__(self, config: Dict, d_expert: Optional[int] = None): @@ - def __init__(self, config: dict): + def __init__(self, config: Dict): @@ - def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]: + def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]: @@ - def __init__(self, config: dict): + def __init__(self, config: Dict): @@ -def ref_kernel(data: tuple[torch.Tensor, dict, dict]) -> torch.Tensor: +def ref_kernel(data: Tuple[torch.Tensor, Dict, Dict]) -> torch.Tensor: @@ - seed: int) -> tuple[torch.Tensor, dict, dict]: + seed: int) -> Tuple[torch.Tensor, Dict, Dict]:Also applies to: 37-38, 47-48, 100-146
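A small demonstration of the runtime caveat raised above. The function is hypothetical; it shows where the `__future__` import does and does not help when the project still targets Python 3.8.

```python
from __future__ import annotations

import sys
import typing


def route(x: int | None = None, shape: tuple[int, ...] = ()) -> dict[str, int]:
    return {"dims": len(shape)}


# Defining and calling the function works on 3.8+ because the annotations stay strings.
print(route(shape=(2, 3)))

# Materialising the hints requires the syntax to be legal at runtime:
# X | Y needs 3.10+, tuple[...] / dict[...] need 3.9+.
if sys.version_info >= (3, 10):
    print(typing.get_type_hints(route))
else:
    print("typing.get_type_hints(route) would raise TypeError on this interpreter")
```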
tilelang/language/experimental/gemm_sp.py (1)
45-56: Type hint modernized. Update docstring and consider adding return type annotation.The PEP 604 union syntax is correctly applied to the parameter type hint.
However, the docstrings (lines 49, 52) still reference the old
Union[tir.Buffer, tir.Var]syntax. Additionally, consider adding a return type annotation for completeness.Apply this diff to update the docstrings and add the return type annotation:
- def legalize_arguments(arg: tir.Buffer | tir.Var): + def legalize_arguments(arg: tir.Buffer | tir.Var) -> tir.Buffer | tir.Var: """Convert let-bound variables to their corresponding buffers. Args: - arg (Union[tir.Buffer, tir.Var]): Input argument to legalize + arg (tir.Buffer | tir.Var): Input argument to legalize Returns: - Union[tir.Buffer, tir.Var]: The legalized argument + tir.Buffer | tir.Var: The legalized argument """tilelang/carver/matmul_analysis.py (1)
337-344: Correct theget_ordered_axesreturn annotationThe helper still builds and returns a list, but the annotation now advertises
set[Var]. This will confuse type-checkers (and readers) because downstream code indexes it (axes[-1]). Please change the return type back tolist[Var]here and in the mirrored helper insideanalysis_tensorcore_tags.- def get_ordered_axes(region: list[Range]) -> set[Var]: + def get_ordered_axes(region: list[Range]) -> list[Var]: axes: list[Var] = []tilelang/language/kernel.py (1)
195-200: threads property includes the block frame; should return only threadIdx.{x,y,z}.This currently returns 4 items (includes the last block frame). Align with get_thread_bindings and the doc intent.
def threads(self) -> list[Var]: """ Returns the thread indices from the topmost frame. """ - return [frame.iter_var.var for frame in self.frames[-4:]] + # Exclude the trailing block frame; only return threadIdx.{x,y,z} + return [frame.iter_var.var for frame in self.frames[-4:-1]]tilelang/engine/phase.py (1)
22-30: Guard None target before calling have_tma.have_tma(None) will raise (accesses target.kind). Add a fast‑path for None.
def allow_tma_and_warp_specialized(pass_ctx: PassContext | None = None, target: Target | None = None) -> bool: if pass_ctx is None: pass_ctx = tilelang.transform.get_pass_context() + if target is None: + return False if not have_tma(target): return False disable_tma_lower = pass_ctx.config.get("tl.disable_tma_lower", False) return not disable_tma_lower and allow_warp_specialized(pass_ctx=pass_ctx, target=target)Reference: have_tma expects target.kind.name (see tilelang/contrib/nvcc.py). [Based on relevant code snippet]
tilelang/carver/arch/arch_base.py (1)
17-34: Duplicate attribute definitions overwrite initial values.The attributes
transaction_sizeandbandwidthare defined twice in the__init__method:
- First at lines 17-25 with comments describing their purpose
- Again at lines 32-34 with comments about units
The second definitions (lines 32, 34) overwrite the first ones (lines 17, 22), making the initial assignments redundant. This appears to be a pre-existing issue that should be addressed.
Remove the duplicate definitions and consolidate into a single set:
self.max_smem_usage: int = 0 # The maximum shared memory usage allowed - self.bandwidth: list[int] = [ - 0, - 0, - ] # Bandwidth specifications, possibly including peak and sustained rates self.platform: str = "unknown" # The platform or manufacturer of the device self.compute_capability: str = ( "unknown" # The compute capability, indicating the feature set and performance level ) self.l2_cache_size_bytes: int = 0 - # the number of transaction size in bytes - self.transaction_size: list[int] = [0, 0] # in bytes - # bandwidth in MB/s, will be used for recommend basic tile size - self.bandwidth: list[int] = [0, 0] + # Consolidate comments: The number of transaction size in bytes + self.transaction_size: list[int] = [0, 0] + # Consolidate comments: Bandwidth in MB/s, will be used for recommend basic tile size (possibly including peak and sustained rates) + self.bandwidth: list[int] = [0, 0]Alternatively, if the second definitions should replace the first, remove lines 17-25:
self.sm_partition: int = 0 # The number of streaming multiprocessor partitions - self.transaction_size: list[int] = [ - 0, - 0, - ] # The size of memory transactions, typically in bytes - self.max_smem_usage: int = 0 # The maximum shared memory usage allowed - self.bandwidth: list[int] = [ - 0, - 0, - ] # Bandwidth specifications, possibly including peak and sustained rates + self.max_smem_usage: int = 0 # The maximum shared memory usage allowed self.platform: str = "unknown" # The platform or manufacturer of the devicetilelang/engine/lower.py (1)
128-133: Remove unusedtargetparameter fromcanon_target_host.The
targetargument isn’t referenced; drop it from the signature and update both call sites.--- a/tilelang/engine/lower.py @@ -128,7 +128,6 @@ def canon_target_host( - def canon_target_host(target: str | Target, target_host: str | Target | None): + def canon_target_host(target_host: str | Target | None): if not target_host: target_host = "llvm" if tvm.runtime.enabled("llvm") else "c" return target_host @@ -216,7 +215,7 @@ def lower(...): - target_host = canon_target_host(target, target_host) + target_host = canon_target_host(target_host)--- a/tilelang/jit/adapter/utils.py @@ -85,7 +85,7 @@ def some_util(...): - target_host = tvm.target.Target.canon_target(canon_target_host(target, target_host)) + target_host = tvm.target.Target.canon_target(canon_target_host(target_host))tilelang/carver/arch/driver/cuda_driver.py (1)
124-143: Don’t silently returnNonefromget_device_attributeThe new
try/exceptnow returnsNoneon failure while the signature remains-> int. Existing callers expect anint(or a raised error) and will now receiveNone, causing downstream breakage or type confusion. Please restore the previous behavior—let the RuntimeError propagate (or at least re-raise it) and keep returning anint, or explicitly change the return type and update every caller to handle theNonecase.tilelang/jit/adapter/nvrtc/adapter.py (1)
163-167: Guardbuffer_maplookup before indexing
PrimFunc.paramsoften include scalar vars in addition to buffers. Those scalars are absent frombuffer_map, so the new direct lookup raises aKeyErrorduring adapter initialisation. Please skip params that aren’t backed by buffers, like the ctypes adapter still does.Apply:
- for i, param in enumerate(params): - buffer = buffer_map[param] + for i, param in enumerate(params): + if param not in buffer_map: + continue + buffer = buffer_map[param]tilelang/jit/adapter/ctypes/adapter.py (2)
226-233: Unpack bug after changing dynamic_symbolic_map to (id, i, j)dynamic_symbolic_map now stores triples (id, buffer_idx, dim). This two-value unpack will raise “too many values to unpack”.
- if isinstance(s, tir.Var): - ref_tensor_idx, ref_shape_idx = self.dynamic_symbolic_map[s] - shape.append(ins[ref_tensor_idx].shape[ref_shape_idx]) + if isinstance(s, tir.Var): + _, ref_tensor_idx, ref_shape_idx = self.dynamic_symbolic_map[s] + shape.append(ins[ref_tensor_idx].shape[ref_shape_idx])
239-245: Use args, not ins, for dynamic symbolics
When appending dynamic dim/stride values, index into args (which includes both inputs and outputs) rather than ins (inputs-only) to support symbols on output buffers. Replace ins[buffer_idx] with args[buffer_idx] in the loop at lines 239–245.tilelang/autotuner/tuner.py (1)
232-256: generate_cache_key return type is wrongFunction returns a hex string, not AutotuneResult | None.
- def generate_cache_key(self, parameters: dict[str, Any]) -> AutotuneResult | None: + def generate_cache_key(self, parameters: dict[str, Any]) -> str:tilelang/jit/adapter/wrapper.py (1)
493-499: Use of undefined variable 'function_name'Inside host_mod loop, function_name is not defined; this will raise at runtime. l2_persistent_map likely applies to all kernels; set the map directly.
- if "l2_persistent_map" in func.attrs: - self.l2_persistent_map[function_name] = func.attrs["l2_persistent_map"] + if "l2_persistent_map" in func.attrs: + self.l2_persistent_map = func.attrs["l2_persistent_map"]
♻️ Duplicate comments (2)
setup.py (1)
312-312: Remove unused noqa (SIM115 not enabled).Ruff flags this as an unused suppression.
- return open(get_path("README.md"), encoding="utf-8").read() # noqa: SIM115 + return open(get_path("README.md"), encoding="utf-8").read()tilelang/carver/roller/node.py (1)
305-307: Restorelru_cache()invocationDropping the parentheses passes the method object in as
maxsize, so definition now raisesTypeError: 'function' object cannot be interpreted as an integer. Please revert to calling the decorator (applies to both cached methods).Fix:
- @functools.lru_cache + @functools.lru_cache() @@ - @functools.lru_cache + @functools.lru_cache()Also applies to: 420-422
🧹 Nitpick comments (23)
examples/dequantize_gemm/example_dequant_gemv_fp16xint4.py (1)
76-76: Remove redundant self-assignment.Line 76 assigns
import_sourceto itself, which has no effect. This line can be safely removed.Apply this diff to remove the redundant line:
assert import_source is not None, "lop3_intrin_info is not found" assert func_name is not None, "lop3_intrin_info is not found" - import_source = import_sourcetilelang/utils/language.py (1)
88-98: Update docstring to match the new type annotation.The function signature correctly uses the modern
list[int]syntax, but the docstring at line 93 still referencesList[int]. For consistency, update the docstring to match the signature.Apply this diff to update the docstring:
Args: - array (List[int]): The array of integers to reduce. + array (list[int]): The array of integers to reduce. Returns:testing/python/kernel/test_tilelang_kernel_fp8_gemv_simt.py (1)
25-26: Type annotation modernization looks good.The migration from
Optional[int]toint | Nonesyntax is correct and aligns with the PR's goal of enabling pyupgrade linter rules. The runtime behavior is unchanged since the defaults and assertions remain the same.Optional refactor: Consider tightening the type hints.
Since both parameters have non-None defaults (4 and 32) and are immediately validated as non-None (lines 28-31), the type hints could be simplified to just
intrather thanint | None:- n_partition: int | None = 4, - reduce_thread: int | None = 32, + n_partition: int = 4, + reduce_thread: int = 32,This would make the signature more accurate—the function doesn't meaningfully accept None since it's rejected immediately. However, this is a pre-existing pattern from the original
Optional[int]annotations and is outside the scope of this linting PR.tilelang/language/warpgroup.py (1)
48-50: Consider simplifying the list construction.The current loop-based approach could be streamlined using a list constructor, though this is beyond the scope of the linting changes.
Apply this diff to simplify:
- warp_group_ids: list[int] = [] - for warp_group_id in warp_group_idx: - warp_group_ids.append(warp_group_id) + warp_group_ids: list[int] = list(warp_group_idx)examples/bitnet-1.58b/vllm_workspace/utils.py (1)
1-2: Clean up unused imports and modernize type aliases for consistency.After updating the function signatures to use built-in
list, theListimport fromtypingon line 2 is no longer used in the function signatures. Additionally, for consistency with the modernized function signatures, the type aliasesTokensText(line 4) andTokensTextLogprobs(line 27) should also be updated to use built-in generics instead ofTupleandListfrom typing.Apply this diff to modernize the type aliases and clean up imports:
-from typing import Dict, List, Tuple +from typing import Dict -TokensText = Tuple[List[int], str] +TokensText = tuple[list[int], str] -TokensTextLogprobs = Tuple[List[int], str, List[Dict[int, float]]] +TokensTextLogprobs = tuple[list[int], str, list[Dict[int, float]]]Note: If
Dictis also unused elsewhere in the codebase, consider removing it too. The modernization todict[int, float]would require postponed evaluation support, which is already enabled via thefrom __future__ import annotationsstatement.tilelang/contrib/cc.py (1)
211-211: LGTM! Type annotation correctly modernized.The change from
typing.Dict[str, str]todict[str, str]is correct and aligns with modern Python typing practices (PEP 585). The future annotations import at line 18 ensures Python 3.8 compatibility.Optional: Consider updating the docstring for consistency.
The docstring at line 224 still uses the old-style
Dict[str, str]notation:Returns ------- symbol_section_map: Dict[str, str] A map from defined global symbol to their sectionsFor consistency with the actual annotation, you could update it to:
Returns ------- - symbol_section_map: Dict[str, str] + symbol_section_map: dict[str, str] A map from defined global symbol to their sectionsexamples/fusedmoe/example_fusedmoe_tilelang.py (4)
274-278: LGTM! Type hints modernized.The conversion to built-in generic types (
dict,int | None) is correct and consistent with PEP 585/604 standards.Optionally, consider making the
dicttype more specific for better type safety:def __init__(self, - config: dict, + config: dict[str, Any], gate: torch.Tensor, up: torch.Tensor, down: torch.Tensor, d_expert: int | None = None):This would require importing
Anyfromtypingif not already imported.
298-298: LGTM! Type hints modernized.The conversion to built-in
dicttype is correct and consistent with PEP 585 standards.Optionally, consider making the
dicttypes more specific:- def __init__(self, config: dict, weights: dict): + def __init__(self, config: dict[str, Any], weights: dict[str, torch.Tensor]):
317-320: LGTM! Type hints modernized. The conversion to built-in `dict` types is correct and consistent with PEP 585 standards.
Optionally, consider making the `dict` types more specific:
 def __init__(self,
-             config: dict,
+             config: dict[str, Any],
              shared_kernel: tilelang.JITKernel,
              routed_kernel: tilelang.JITKernel,
-             weights: dict,
+             weights: dict[str, torch.Tensor],
              padding_M: int = 128):
478-478: LGTM! Type hint modernized. The conversion to the built-in `tuple` type is correct and consistent with PEP 585 standards. Optionally, consider making the
dicttypes more specific to match the documented structure:-def custom_kernel(data: tuple[torch.Tensor, dict, dict]) -> torch.Tensor: +def custom_kernel(data: tuple[torch.Tensor, dict[str, torch.Tensor], dict[str, Any]]) -> torch.Tensor:examples/cast/example_per_token_cast_to_fp8.py (1)
103-117: All print statements use f-strings; refactor Tuple imports
- No remaining
.format()or%formatting inprint()calls.- Replace
from typing import Tuplewith built-intuple[...]annotations in:
• maint/precision/compare_ops.py
• tilelang/language/ast/ir.py
• examples/deepseek_v32/inference/kernel.py
• examples/deepseek_v32/inference/model.py
• examples/bitnet-1.58b/vllm_workspace/utils.py
• examples/deepseek_v32/utils.py
tilelang/contrib/nvrtc.py (1)
16-17: Update docstrings to match modernized type annotations. The parameter type annotations have been correctly modernized to use PEP 604 union syntax (
`int | None`, `str | list[str] | None`). However, the docstrings at lines 29 and 32 still reference the old typing notation (`Optional[int]`, `Optional[Union[str, List[str]]]`). Consider updating the docstring to match the new annotation style for consistency:
-    arch : Optional[int]
+    arch : int | None
         The cuda architecture code.
-    options : Optional[Union[str, List[str]]]
+    options : str | list[str] | None
         The additional options.
tilelang/language/builtin.py (1)
173-173: Consider expanding type hints to match implementation. The type hints for
`mbarrier`/`barrier_id` specify `int | PrimExpr | tir.Call`, but the implementations (lines 212-219, 230-237) also accept `tir.BufferLoad` and `tir.Buffer`. Consider adding these types to the annotations for more accurate API documentation:
-def mbarrier_wait_parity(mbarrier: int | PrimExpr | tir.Call, parity: int | Var):
+def mbarrier_wait_parity(mbarrier: int | PrimExpr | tir.Call | tir.Buffer | tir.BufferLoad, parity: int | Var):
(Apply similar changes to
`mbarrier_arrive`, `barrier_wait`, and `barrier_arrive`.)
Also applies to: 223-223, 266-266, 281-281
setup.py (1)
124-129: Harden requirements parsing and set encoding. Avoid env-dependent defaults and stray entries in install_requires. Filter comments/empties and set UTF-8.
-def get_requirements(file_path: str = "requirements.txt") -> list[str]:
-    """Get Python package dependencies from requirements.txt."""
-    with open(get_path(file_path)) as f:
-        requirements = f.read().strip().split("\n")
-    return requirements
+def get_requirements(file_path: str = "requirements.txt") -> list[str]:
+    """Get Python package dependencies from requirements.txt."""
+    with open(get_path(file_path), encoding="utf-8") as f:
+        requirements: list[str] = []
+        for line in f:
+            line = line.strip()
+            if not line or line.startswith("#"):
+                continue
+            requirements.append(line)
+    return requirements
tilelang/language/kernel.py (2)
210-214: Fix varargs annotation for blocks. For varargs, the annotation applies to each element. Use tir.PrimExpr, not list[tir.PrimExpr].
-def Kernel(
-    *blocks: list[tir.PrimExpr],
+def Kernel(
+    *blocks: tir.PrimExpr,
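For readers unfamiliar with the varargs rule being cited, a tiny generic example (deliberately unrelated to tilelang's `Kernel` API):

```python
def total(*values: int) -> int:
    # The annotation names the element type: inside the function, `values`
    # is a tuple[int, ...]. Writing `*values: list[int]` would instead claim
    # that every positional argument is itself a list.
    return sum(values)


print(total(1, 2, 3))  # 6
```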
158-164: Docstring doesn't match return type. Method returns a list of three bindings, not a single dim's binding.
 def get_thread_bindings(self) -> list[Var]:
-    """
-    Returns the thread binding for the given dimension.
-    dim=0 corresponds to threadIdx.x, dim=1 to threadIdx.y, and dim=2 to threadIdx.z.
-    """
+    """
+    Returns all three thread bindings: threadIdx.x, threadIdx.y, threadIdx.z.
+    """
11-16: Include tir.Var in type hints to match runtime handling. Implementation accepts tir.Var (resolved via T.has_let_value/T.get_let_value). Reflect that in signatures.
-def copy(src: tir.Buffer | tir.BufferLoad | tir.BufferRegion,
-         dst: tir.Buffer | tir.BufferLoad,
+def copy(src: tir.Buffer | tir.BufferLoad | tir.BufferRegion | tir.Var,
+         dst: tir.Buffer | tir.BufferLoad | tir.Var,
          coalesced_width: int | None = None,
          disable_tma: bool = False,
          eviction_policy: Literal["evict_normal", "evict_first", "evict_last"] | None = None):
47-59: Restore the TileDevice parameter annotation. All other templates still surface the `arch: TileDevice` annotation, and `BaseTemplate.get_hardware_aware_configs` declares the same signature. Dropping it here hides the contract from static analysis and breaks consistency. Please keep the type annotation (re-importing `TileDevice` if necessary) so the override matches the base method.
-from ..roller import Hint
+from ..arch import TileDevice
+from ..roller import Hint
 @@
-    def get_hardware_aware_configs(self, arch=None, topk=10) -> list[Hint]:
+    def get_hardware_aware_configs(self, arch: TileDevice = None, topk: int = 10) -> list[Hint]:
137-138: Fix the `available_tensor_instructions` annotation. The attribute is annotated as `list[TensorInstruction]` but initialized to `None`, which violates the declared type and will trip stricter type checkers now that we've enabled the new linting. Please declare it as `list[TensorInstruction] | None` (or initialize it with an empty list) to keep the annotation truthful.
tilelang/jit/adapter/ctypes/adapter.py (1)
102-106: Check library init result and surface error. For parity with the cython adapter and better diagnostics, check init() return and expose get_last_error.
-        self.lib = self.lib_generator.load_lib()
-        self.lib.init()
+        self.lib = self.lib_generator.load_lib()
+        self.lib.get_last_error.restype = ctypes.c_char_p
+        _res = self.lib.init()
+        if _res != 0:
+            _err = self.lib.get_last_error().decode("utf-8")
+            raise RuntimeError(f"Initialization failed: {_err}")
tilelang/jit/adapter/cython/adapter.py (3)
378-393: buffer_dtype_map key type annotation mismatch. Keys are buffer names (str), not tir.Var. Fix annotations.
-    def _process_buffer_dtype(self) -> dict[tir.Var, tuple[int, torch.dtype]]:
+    def _process_buffer_dtype(self) -> dict[str, tuple[int, torch.dtype]]:
 @@
-        buffer_dtype_map = {}
+        buffer_dtype_map: dict[str, tuple[int, torch.dtype]] = {}
408-412: _process_static_buffer_infos return type annotations incorrect. Maps are keyed by buffer.name (str) and static_contiguous_list holds (index, name).
-    def _process_static_buffer_infos(self) -> \
-        tuple[dict[tir.Var, tuple[int, list[tuple[int, int]]]],
-              dict[tir.Var, tuple[int, list[tuple[int, int]]]],
-              list[tuple[tir.Var]]]:
+    def _process_static_buffer_infos(self) -> \
+        tuple[dict[str, tuple[int, list[tuple[int, int]]]],
+              dict[str, tuple[int, list[tuple[int, int]]]],
+              list[tuple[int, str]]]:
442-467: buffer_device_map key type annotation mismatch. Same as the others: the key is the buffer name (str).
-    def _process_buffer_device(self) -> dict[tir.Var, tuple[int, torch.device]]:
+    def _process_buffer_device(self) -> dict[str, tuple[int, torch.device]]:
 @@
-        buffer_device_map = {}
+        buffer_device_map: dict[str, tuple[int, torch.device]] = {}
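A generic sketch of the mismatch these three comments describe; the `Buffer` dataclass below is hypothetical and unrelated to `tir.Buffer`:

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int


def index_by_name(buffers: list[Buffer]) -> dict[str, int]:
    # The map is keyed by the buffer's name, so the annotation should say
    # dict[str, int] rather than dict[Buffer, int].
    return {buf.name: i for i, buf in enumerate(buffers)}


print(index_by_name([Buffer("A", 16), Buffer("B", 32)]))  # {'A': 0, 'B': 1}
```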
013834f to 94610e8 Compare
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
benchmark/matmul/benchmark_matmul_intrinsic.py (1)
305-305: Remove hardcoded override of user input. Line 305 unconditionally sets
`with_roller = True`, overriding the CLI argument parsed on line 304. This makes the argparse setup (lines 290-295) and user input meaningless. Given the PR's scope (linting/formatting), this appears to be a debugging artifact that was unintentionally committed.
Apply this diff to remove the hardcoded override:
     accum_dtype = "float32" if in_dtype == "int8" else "float16"
     with_roller = args.with_roller
-    with_roller = True
     # Compute total floating-point operations
     total_flops = 2 * M * N * K
examples/gdn/example_chunk_delta_h.py (1)
10-17: Critical: Incomplete fallback for missing fla dependency. When
flaimport fails,chunk_gated_delta_rule_fwd_his undefined, causing aNameErrorat line 299. The error message "fla not found, using tilelang implementation" is misleading—no actual fallback occurs.Apply this diff to either skip the reference comparison gracefully or provide a proper fallback:
Option 1: Skip reference comparison when fla is unavailable
try: import fla print(fla.__file__) from fla.ops.common.chunk_delta_h import chunk_gated_delta_rule_fwd_h + FLA_AVAILABLE = True except ImportError: print("fla not found, using tilelang implementation") fla = None + FLA_AVAILABLE = FalseThen at line 298, add:
- # fla ref - h_ref, V_new_ref, final_state_ref = chunk_gated_delta_rule_fwd_h(K, W, U, G, initial_state, - store_final_state, chunk_size, - save_new_value) + # fla ref + if FLA_AVAILABLE: + h_ref, V_new_ref, final_state_ref = chunk_gated_delta_rule_fwd_h(K, W, U, G, initial_state, + store_final_state, chunk_size, + save_new_value) + else: + print("Skipping reference comparison (fla not available)") + h_ref, V_new_ref, final_state_ref = None, None, NoneAnd update the correctness checks (lines 343-386) to skip when
`h_ref` is None.
examples/blocksparse_gemm/example_blocksparse_gemm.py (2)
57-69: Fix: Python if on CUDA bool tensor in ref path. Using a CUDA bool tensor in an if causes errors or sync. Convert the mask to CPU (once) and compare Python bools.
Apply this diff:
 def ref_program(A, B, BlockMask, block_M, block_N, block_K):
-    ref_c = torch.zeros((M, N), dtype=torch.float16, device=A.device)
+    ref_c = torch.zeros((M, N), dtype=torch.float16, device=A.device)
+    mask_cpu = BlockMask.bool().cpu()
 @@
-            for k in range(K // block_K):
-                if BlockMask[i, j, k]:
+            for k in range(K // block_K):
+                if mask_cpu[i, j, k].item():
                     accu += A[i * block_M:(i + 1) * block_M, k * block_K:(k + 1) * block_K].to(
                         torch.float32) @ B[k * block_K:(k + 1) * block_K,
                                            j * block_N:(j + 1) * block_N].to(torch.float32)
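A self-contained sketch of the device-safe branching pattern suggested here (assumes a CUDA device is available; the mask is synthetic, not the example's BlockMask):

```python
import torch

mask = torch.rand(4, 4, device="cuda") > 0.5

# Copy to host once, then branch on plain Python bools.
mask_cpu = mask.cpu()
if mask_cpu[0, 0].item():          # .item() yields a Python bool
    print("element (0, 0) is set")

# Whole-tensor conditions need an explicit reduction such as .any()/.all().
if mask_cpu.any():
    print("at least one element is set")
```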
72-88: Fix: BlockMask generation uses invalid device and cross-device assignment
- device=torch.cuda.current_device() passes an int; needs torch.device.
- torch.rand(...) creates CPU tensor; assigning to CUDA tensor fails.
- Prefer robust BlockMask detection via dtype+rank and ensure shape is a tuple.
Apply this diff:
-def supply_program(params: list[KernelParam]): - input_tensors = [] - - for p in params: - # Check if the kernel parameter is BlockMask tensor. - # Here, BlockMask is uniquely identified by having 3 dimensions. - if len(p.shape) != 3: - # For non-BlockMask tensors, use the default tensor generation logic. - input_tensors.append(default_tensor_supply(p)) - else: - # For BlockMask tensor, randomly set elements to True based on desired - # sparsity level. - block_mask = torch.zeros(p.shape, dtype=torch.bool, device=torch.cuda.current_device()) - block_mask[:, :, :] = torch.rand(p.shape) > sparsity - input_tensors.append(block_mask) - - return input_tensors +def supply_program(params: list[KernelParam]): + inputs: list[torch.Tensor] = [] + device = torch.device("cuda", torch.cuda.current_device()) if torch.cuda.is_available() \ + else torch.device("cpu") + for p in params: + # Identify BlockMask robustly: boolean 3D tensor + if hasattr(p, "is_boolean") and p.is_boolean() and len(p.shape) == 3: + shape = tuple(p.shape) # torch expects a tuple of ints + mask = torch.rand(shape, device=device) > sparsity + inputs.append(mask) + else: + inputs.append(default_tensor_supply(p)) + return inputsNote: If the autotuner invokes this function, this fix prevents CUDA/CPU mismatches during tuning. See KernelParam.is_boolean() in tilelang/engine/param.py. [Based on relevant code snippets]
examples/deepseek_nsa/example_tilelang_nsa_bwd.py (4)
6-6: Remove duplicate import. Redundant second
`import torch`. Drop it.
-import torch
206-206: Fix undefined H and wrong grid split in bwd_dkv kernel
His not defined; useheads_kv = heads // groups. Also drop stray print.- print("NV", NV, "NS", NS, "B", B, "H", H) + # print removed: avoid debug noise - with T.Kernel(NV, NS, B * H, threads=num_threads) as (i_v, i_s, i_bh): + with T.Kernel(NV, NS, B * heads_kv, threads=num_threads) as (i_v, i_s, i_bh): - i_b, i_h = i_bh // H, i_bh % H + i_b, i_h = i_bh // heads_kv, i_bh % heads_kvAlso applies to: 220-220, 239-239
387-387: Fix undefined H and wrong grid split in bwd_dqkv kernelUse
heads_kvto size grid and spliti_bh.- with T.Kernel(NV, NS, B * H, threads=num_threads) as (i_v, i_s, i_bh): + with T.Kernel(NV, NS, B * heads_kv, threads=num_threads) as (i_v, i_s, i_bh): - i_b, i_h = i_bh // H, i_bh % H + i_b, i_h = i_bh // heads_kv, i_bh % heads_kvAlso applies to: 407-407
537-545: Fix undefinedblock_countsin block_mask and robustify call site
block_countsis referenced but not in scope; add it to the wrapper signature and computeUSE_BLOCK_COUNTSfrom it.- Call site unconditionally calls
.to(); guard for None/int.-def tilelang_kernel_block_mask( - batch, - heads, - seq_len, - selected_blocks, - block_size, - dtype="int32", -): +def tilelang_kernel_block_mask( + batch, + heads, + seq_len, + selected_blocks, + block_size, + dtype="int32", + block_counts=None, +): @@ - USE_BLOCK_COUNTS = block_counts is not None + USE_BLOCK_COUNTS = block_counts is not NoneUpdate the call site:
- block_mask = tilelang_kernel_block_mask(B, H, T, S, - BS)(block_indices.to(torch.int32), - block_counts.to(torch.int32)).to(torch.bool) + bc_arg = ( + block_counts.to(torch.int32) + if isinstance(block_counts, torch.Tensor) + else torch.empty(0, dtype=torch.int32, device=block_indices.device) + ) + block_mask = tilelang_kernel_block_mask(B, H, T, S, BS, block_counts=block_counts)( + block_indices.to(torch.int32), bc_arg + ).to(torch.bool)Also applies to: 554-554, 610-613
examples/deepseek_nsa/example_triton_nsa_bwd.py (1)
354-355: Use bitwise boolean ops in Triton expressionsPython
andon Triton tensors is invalid; use elementwise&with parentheses.- b_p_swa = tl.where((i >= o_s and (i - WS) < o_s)[:, None], b_p_swa, 0) + b_p_swa = tl.where(((i >= o_s) & ((i - WS) < o_s))[:, None], b_p_swa, 0)
🧹 Nitpick comments (23)
examples/bitnet-1.58b/load_from_quantized.py (1)
52-52: Outer parentheses are unnecessary.The parentheses wrapping the entire expression add visual noise without improving readability or changing behavior. Consider removing them for cleaner code.
Apply this diff:
- qmodel = (BitnetForCausalLM.from_quantized(saved_model_path,).cuda().half()) + qmodel = BitnetForCausalLM.from_quantized(saved_model_path).cuda().half()Note: Also removed the trailing comma in
from_quantized(saved_model_path,)since it's unnecessary for a single-argument call.examples/blocksparse_attention/block_sparse_attn_triton.py (1)
201-213: Consider prefixing unusedctxparameter with underscore.The multi-line formatting improves readability. However, the
ctxparameter is unused throughout the function body. Since this helper is called from_sparse_attention.forward(line 262) andctxis never utilized (backward pass is not implemented), consider prefixing it with an underscore (_ctx) to indicate it's intentionally unused and silence the linter warning.Apply this diff if you want to silence the linter warning:
def _forward( - ctx, + _ctx, q, k, v,examples/flash_decoding/example_gqa_decode.py (1)
475-475: Consider defining a custom exception class (optional).Static analysis flags TRY003 on this line. While the current error message is concise and clear, defining a custom exception class (e.g.,
class SimilarityError(AssertionError)) would align with the TRY003 guideline. However, this is a minor style concern and may be overkill for an example/benchmark file.examples/deepseek_v32/fp8_lighting_indexer.py (1)
1-1: Consider removing or making the noqa directive more specific.The blanket
# ruff: noqadisables all ruff checks for this file, which seems to conflict with the PR's goal of enabling pyupgrade rules. If specific rules need to be suppressed, consider using targeted ignores like# ruff: noqa: UP001, UP032instead.examples/convolution/example_convolution.py (1)
50-51: Consider removing unused parameter overrides.The
dtypeandaccum_dtypeparameters are immediately overridden with hardcoded values, making the function parameters ineffective. Since you're updating the function signature anyway, consider either:
- Removing these parameters entirely (if always hardcoded), or
- Removing lines 50-51 to honor the passed values
Note: This pattern also appears in
example_convolution_autotune.py(lines 117-118), so you may want to address it consistently across related files.examples/deepseek_mla/example_mla_decode_paged.py (1)
313-328: Note: Unused parameters in reference implementation.The
block_tableandblock_sizeparameters are unused in the reference implementationrun_torch_mla. The function uses direct indexing (i * max_seqlen_pad) rather than block table lookups.These parameters are likely kept for API consistency with
run_tilelang_mla, which does use them. This is acceptable for maintaining a uniform interface, though you could consider documenting this or using leading underscore naming (e.g.,_block_table) to indicate intentionally unused parameters.If you want to explicitly mark them as intentionally unused:
def run_torch_mla( q, - block_table, + _block_table, # unused, kept for API consistency blocked_k, max_seqlen_pad, - block_size, + _block_size, # unused, kept for API consistency b, s_q, cache_seqlens, h_q, h_kv, d, dv, causal, dtype, ):Based on learnings
examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py (1)
66-93: Comprehensive documentation added.The new docstring provides thorough documentation of the function's parameters, behavior, and requirements. While technically correct and informative, the 28-line docstring is quite verbose for an example file. Consider whether a more concise summary would suffice, reserving this level of detail for core library functions.
examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1)
206-215: LGTM! Consider fixing the comment typo.The type annotation modernization from
Optional[int]toint | Noneis correct and aligns with PEP 604. The public API signature change is backward-compatible at runtime.Minor: Line 215 has a typo in the comment:
function'sinterfaceshould befunction's interface(missing space).examples/attention_sink/example_mha_sink_fwd_bhsd.py (1)
169-169: LGTM! Consider fixing the comment typo.The type annotation update and
T.alloc_localusage are correct. The public API signature change maintains runtime compatibility.Minor: Line 202 has the same typo as the first file:
function's interfaceis missing a space between "function's" and "interface".Also applies to: 193-202
examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
366-367: Verify the defensive getattr change and ensure consistency.This change from direct attribute access to
getattr()with a default is a logic improvement but appears unrelated to the pyupgrade linting objectives. While the defensive coding is good practice, note that Line 319 uses a similar pattern without this defensive check:if self.model.get_output_embeddings().bias is not None:For consistency, consider updating Line 319 similarly, or clarify whether this change addresses a specific issue with certain model types that lack a
biasattribute.Consider applying the same defensive pattern to Line 319:
-if self.model.get_output_embeddings().bias is not None: +if getattr(self.model.get_output_embeddings(), "bias", None) is not None:examples/deepseek_mla/amd/benchmark_mla_decode_amd_tilelang.py (1)
1-351: Consider separating functional changes from linting fixes.This PR mixes functional changes (lines 30-35:
pass_configsaddition) with linting/formatting improvements. While the linting changes are appropriate for a "Enable pyupgrade linter" PR, functional changes that alter behavior should ideally be in separate commits or PRs for easier review and potential rollback.examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (1)
415-415: Remove unnecessary int() cast.The value is already an integer from
math.ceil(), which returns an int in Python 3. Theint()cast is redundant.Apply this diff:
- max_selected_blocks = int(math.ceil(max_cache_seqlen * (1 - sparse_ratio) / block_size)) + max_selected_blocks = math.ceil(max_cache_seqlen * (1 - sparse_ratio) / block_size)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (1)
418-418: Remove unnecessary int() cast.
math.ceil()already returns an int in Python 3.Apply this diff:
- max_selected_blocks = int(math.ceil(max_cache_seqlen * (1 - sparse_ratio) / block_size)) + max_selected_blocks = math.ceil(max_cache_seqlen * (1 - sparse_ratio) / block_size)examples/fusedmoe/example_fusedmoe_tilelang.py (1)
7-8: Consider replacing star imports.Static analysis flags these star imports as they make it difficult to track which names are imported and can lead to namespace pollution.
If these modules have a limited set of commonly used exports, consider explicit imports:
-from tilelang.autotuner import * -from example_fusedmoe_torch import * +from tilelang.autotuner import <specific items> +from example_fusedmoe_torch import <specific items>Alternatively, use qualified imports:
-from tilelang.autotuner import * -from example_fusedmoe_torch import * +import tilelang.autotuner +import example_fusedmoe_torchexamples/blocksparse_gemm/example_blocksparse_gemm.py (1)
93-105: Rename “enable_rasteration” → “enable_rasterization” for consistencySpelling is inconsistent with DEFAULT_ENABLE_RASTERIZATION and typical terminology. Rename for clarity and avoid future confusion.
Apply these diffs within this file’s changed regions:
def blocksparse_matmul( @@ - enable_rasteration, + enable_rasterization, @@ - T.use_swizzle(panel_size=10, enable=enable_rasteration) + T.use_swizzle(panel_size=10, enable=enable_rasterization)- kernel = blocksparse_matmul( + kernel = blocksparse_matmul( @@ - enable_rasteration=DEFAULT_ENABLE_RASTERIZATION, + enable_rasterization=DEFAULT_ENABLE_RASTERIZATION,Also update get_configs for consistency (outside the changed hunk; example snippet):
enable_rasterization = [True, False] # ... { # ... "enable_rasterization": c[5], }Also applies to: 168-169
examples/flash_attention/example_mha_bwd.py (1)
290-299: Place scalar on the same device or use a Python scalar
Replacescores = scores / torch.sqrt(torch.tensor(dim, dtype=scores.dtype))with either
scores = scores / torch.sqrt(torch.tensor(dim, dtype=scores.dtype, device=scores.device))or
scores = scores / (dim ** 0.5)to avoid implicit CPU→CUDA copies.
examples/bitnet-1.58b/benchmark_inference_latency.py (1)
15-16: Drop unnecessary NumPy import and mean;timesis already a scalar.Simplify and avoid an extra dependency.
def profile(model, input_data): import time - - import numpy as np @@ - return np.mean(times) + return timesAlso applies to: 34-34
examples/bitnet-1.58b/tokenization_bitnet.py (1)
326-337: Align return type with implementation (can return None).save_vocabulary returns early without a value on invalid dir (Line 339), conflicting with
-> tuple[str].-def save_vocabulary(self, save_directory, filename_prefix: str | None = None) -> tuple[str]: +def save_vocabulary(self, save_directory, filename_prefix: str | None = None) -> tuple[str] | None:examples/bitnet-1.58b/modeling_bitnet.py (3)
371-382: Silence unused-argument lint while preserving HF API.use_cache/kwargs are intentionally unused for signature compatibility. Add deletions to satisfy Ruff ARG002.
def forward( self, hidden_states: torch.Tensor, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_value: Cache | None = None, output_attentions: bool = False, use_cache: bool = False, cache_position: torch.LongTensor | None = None, **kwargs, ) -> tuple[torch.Tensor, torch.Tensor | None, tuple[torch.Tensor] | None]: + # Unused; kept for HF interface compatibility. + del use_cache, kwargs bsz, q_len, _ = hidden_states.size()[Based on static analysis hints]
524-535: Apply same ARG002 fix in fused attention.def forward( self, hidden_states: torch.Tensor, attention_mask: torch.Tensor | None = None, position_ids: torch.LongTensor | None = None, past_key_value: Cache | None = None, output_attentions: bool = False, use_cache: bool = False, cache_position: torch.LongTensor | None = None, **kwargs, ) -> tuple[torch.Tensor, torch.Tensor | None, tuple[torch.Tensor] | None]: + # Unused; kept for HF interface compatibility. + del use_cache, kwargs bsz, q_len, _ = hidden_states.size()[Based on static analysis hints]
612-623: Apply same ARG002 fix in FlashAttention2 forward.def forward( self, hidden_states: torch.Tensor, attention_mask: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, past_key_value: Cache | None = None, output_attentions: bool = False, use_cache: bool = False, cache_position: torch.LongTensor | None = None, **kwargs, ) -> tuple[torch.Tensor, torch.Tensor | None, tuple[torch.Tensor] | None]: + # Unused; kept for HF interface compatibility. + del use_cache, kwargs output_attentions = False[Based on static analysis hints]
examples/bitnet-1.58b/eval_correctness.py (1)
49-51: Avoid importing NumPy; return the scalar directly.
timesis already averaged;np.mean(times)is redundant.def profile(model, input_data): - import numpy as np @@ - return np.mean(times) + return timesAlso applies to: 69-69
examples/deepseek_nsa/example_triton_nsa_fwd.py (1)
21-24: Wire up USE_BLOCK_COUNTS or drop the heuristic
USE_BLOCK_COUNTSis defined but not used (NS hard-coded to S). Prefer using it to respect per-token counts.- # if USE_BLOCK_COUNTS: - # NS = tl.load(block_counts + (bos + i_t) * H + i_h) - # else: - NS = S + NS = tl.load(block_counts + (bos + i_t) * H + i_h) if USE_BLOCK_COUNTS else SAlso applies to: 66-71
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (107)
benchmark/blocksparse_attention/benchmark_library_dense_fmha.py(1 hunks)benchmark/blocksparse_attention/benchmark_tilelang_block_sparse_fmha.py(3 hunks)benchmark/blocksparse_attention/benchmark_torch_block_sparse_fmha.py(1 hunks)benchmark/blocksparse_attention/benchmark_triton_block_sparse_fmha.py(4 hunks)benchmark/matmul/benchmark_matmul.py(1 hunks)benchmark/matmul/benchmark_matmul_intrinsic.py(1 hunks)benchmark/matmul/benchmark_matmul_sp.py(3 hunks)benchmark/matmul_fp8/benchmark_matmul.py(0 hunks)docs/conf.py(2 hunks)examples/amd/example_amd_flash_attn_bwd.py(3 hunks)examples/amd/example_amd_flash_attn_fwd.py(7 hunks)examples/analyze/example_conv_analyze.py(2 hunks)examples/attention_sink/example_gqa_sink_bwd_bhsd.py(14 hunks)examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py(10 hunks)examples/attention_sink/example_mha_sink_bwd_bhsd.py(14 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd.py(6 hunks)examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py(9 hunks)examples/bitnet-1.58b/benchmark_generate.py(3 hunks)examples/bitnet-1.58b/benchmark_inference_latency.py(3 hunks)examples/bitnet-1.58b/configuration_bitnet.py(2 hunks)examples/bitnet-1.58b/eval_correctness.py(3 hunks)examples/bitnet-1.58b/eval_gpu_memory.py(2 hunks)examples/bitnet-1.58b/eval_ppl.py(3 hunks)examples/bitnet-1.58b/eval_utils.py(1 hunks)examples/bitnet-1.58b/kernel_benchmark/tilelang_bitnet_158_int8xint2_prefill.py(2 hunks)examples/bitnet-1.58b/kernel_benchmark/tl_int8xint8.py(0 hunks)examples/bitnet-1.58b/load_from_quantized.py(1 hunks)examples/bitnet-1.58b/maint/create_bitblas_ckpt.py(3 hunks)examples/bitnet-1.58b/modeling_bitnet.py(27 hunks)examples/bitnet-1.58b/tokenization_bitnet.py(9 hunks)examples/bitnet-1.58b/utils_quant.py(2 hunks)examples/bitnet-1.58b/vllm_workspace/conftest.py(22 hunks)examples/bitnet-1.58b/vllm_workspace/utils.py(3 hunks)examples/blocksparse_attention/block_sparse_attn_triton.py(9 hunks)examples/blocksparse_attention/example_tilelang_block_sparse_attn.py(5 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_paged.py(17 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py(15 hunks)examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py(16 hunks)examples/blocksparse_attention/example_triton_sparse_gqa_decode_varlen_indice.py(12 hunks)examples/blocksparse_attention/example_triton_sparse_gqa_decode_varlen_mask.py(12 hunks)examples/blocksparse_attention/heuristic.py(1 hunks)examples/blocksparse_gemm/example_blocksparse_gemm.py(5 hunks)examples/cast/example_group_per_split_token_cast_to_fp8.py(7 hunks)examples/cast/example_per_token_cast_to_fp8.py(7 hunks)examples/cast/example_triton_cast_to_fp8.py(3 hunks)examples/convolution/example_convolution.py(3 hunks)examples/convolution/example_convolution_autotune.py(5 hunks)examples/deepseek_deepgemm/example_deepgemm_fp8_2xAcc.py(4 hunks)examples/deepseek_mla/amd/benchmark_mla_decode_amd_tilelang.py(7 hunks)examples/deepseek_mla/amd/benchmark_mla_decode_amd_torch.py(9 hunks)examples/deepseek_mla/amd/benchmark_mla_decode_amd_triton.py(9 hunks)examples/deepseek_mla/benchmark_mla.py(13 hunks)examples/deepseek_mla/example_mla_decode.py(7 hunks)examples/deepseek_mla/example_mla_decode_paged.py(12 hunks)examples/deepseek_mla/example_mla_decode_persistent.py(4 hunks)examples/deepseek_mla/example_mla_decode_ws.py(14 hunks)examples/deepseek_mla/experimental/example_mla_decode_kv_fp8.py(3 hunks)examples/deepseek_mla/torch_refs.py(1 
hunks)examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py(28 hunks)examples/deepseek_nsa/example_tilelang_nsa_bwd.py(4 hunks)examples/deepseek_nsa/example_tilelang_nsa_decode.py(3 hunks)examples/deepseek_nsa/example_tilelang_nsa_fwd.py(2 hunks)examples/deepseek_nsa/example_tilelang_nsa_fwd_varlen.py(7 hunks)examples/deepseek_nsa/example_triton_nsa_bwd.py(22 hunks)examples/deepseek_nsa/example_triton_nsa_fwd.py(8 hunks)examples/deepseek_nsa/example_triton_nsa_fwd_varlen.py(12 hunks)examples/deepseek_nsa/reference.py(9 hunks)examples/deepseek_v32/fp8_lighting_indexer.py(3 hunks)examples/deepseek_v32/sparse_mla_bwd.py(12 hunks)examples/deepseek_v32/sparse_mla_fwd.py(6 hunks)examples/deepseek_v32/sparse_mla_fwd_pipelined.py(13 hunks)examples/deepseek_v32/topk_selector.py(3 hunks)examples/deepseek_v32/utils.py(4 hunks)examples/dequantize_gemm/dequantize_utils.py(4 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py(7 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py(11 hunks)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py(11 hunks)examples/dequantize_gemm/example_dequant_gemm_fine_grained.py(8 hunks)examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py(9 hunks)examples/dequantize_gemm/example_dequant_gemm_w4a8.py(4 hunks)examples/dequantize_gemm/example_dequant_gemv_fp16xint4.py(5 hunks)examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py(14 hunks)examples/dynamic_shape/example_dynamic.py(2 hunks)examples/elementwise/example_elementwise_add.py(2 hunks)examples/elementwise/example_elementwise_add_tma_1d.py(1 hunks)examples/flash_attention/bert_padding.py(1 hunks)examples/flash_attention/example_gqa_bwd.py(12 hunks)examples/flash_attention/example_gqa_bwd_tma_reduce.py(12 hunks)examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py(12 hunks)examples/flash_attention/example_gqa_fwd_bshd.py(5 hunks)examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py(6 hunks)examples/flash_attention/example_mha_bwd.py(5 hunks)examples/flash_attention/example_mha_bwd_bhsd.py(5 hunks)examples/flash_attention/example_mha_bwd_wgmma_pipelined.py(5 hunks)examples/flash_attention/example_mha_fwd_bhsd.py(5 hunks)examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py(6 hunks)examples/flash_attention/example_mha_fwd_bshd.py(5 hunks)examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py(6 hunks)examples/flash_attention/example_mha_fwd_varlen.py(4 hunks)examples/flash_decoding/example_gqa_decode.py(11 hunks)examples/flash_decoding/example_mha_inference.py(9 hunks)examples/fusedmoe/example_fusedmoe_tilelang.py(16 hunks)examples/fusedmoe/example_fusedmoe_torch.py(7 hunks)examples/fusedmoe/test_example_fusedmoe.py(1 hunks)examples/gdn/example_chunk_delta_bwd.py(10 hunks)examples/gdn/example_chunk_delta_h.py(9 hunks)examples/gdn/example_chunk_o.py(5 hunks)
⛔ Files not processed due to max files limit (36)
- examples/gdn/example_chunk_o_bwd.py
- examples/gdn/example_chunk_scaled_dot_kkt.py
- examples/gdn/example_cumsum.py
- examples/gdn/example_wy_fast.py
- examples/gdn/example_wy_fast_bwd_split.py
- examples/gdn/test_example_gdn_compilation.py
- examples/gdn/utils.py
- examples/gemm/example_gemm_autotune.py
- examples/gemm/example_gemm_intrinsics.py
- examples/gemm/example_gemm_persistent.py
- examples/gemm_fp8/example_tilelang_gemm_amd.py
- examples/gemm_fp8/example_tilelang_gemm_fp8.py
- examples/gemm_fp8/example_tilelang_gemm_fp8_2xAcc.py
- examples/gemm_fp8/example_tilelang_gemm_fp8_intrinsic.py
- examples/gemm_sm100/gemm_mma.py
- examples/gemm_sm100/gemm_tcgen5mma.py
- examples/gemm_sp/example_gemm_sp.py
- examples/gemm_splitk/example_tilelang_gemm_splitk.py
- examples/gemm_splitk/example_tilelang_gemm_splitk_vectorize_atomicadd.py
- examples/gemm_streamk/example_tilelang_gemm_streamk.py
- examples/grouped_gemm/example_grouped_gemm_bwd.py
- examples/grouped_gemm/example_grouped_gemm_fwd.py
- examples/hadamard_transform/example_hadamard.py
- examples/linear_attention/example_linear_attn_bwd.py
- examples/linear_attention/example_linear_attn_fwd.py
- examples/linear_attention/example_mamba_chunk_scan.py
- examples/linear_attention/example_mamba_chunk_state.py
- examples/linear_attention/example_retention_fwd.py
- examples/minference/example_vertical_slash_sparse_attn.py
- examples/norm/rms_norm.py
- examples/online_softmax/online_softmax.py
- examples/plot_layout/fragment_mma_load_a.py
- examples/seer_attention/block_sparse_attn_tilelang.py
- examples/seer_attention/block_sparse_attn_triton.py
- examples/sparse_tensorcore/tilelang_example_sparse_tensorcore.py
- examples/topk/example_topk.py
💤 Files with no reviewable changes (2)
- examples/bitnet-1.58b/kernel_benchmark/tl_int8xint8.py
- benchmark/matmul_fp8/benchmark_matmul.py
✅ Files skipped from review due to trivial changes (21)
- examples/flash_attention/bert_padding.py
- benchmark/matmul/benchmark_matmul.py
- examples/amd/example_amd_flash_attn_fwd.py
- examples/deepseek_nsa/example_tilelang_nsa_decode.py
- examples/deepseek_mla/example_mla_decode_ws.py
- examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_paged.py
- examples/bitnet-1.58b/kernel_benchmark/tilelang_bitnet_158_int8xint2_prefill.py
- examples/deepseek_nsa/reference.py
- examples/blocksparse_attention/heuristic.py
- examples/deepseek_mla/amd/benchmark_mla_decode_amd_torch.py
- examples/bitnet-1.58b/eval_utils.py
- examples/blocksparse_attention/example_triton_sparse_gqa_decode_varlen_mask.py
- examples/blocksparse_attention/example_tilelang_block_sparse_attn.py
- examples/deepseek_v32/topk_selector.py
- examples/deepseek_nsa/example_tilelang_nsa_fwd.py
- examples/gdn/example_chunk_o.py
- examples/elementwise/example_elementwise_add.py
- benchmark/matmul/benchmark_matmul_sp.py
- examples/deepseek_v32/sparse_mla_fwd_pipelined.py
- examples/deepseek_mla/torch_refs.py
- examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py
🚧 Files skipped from review as they are similar to previous changes (22)
- docs/conf.py
- examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py
- examples/flash_attention/example_mha_bwd_bhsd.py
- examples/dequantize_gemm/example_dequant_gemm_fp4_hopper.py
- examples/dequantize_gemm/example_dequant_gemv_fp16xint4.py
- examples/bitnet-1.58b/configuration_bitnet.py
- examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py
- examples/deepseek_deepgemm/example_deepgemm_fp8_2xAcc.py
- examples/flash_attention/example_mha_bwd_wgmma_pipelined.py
- examples/flash_attention/example_mha_fwd_bhsd_wgmma_pipelined.py
- examples/bitnet-1.58b/utils_quant.py
- examples/bitnet-1.58b/vllm_workspace/utils.py
- examples/flash_attention/example_gqa_bwd.py
- examples/bitnet-1.58b/maint/create_bitblas_ckpt.py
- examples/cast/example_triton_cast_to_fp8.py
- examples/flash_attention/example_gqa_fwd_bshd.py
- examples/attention_sink/example_gqa_sink_bwd_bhsd.py
- examples/flash_attention/example_gqa_bwd_wgmma_pipelined.py
- examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper_tma.py
- examples/cast/example_per_token_cast_to_fp8.py
- examples/amd/example_amd_flash_attn_bwd.py
- examples/attention_sink/example_mha_sink_bwd_bhsd.py
🧰 Additional context used
🧬 Code graph analysis (48)
examples/cast/example_group_per_split_token_cast_to_fp8.py (3)
tilelang/language/tir/op.py (1)
ceildiv(3116-3133)tilelang/language/kernel.py (2)
Kernel(229-303)threads(215-219)tilelang/language/copy.py (1)
copy(15-93)
examples/elementwise/example_elementwise_add_tma_1d.py (1)
tilelang/language/parallel.py (1)
Parallel(9-29)
examples/flash_decoding/example_gqa_decode.py (2)
examples/gemm/example_gemm_autotune.py (1)
get_heuristic_config(165-199)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)
examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py (1)
examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (6)
flashattn(35-207)ref_program(212-257)main(141-205)main(394-473)gen_inputs(379-391)triton_program(349-376)
examples/flash_attention/example_mha_fwd_bhsd.py (3)
tilelang/jit/kernel.py (2)
out_idx(471-472)get_profiler(385-401)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/profiler/__init__.py (2)
assert_allclose(77-146)do_bench(219-282)
examples/deepseek_mla/benchmark_mla.py (1)
examples/deepseek_mla/example_mla_decode_paged.py (1)
run_torch_mla(313-354)
examples/deepseek_mla/amd/benchmark_mla_decode_amd_triton.py (1)
examples/deepseek_mla/amd/benchmark_mla_decode_amd_torch.py (6)
run_torch_mla(35-73)run_flash_mla_triton(327-373)flash_mla_triton(352-369)mla_decode_triton(292-323)compare_a(458-505)compare_ab(382-455)
benchmark/blocksparse_attention/benchmark_tilelang_block_sparse_fmha.py (1)
examples/blocksparse_attention/example_tilelang_block_sparse_attn.py (1)
Softmax(87-114)
examples/bitnet-1.58b/load_from_quantized.py (1)
examples/bitnet-1.58b/modeling_bitnet.py (1)
from_quantized(1500-1578)
examples/flash_attention/example_mha_fwd_bshd.py (2)
tilelang/jit/kernel.py (2)
out_idx(471-472)get_profiler(385-401)tilelang/profiler/__init__.py (2)
assert_allclose(77-146)do_bench(219-282)
examples/deepseek_mla/example_mla_decode.py (5)
tilelang/jit/kernel.py (1)
out_idx(471-472)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/tileop/gemm/gemm_base.py (2)
clear_accum(107-108)policy(119-120)tilelang/language/kernel.py (1)
threads(215-219)tilelang/language/copy.py (1)
copy(15-93)
examples/gdn/example_chunk_delta_bwd.py (1)
tilelang/language/copy.py (1)
copy(15-93)
examples/deepseek_nsa/example_tilelang_nsa_fwd_varlen.py (2)
examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py (1)
parallel_nsa(232-308)examples/gemm_streamk/example_tilelang_gemm_streamk.py (1)
cdiv(8-9)
examples/fusedmoe/example_fusedmoe_tilelang.py (2)
tilelang/language/copy.py (1)
copy(15-93)examples/fusedmoe/example_fusedmoe_torch.py (3)
forward(21-24)forward(37-42)forward(56-67)
benchmark/blocksparse_attention/benchmark_torch_block_sparse_fmha.py (1)
benchmark/blocksparse_attention/benchmark_tilelang_block_sparse_fmha.py (1)
get_sparse_attn_mask_from_topk(14-26)
examples/bitnet-1.58b/eval_ppl.py (1)
examples/bitnet-1.58b/modeling_bitnet.py (1)
BitnetForCausalLM(1231-1578)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py (3)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/language/tir/op.py (1)
if_then_else(2906-2936)tilelang/language/__init__.py (1)
symbolic(87-98)
examples/deepseek_mla/example_mla_decode_persistent.py (2)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/tileop/gemm/gemm_base.py (1)
policy(119-120)
examples/bitnet-1.58b/eval_gpu_memory.py (5)
examples/bitnet-1.58b/benchmark_generate.py (1)
profile(54-74)examples/bitnet-1.58b/benchmark_inference_latency.py (1)
profile(12-34)examples/bitnet-1.58b/eval_correctness.py (1)
profile(49-69)examples/bitnet-1.58b/benchmark_model_10k_loops.py (1)
profile(19-41)examples/bitnet-1.58b/modeling_bitnet.py (1)
_post_process_weights(1487-1491)
examples/bitnet-1.58b/benchmark_generate.py (2)
examples/bitnet-1.58b/eval_ppl.py (1)
main(31-61)examples/bitnet-1.58b/benchmark_inference_latency.py (1)
main(37-53)
examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (2)
tilelang/language/allocate.py (1)
alloc_local(39-50)examples/attention_sink/example_gqa_sink_bwd_bhsd.py (1)
ref_program(414-459)
examples/deepseek_nsa/example_triton_nsa_fwd.py (1)
examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py (2)
parallel_nsa_fwd_kernel(26-107)parallel_nsa(232-308)
examples/attention_sink/example_mha_sink_fwd_bhsd.py (2)
examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py (2)
ref_program(212-257)gen_inputs(379-391)tilelang/language/allocate.py (1)
alloc_local(39-50)
examples/deepseek_v32/sparse_mla_fwd.py (2)
tilelang/math/__init__.py (1)
next_power_of_2(1-2)examples/deepseek_v32/sparse_mla_fwd_pipelined.py (1)
sparse_mla_fwd_interface(343-395)
examples/analyze/example_conv_analyze.py (2)
examples/gemm/example_gemm_autotune.py (1)
kernel(110-150)examples/analyze/example_gemm_analyze.py (1)
kernel(10-46)
examples/gdn/example_chunk_delta_h.py (2)
examples/gdn/example_chunk_o.py (2)
prepare_input(26-44)kernel(92-197)examples/gdn/example_chunk_delta_bwd.py (3)
prepare_input(33-64)kernel(232-398)do_bench(614-636)
examples/bitnet-1.58b/vllm_workspace/conftest.py (1)
examples/bitnet-1.58b/modeling_bitnet.py (1)
get_output_embeddings(1249-1250)
examples/dequantize_gemm/example_dequant_gemm_fine_grained.py (1)
tilelang/language/tir/op.py (2)
call_extern(173-195)address_of(464-480)
examples/bitnet-1.58b/benchmark_inference_latency.py (4)
examples/bitnet-1.58b/benchmark_generate.py (1)
profile(54-74)examples/bitnet-1.58b/eval_correctness.py (1)
profile(49-69)examples/bitnet-1.58b/eval_gpu_memory.py (1)
profile(12-34)examples/bitnet-1.58b/benchmark_model_10k_loops.py (1)
profile(19-41)
examples/deepseek_nsa/example_tilelang_nsa_bwd.py (2)
tilelang/jit/kernel.py (1)
out_idx(471-472)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)
examples/flash_decoding/example_mha_inference.py (2)
tilelang/language/copy.py (1)
copy(15-93)tilelang/profiler/__init__.py (1)
do_bench(219-282)
examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py (2)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py (1)
matmul(86-381)
examples/convolution/example_convolution_autotune.py (2)
tilelang/autotuner/tuner.py (1)
autotune(727-820)examples/gemm/example_gemm_autotune.py (1)
get_configs(22-105)
examples/convolution/example_convolution.py (2)
examples/convolution/example_convolution_autotune.py (1)
convolution(97-168)tilelang/language/kernel.py (1)
threads(215-219)
examples/fusedmoe/example_fusedmoe_torch.py (2)
examples/fusedmoe/example_fusedmoe_tilelang.py (3)
forward(317-320)forward(333-338)forward(432-535)tilelang/language/customize.py (1)
view(51-61)
examples/bitnet-1.58b/modeling_bitnet.py (1)
examples/bitnet-1.58b/configuration_bitnet.py (1)
BitnetConfig(29-194)
examples/deepseek_mla/amd/benchmark_mla_decode_amd_tilelang.py (1)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)
examples/flash_attention/example_mha_fwd_bshd_wgmma_pipelined.py (2)
tilelang/jit/kernel.py (2)
out_idx(471-472)get_profiler(385-401)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)
examples/flash_attention/example_gqa_bwd_tma_reduce.py (3)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/language/__init__.py (1)
annotate_layout(110-148)tilelang/jit/__init__.py (1)
jit(242-318)
examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py (3)
examples/dequantize_gemm/example_dequant_gemm_bf16_fp4_hopper.py (1)
matmul(49-354)examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_hopper.py (1)
matmul(86-381)tilelang/quantize/mxfp.py (1)
get_mxfp_intrin_group(52-109)
examples/deepseek_mla/experimental/example_mla_decode_kv_fp8.py (2)
tilelang/jit/kernel.py (1)
out_idx(471-472)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)
examples/flash_attention/example_mha_fwd_varlen.py (2)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/language/tir/op.py (1)
if_then_else(2906-2936)
examples/deepseek_nsa/example_triton_nsa_bwd.py (3)
examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py (2)
parallel_nsa_fwd_kernel(26-107)parallel_nsa(232-308)examples/deepseek_nsa/example_triton_nsa_fwd.py (2)
parallel_nsa_fwd_kernel(30-113)parallel_nsa(238-314)examples/deepseek_nsa/example_triton_nsa_fwd_varlen.py (2)
parallel_nsa_fwd_kernel(30-157)parallel_nsa(259-335)
examples/flash_attention/example_mha_bwd.py (3)
tilelang/jit/kernel.py (1)
out_idx(471-472)tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/profiler/__init__.py (1)
do_bench(219-282)
examples/deepseek_nsa/example_triton_nsa_fwd_varlen.py (2)
examples/deepseek_nsa/benchmark/benchmark_nsa_fwd.py (2)
parallel_nsa_fwd_kernel(26-107)parallel_nsa(232-308)examples/deepseek_nsa/example_triton_nsa_bwd.py (3)
parallel_nsa_fwd_kernel(30-113)parallel_nsa_fwd_kernel(559-686)parallel_nsa(914-990)
examples/blocksparse_gemm/example_blocksparse_gemm.py (2)
tilelang/jit/kernel.py (1)
params(475-476)tilelang/engine/param.py (1)
KernelParam(12-104)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py (2)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)tilelang/language/__init__.py (1)
symbolic(87-98)
examples/deepseek_mla/example_mla_decode_paged.py (2)
tilelang/transform/pass_config.py (1)
PassConfigKey(6-105)examples/deepseek_mla/benchmark_mla.py (1)
run_torch_mla(36-74)
🪛 Ruff (0.14.0)
examples/flash_decoding/example_gqa_decode.py
475-475: Avoid specifying long messages outside the exception class
(TRY003)
examples/fusedmoe/example_fusedmoe_tilelang.py
7-7: from tilelang.autotuner import * used; unable to detect undefined names
(F403)
8-8: from example_fusedmoe_torch import * used; unable to detect undefined names
(F403)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
332-332: Unused function argument: max_cache_seqlen
(ARG001)
332-332: Unused function argument: num_blocks
(ARG001)
334-334: Unpacked variable heads is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
373-373: Unused function argument: block_indices
(ARG001)
373-373: Unused function argument: max_cache_seqlen
(ARG001)
373-373: Unused function argument: num_blocks
(ARG001)
374-374: Unused function argument: block_size
(ARG001)
418-418: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
examples/blocksparse_attention/block_sparse_attn_triton.py
202-202: Unused function argument: ctx
(ARG001)
examples/convolution/example_convolution_autotune.py
112-112: Unused function argument: enable_rasteration
(ARG001)
182-182: Unused function argument: with_roller
(ARG001)
examples/bitnet-1.58b/modeling_bitnet.py
378-378: Unused method argument: use_cache
(ARG002)
380-380: Unused method argument: kwargs
(ARG002)
531-531: Unused method argument: use_cache
(ARG002)
533-533: Unused method argument: kwargs
(ARG002)
619-619: Unused method argument: use_cache
(ARG002)
621-621: Unused method argument: kwargs
(ARG002)
examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py
440-440: Comment contains ambiguous ( (FULLWIDTH LEFT PARENTHESIS). Did you mean ( (LEFT PARENTHESIS)?
(RUF003)
440-440: Comment contains ambiguous ) (FULLWIDTH RIGHT PARENTHESIS). Did you mean ) (RIGHT PARENTHESIS)?
(RUF003)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
368-368: Unused function argument: block_indices
(ARG001)
368-368: Unused function argument: max_cache_seqlen
(ARG001)
368-368: Unused function argument: num_blocks
(ARG001)
369-369: Unused function argument: block_size
(ARG001)
415-415: Value being cast to int is already an integer
Remove unnecessary int call
(RUF046)
examples/deepseek_mla/example_mla_decode_paged.py
315-315: Unused function argument: block_table
(ARG001)
318-318: Unused function argument: block_size
(ARG001)
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_indice.py
examples/blocksparse_attention/example_tilelang_sparse_gqa_decode_varlen_mask.py
examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py
examples/dequantize_gemm/example_dequant_groupedgemm_bf16_mxfp4_hopper.py
Never mix multiple code styles, but be consistent with only one. You should revert this commit aca0569, since you may format the code with |
XuehaiPan left a comment
See my comment above.
65cd197 to 2a7a6c5 Compare
8309155 to cbea8ee Compare
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (26)
tilelang/language/copy.py (1)
19-21: Update docstrings to match modernized type hints. The docstring still references old-style type hints (`Union[...]`, `Optional[...]`) while the function signature uses the new syntax. Consider updating the docstring for consistency.
Union[...],Optional[...]) while the function signature uses the new syntax. Consider updating the docstring for consistency.Apply this diff to update the docstring:
 Args:
-    src (Union[tir.Buffer, tir.BufferLoad, tir.BufferRegion]): Source memory region
-    dst (Union[tir.Buffer, tir.BufferLoad]): Destination memory region
-    coalesced_width (Optional[int], optional): Width for coalesced memory access. Defaults to None.
+    src (tir.Buffer | tir.BufferLoad | tir.BufferRegion): Source memory region
+    dst (tir.Buffer | tir.BufferLoad): Destination memory region
+    coalesced_width (int | None, optional): Width for coalesced memory access. Defaults to None.
tilelang/carver/template/flashattention.py (1)
29-41: Update the docstring to match the modernized return type. The method signature correctly uses `list[Hint]`, but the docstring at line 38 still references the old-style `List[Hint]`.
list[Hint], but the docstring at line 38 still references the old-styleList[Hint].Apply this diff to update the docstring:
 Returns:
-    List[Hint]: A list of optimization hints for hardware acceleration.
+    list[Hint]: A list of optimization hints for hardware acceleration.
 """
tilelang/language/frame.py (1)
201-209: Update docstring to match modernized return type annotation. The return type annotation was correctly updated to use PEP 604 syntax (`PrimExpr | None`), but the docstring at line 208 still references the old style (`Optional[PrimExpr]`). Update the docstring for consistency.
PrimExpr | None), but the docstring at line 208 still references the old style (Optional[PrimExpr]). Update the docstring for consistency.Apply this diff to update the docstring:
"""Get the value bound to a variable in the current let frame stack. Args: var (Var): The variable to look up Returns: - Optional[PrimExpr]: The bound value if found, None otherwise + PrimExpr | None: The bound value if found, None otherwise """version_provider.py (1)
19-29: Fix return type mismatch between annotation and implementation. The return type annotation is `str | None`, but the function never returns `None`; it returns either `r.stdout.strip()` or `'unknown'` (line 29). This creates an issue: when the git command fails, the caller on line 67 (`elif git_hash := get_git_commit_id():`) receives `'unknown'`, which is truthy, causing line 68 to append `'gitunkno'` to the version string.
str | None, but the function never returnsNone—it returns eitherr.stdout.strip()or'unknown'(line 29). This creates an issue: when the git command fails, the caller on line 67 (elif git_hash := get_git_commit_id():) receives'unknown', which is truthy, causing line 68 to append'gitunkno'to the version string.Apply this diff to return
Noneon error, matching the annotation and caller expectations:if r.returncode == 0: return r.stdout.strip() else: - return 'unknown' + return Nonetilelang/contrib/nvrtc.py (1)
14-42: Update docstring to match modernized type annotations. The function signature correctly uses modern type syntax (`int | None`, `str | list[str] | None`), but the docstring still uses the old-style syntax. This creates an inconsistency between the signature and documentation.
int | None,str | list[str] | None), but the docstring still uses the old-style syntax. This creates an inconsistency between the signature and documentation.Apply this diff to update the docstring:
-    arch : Optional[int]
+    arch : int | None
         The cuda architecture code.
-    options : Optional[Union[str, List[str]]]
+    options : str | list[str] | None
         The additional options.
tilelang/engine/phase.py (2)
22-29: Guard against None target before calling have_tma. The function accepts `target: Target | None = None`, but line 26 directly calls `have_tma(target)` without checking if `target` is `None`. This will raise an `AttributeError` when `have_tma` tries to access `target.kind.name`.
target: Target | None = None, but line 26 directly callshave_tma(target)without checking iftargetisNone. This will raise anAttributeErrorwhenhave_tmatries to accesstarget.kind.name.Apply this diff to add a None guard:
 def allow_tma_and_warp_specialized(pass_ctx: PassContext | None = None,
                                    target: Target | None = None) -> bool:
     if pass_ctx is None:
         pass_ctx = tilelang.transform.get_pass_context()
+    if target is None:
+        return False
     if not have_tma(target):
         return False
     disable_tma_lower = pass_ctx.config.get("tl.disable_tma_lower", False)
     return not disable_tma_lower and allow_warp_specialized(pass_ctx=pass_ctx, target=target)
9-19: Guard against None target before calling helper functions. The function accepts `target: Target | None = None`, but line 16 directly calls `is_cuda_target(target)` and `have_tma(target)` without checking if `target` is `None`. Verification confirms that `is_cuda_target` expects a non-None `Target` and accesses `target.kind.name` directly, which will raise an `AttributeError` if `target` is `None`.
target: Target | None = None, but line 16 directly callsis_cuda_target(target)andhave_tma(target)without checking iftargetisNone. Verification confirms thatis_cuda_targetexpects a non-NoneTargetand accessestarget.kind.namedirectly, which will raise anAttributeErroriftargetisNone.Apply this diff:
 def allow_warp_specialized(pass_ctx: PassContext | None = None,
                            target: Target | None = None) -> bool:
     # avoid circular import
     from tilelang.jit.adapter.utils import is_cuda_target
     if pass_ctx is None:
         pass_ctx = tilelang.transform.get_pass_context()
+    if target is None:
+        return False
     if (not is_cuda_target(target)) or (not have_tma(target)):
         return False
     disable_warp_specialized = pass_ctx.config.get("tl.disable_warp_specialized", False)
     return not disable_warp_specialized
tilelang/jit/adapter/dlpack.py (1)
22-24: Fix device selection when no inputs. torch.cuda.current_device() returns an int and can fail if CUDA is unavailable. Pass a torch.device and fall back to CPU.
Apply:
-    device = ins[0].device if len(ins) > 0 else torch.cuda.current_device()
+    if len(ins) > 0:
+        device = ins[0].device
+    elif torch.cuda.is_available():
+        device = torch.device("cuda", torch.cuda.current_device())
+    else:
+        device = torch.device("cpu")
tilelang/carver/arch/driver/cuda_driver.py (4)
80-95: Return type says "| None" but the function never returns None. get_cuda_device_properties raises on error; it never returns None. Tighten the annotation.
-def get_cuda_device_properties(device_id: int = 0) -> cudaDeviceProp | None:
+def get_cuda_device_properties(device_id: int = 0) -> cudaDeviceProp:
106-121: Do not use assert for runtime input validation. Assertions can be stripped with -O; use explicit validation. Also the function never returns None, so drop "| None".
-def get_shared_memory_per_block(device_id: int = 0, format: str = "bytes") -> int | None:
-    assert format in ["bytes", "kb", "mb"], "Invalid format. Must be one of: bytes, kb, mb"
+def get_shared_memory_per_block(device_id: int = 0, format: str = "bytes") -> int:
+    if format not in {"bytes", "kb", "mb"}:
+        raise ValueError("Invalid format. Must be one of: bytes, kb, mb")
 @@
-    else:
-        raise RuntimeError("Failed to get device properties.")
+    else:
+        raise RuntimeError("Failed to get device properties for shared memory per block.")
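A quick standalone illustration of why explicit validation is preferred over `assert` here (the function name and numbers are invented for the demo):

```python
def shared_memory(value_bytes: int, format: str = "bytes") -> float:
    # This check survives `python -O`; an assert would be stripped out.
    if format not in {"bytes", "kb", "mb"}:
        raise ValueError("format must be one of: bytes, kb, mb")
    scale = {"bytes": 1, "kb": 1024, "mb": 1024 * 1024}[format]
    return value_bytes / scale


print(shared_memory(49152, "kb"))  # 48.0
```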
124-144: Type annotation mismatch and noisy print. get_device_attribute returns None on errors but is annotated as int, and it prints to stdout. Either always raise, or return int | None and use logging.
+import logging
 @@
-def get_device_attribute(attr: int, device_id: int = 0) -> int:
+def get_device_attribute(attr: int, device_id: int = 0) -> int | None:
 @@
-    except Exception as e:
-        print(f"Error getting device attribute: {str(e)}")
-        return None
+    except Exception as e:
+        logging.getLogger(__name__).debug("Error getting device attribute: %s", e)
+        return None
146-165: Same assert issue and return type. Replace assert with explicit validation; the function doesn't return None on success paths.
-def get_max_dynamic_shared_size_bytes(device_id: int = 0, format: str = "bytes") -> int | None: +def get_max_dynamic_shared_size_bytes(device_id: int = 0, format: str = "bytes") -> int: @@ - assert format in ["bytes", "kb", "mb"], "Invalid format. Must be one of: bytes, kb, mb" + if format not in {"bytes", "kb", "mb"}: + raise ValueError("Invalid format. Must be one of: bytes, kb, mb") @@ - else: - raise RuntimeError("Failed to get device properties.") + else: + raise RuntimeError("Failed to get device properties for dynamic shared memory size.")tilelang/carver/utils.py (2)
49-51: Fix logger.debug usage (tuple passed as message)
logger.debug("...", e_msg)formats as a tuple. Use%sformatting orexc_info=True.- except Exception as e_msg: - logger.debug("Get tensorized func and tags failed: ", e_msg) + except Exception as e_msg: + logger.debug("Get tensorized func and tags failed: %s", e_msg) @@ - except Exception as e_msg: - logger.debug("Get tensorized func and tags failed: ", e_msg) + except Exception as e_msg: + logger.debug("Get tensorized func and tags failed: %s", e_msg)Also applies to: 64-65
76-86: Unused parameterextra_tagsand misleading assert message
extra_tagsis unused (ruff ARG001). Either thread it into policy creation, or prefix with_extra_tagsif kept for API.- The assert message says “list of functions” but the param is
list[OutputNode].-def get_roller_hints_from_output_nodes(output_nodes: list[OutputNode], - arch: TileDevice, - topk: int = 10, - extra_tags: list[str] | None = None) -> list[Hint] | None: - assert isinstance(output_nodes, list), "The input should be a list of functions." +def get_roller_hints_from_output_nodes(output_nodes: list[OutputNode], + arch: TileDevice, + topk: int = 10, + _extra_tags: list[str] | None = None) -> list[Hint]: + if not isinstance(output_nodes, list): + raise TypeError("output_nodes must be a list of OutputNode") @@ - except Exception as e_msg: - logger.debug(f"Generate hints from output nodes failed: {e_msg}", - "fallback to default policy") + except Exception as e_msg: + logger.debug("Generate hints from output nodes failed: %s; fallback to default policy", + e_msg)tilelang/jit/adapter/cython/adapter.py (1)
171-173: Redundant/incorrect target normalization
determine_target(target, return_object=True)already returns an object; callingdetermine_targetagain is redundant and may change semantics. Canonicalize once.- target = determine_target(target, return_object=True) - adapter.target = Target.canon_target(determine_target(target)) + target_obj = determine_target(target, return_object=True) + adapter.target = Target.canon_target(target_obj)tilelang/carver/analysis.py (1)
262-284: Return type mismatch: function can return Noneget_reduction_blocks returns None on several branches but is annotated to return list. Adjust to
list[BlockRV] | None.-def get_reduction_blocks(sch: tir.Schedule, - blocks: list[tir.schedule.BlockRV]) -> list[tir.schedule.BlockRV]: +def get_reduction_blocks(sch: tir.Schedule, + blocks: list[tir.schedule.BlockRV]) -> list[tir.schedule.BlockRV] | None:tilelang/engine/lower.py (1)
128-133: Remove unusedtargetparameter and update call sites.The
targetparameter is confirmed to be completely unused in the function body. The function only operates ontarget_hostand has no docstring explaining intentional API padding. Both call sites pass the parameter unnecessarily:
- Remove
targetparameter from function signature at tilelang/engine/lower.py:128- Update call site at tilelang/engine/lower.py:218
- Update call site at tilelang/jit/adapter/utils.py:86
tilelang/jit/adapter/wrapper.py (1)
493-499: Bug: l2_persistent_map assignment uses stale function_name from previous loop.Inside the host_mod loop, function_name is not defined; this writes under the last device function’s name. Merge/update the provided map instead.
- if "l2_persistent_map" in func.attrs: - self.l2_persistent_map[function_name] = func.attrs["l2_persistent_map"] + if "l2_persistent_map" in func.attrs: + l2_map = func.attrs["l2_persistent_map"] + if isinstance(l2_map, dict): + self.l2_persistent_map.update(l2_map)tilelang/intrinsics/mma_macro_generator.py (1)
1113-1119: Wrong offset constant: using local_size_b where local_size_a is required.This misindexes A_local_buf for INT4 paths and corrupts results. Fix both occurrences.
@@ - A_local_buf.data, - i * local_size_a + lift(local_size_b) // 2, + A_local_buf.data, + i * local_size_a + lift(local_size_a) // 2, @@ - A_local_buf.data, - i * local_size_a + lift(local_size_b) // 2, + A_local_buf.data, + i * local_size_a + lift(local_size_a) // 2,Also applies to: 1216-1222
tilelang/autotuner/tuner.py (2)
232-266: generate_cache_key returns a string; fix return annotation.The function returns a SHA256 hex digest, not AutotuneResult | None.
- def generate_cache_key(self, parameters: dict[str, Any]) -> AutotuneResult | None: + def generate_cache_key(self, parameters: dict[str, Any]) -> str:
679-712: Type annotation mismatch:_kernel_parametersis misannotated.The type annotation at line 107 declares
_kernel_parameters: tuple[str, ...], but the actual usage at line 430 unpacks it as exactly two values:key_args_tuple, key_kwargs_tuple = self._kernel_parameters. This contradicts the annotation, which claims a variable-length tuple of strings. The actual structure passed at line 710 iskey = (key_args_tuple, key_kwargs_tuple), a 2-tuple of tuples. The type annotation should reflect the actual structure (e.g.,tuple[tuple, tuple]or similar).tilelang/cache/kernel_cache.py (1)
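One possible spelling of the corrected annotation, assuming the two elements really are the positional-args tuple and the sorted kwargs items (the class name below is a stand-in for illustration, not the real class):

from __future__ import annotations
from typing import Any

class AutotuneState:
    # (key_args_tuple, key_kwargs_tuple): positional args plus sorted kwargs items
    _kernel_parameters: tuple[tuple[Any, ...], tuple[tuple[str, Any], ...]] | None = None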
43-44: Add "torch" to all execution_backend Literal type annotations.Code currently requires "torch" backend for metal targets (tilelang/jit/init.py:75 asserts this), and actively uses it in tilelang/jit/kernel.py:279 with MetalKernelAdapter. However, Literal types across the codebase omit "torch", causing type checker inconsistencies.
Affected locations requiring "torch" addition to Literal definitions:
- tilelang/cache/kernel_cache.py: lines 43, 72, 120, 329
- tilelang/jit/kernel.py: lines 48, 65, 146
- tilelang/jit/init.py: lines 33, 48, 94, 104, 123, 246, 267
- tilelang/cache/init.py: line 21
- tilelang/autotuner/param.py: lines 52, 218
- tilelang/autotuner/tuner.py: lines 137, 775
tilelang/jit/kernel.py (2)
34-43: Make Optional explicit for attributes with None defaults (RUF013).These fields default to None but omit
| None. This will fail with the new pyupgrade config.Apply:
- prim_func: PrimFunc = None - artifact: CompiledArtifact = None - adapter: BaseKernelAdapter = None - torch_function: Callable = None + prim_func: PrimFunc | None = None + artifact: CompiledArtifact | None = None + adapter: BaseKernelAdapter | None = None + torch_function: Callable | None = None @@ - latency: float = None - config: dict[str, Any] = None - ref_latency: float = None + latency: float | None = None + config: dict[str, Any] | None = None + ref_latency: float | None = None
472-489: export_library uses undefined attribute (runtime bug).
self.rt_moduledoesn’t exist; compiled module lives underself.artifact.rt_mod. This will crash at call time.- # Export the compiled kernel function to a shared library file. - self.rt_module.export_library(kernel_file) + # Export the compiled kernel function to a shared library file. + assert self.artifact is not None and self.artifact.rt_mod is not None, \ + "No runtime module to export" + self.artifact.rt_mod.export_library(kernel_file)tilelang/jit/adapter/nvrtc/adapter.py (2)
21-23: Call the availability function; don’t treat it as a truthy object.
if is_nvrtc_available:will always be truthy (function object), forcingcuda.bindings.driverimport and breaking environments without NVRTC.-if is_nvrtc_available: +if is_nvrtc_available(): import cuda.bindings.driver as cuda
147-168: Guard buffer_map access to avoid KeyError.Unlike ctypes/cython adapters, this code unconditionally indexes
buffer_map[param]. For params without buffers, this raises at runtime.- for i, param in enumerate(params): - buffer = buffer_map[param] - for j, shape in enumerate(buffer.shape): - if isinstance(shape, tir.Var) and (shape not in dynamic_symbolic_map): - dynamic_symbolic_map[shape] = (i, j) + for i, param in enumerate(params): + if param in buffer_map: + buffer = buffer_map[param] + for j, shape in enumerate(buffer.shape): + if isinstance(shape, tir.Var) and (shape not in dynamic_symbolic_map): + dynamic_symbolic_map[shape] = (i, j)
♻️ Duplicate comments (17)
tilelang/language/parallel.py (1)
18-18: Docstring still referencesOptional[int]instead ofint | None.This inconsistency has already been flagged in a previous review comment.
tilelang/carver/roller/shape_inference/common.py (1)
47-47: Fix implicitOptionalviolation detected by the new linter rules.This issue was previously flagged but remains unresolved. The
rstepparameter has aNonedefault but the type annotation doesn't includeNone, violating PEP 484. Since this PR enables stricter linting rules (RUF013), this should be addressed.Apply this diff to fix:
- def infer(self, shape, rstep: dict[str, int] = None): + def infer(self, shape, rstep: dict[str, int] | None = None):tilelang/engine/phase.py (1)
32-33: Guard against None target before calling have_tma.This issue was already flagged in a previous review: the function accepts
target: Target | None = None, but directly callshave_tma(target)without checking iftargetisNone, which will raise anAttributeError.Apply this diff to add a None guard (as suggested in the previous review):
def allow_fence_proxy(target: Target | None = None) -> bool: + if target is None: + return False return have_tma(target)tilelang/carver/matmul_analysis.py (1)
522-528: Return annotation still contradicts actual outputs.This issue was previously flagged but remains unresolved. The function returns
(func, None)at multiple points (lines 539, 642, 651, 660, 680, 684) and(sch.mod["main"], tags)wheretagscan beFalse(line 682), but the annotation only allowsdictin the second position.Recommend either:
- Widen the annotation to match current behavior:
-) -> tuple[tir.PrimFunc, dict[str, list[int] | int]]: +) -> tuple[tir.PrimFunc, dict[str, list[int] | int] | bool | None]:
- Or normalize all non-dict returns to
Noneand adjust the annotation accordingly:-) -> tuple[tir.PrimFunc, dict[str, list[int] | int]]: +) -> tuple[tir.PrimFunc, dict[str, list[int] | int] | None]:tilelang/engine/callback.py (2)
62-62: Type annotation missingNonein union.The parameter
funchas a default value ofNone, but the type hintCallable | booldoes not includeNone. This violates PEP 484, which requires explicitNonewhen a parameter defaults toNone. The runtime logic at line 83 explicitly checks forNone, confirming it's a valid value.Apply this diff to fix the type annotation:
-def register_hip_postproc_callback(func: Callable | bool = None, override: bool = True): +def register_hip_postproc_callback(func: Callable | bool | None = None, override: bool = True):
29-29: Type annotation missingNonein union.The parameter
funchas a default value ofNone, but the type hintCallable | booldoes not includeNone. This violates PEP 484, which requires explicitNonewhen a parameter defaults toNone. The runtime logic at line 50 explicitly checks forNone, confirming it's a valid value.Apply this diff to fix the type annotation:
-def register_cuda_postproc_callback(func: Callable | bool = None, override: bool = True): +def register_cuda_postproc_callback(func: Callable | bool | None = None, override: bool = True):tilelang/carver/roller/rasterization.py (1)
86-86: Fix implicitOptionalin parameter type annotation.The parameter
panel_widthhas a default value ofNonebut is typed asint, which violates PEP 484.Apply this diff to fix the type annotation:
- def get_code(self, panel_width: int = None) -> list[str]: + def get_code(self, panel_width: int | None = None) -> list[str]:tilelang/language/experimental/gemm_sp.py (1)
27-30: Update docstrings to match modernized type hints.The docstrings still reference the old
Union[tir.Buffer, tir.Var]syntax, but the actual type hints now usetir.Buffer | tir.Var.Apply this diff to update the docstrings:
Args: - A_sparse (Union[tir.Buffer, tir.Var]): First input matrix dense values - E (Union[tir.Buffer, tir.Var]): First input matrix sparse metadata - B (Union[tir.Buffer, tir.Var]): Second input matrix - C (Union[tir.Buffer, tir.Var]): Output matrix for results + A_sparse (tir.Buffer | tir.Var): First input matrix dense values + E (tir.Buffer | tir.Var): First input matrix sparse metadata + B (tir.Buffer | tir.Var): Second input matrix + C (tir.Buffer | tir.Var): Output matrix for resultstilelang/jit/adapter/cython/adapter.py (1)
149-158: from_database params should be KernelParam (same as init)- def from_database(cls, - params: list[TensorType], + def from_database(cls, + params: list[KernelParam], result_idx: list[int], target: str, func_or_mod: tir.PrimFunc | tvm.IRModule,tilelang/cache/__init__.py (1)
17-17: Add explicit| Noneannotations for parameters withNonedefaults.Per PEP 484, parameters with
Nonedefaults must explicitly includeNonein the type annotation. Lines 17 and 20 have implicitOptionalwhich violates this requirement.Apply this diff:
def cached( func: PrimFunc = None, - out_idx: list[int] = None, + out_idx: list[int] | None = None, *args, - target: str | Target = "auto", - target_host: str | Target = None, + target: str | Target = "auto", + target_host: str | Target | None = None,Also applies to: 20-20
tilelang/profiler/__init__.py (1)
198-228: Fix missing| Noneininput_tensorsannotation.Line 226 has
input_tensors: list[torch.Tensor] = Nonebut must include| Nonein the type annotation. The function checksif input_tensors is Noneat lines 250 and 267, confirmingNoneis a valid value.Apply this diff:
- input_tensors: list[torch.Tensor] = None, + input_tensors: list[torch.Tensor] | None = None,tilelang/autotuner/param.py (1)
51-56: Fix missing| Noneintarget_hostannotation.Line 54 has
target_host: str | Target = Nonebut the type annotation must include| Noneto match the default value and satisfy the RUF013 lint rule that was enabled by this PR.Apply this diff:
- target_host: str | Target = None + target_host: str | Target | None = Nonetilelang/cache/kernel_cache.py (1)
116-124: Make all None defaults explicit in type hints; widen out_idx and func types.PEP 484 forbids implicit Optional and the current signatures also under‑specify accepted shapes (e.g., out_idx can be int | list[int] | None). Apply the following changes. This aligns with FA102/UP rules and prior feedback.
@@ - def _generate_key( + def _generate_key( self, func: Callable, - out_idx: list[int], + out_idx: list[int] | int | None, execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc"] = "cython", - args=None, + args=(), target: str | Target = "auto", - target_host: str | Target = None, - pass_configs: dict = None, + target_host: str | Target | None = None, + pass_configs: dict | None = None, compile_flags: list[str] | str | None = None, ) -> str: @@ - "args_repr": tuple( - repr(arg) for arg in args - ), # Use repr to serialize arguments, may need more robust serialization + "args_repr": tuple(repr(arg) for arg in (args or ())), @@ - def cached( + def cached( self, - func: PrimFunc = None, - out_idx: list[int] = None, + func: PrimFunc | None = None, + out_idx: list[int] | int | None = None, *args, target: str | Target = "auto", - target_host: str | Target = None, + target_host: str | Target | None = None, execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc"] = "cython", verbose: bool = False, - pass_configs: dict = None, + pass_configs: dict | None = None, compile_flags: list[str] | str | None = None, ) -> JITKernel: @@ - def _load_kernel_from_disk( + def _load_kernel_from_disk( self, key: str, target: str | Target = "auto", - target_host: str | Target = None, - out_idx: list[int] = None, + target_host: str | Target | None = None, + out_idx: list[int] | int | None = None, execution_backend: Literal["dlpack", "ctypes", "cython", "nvrtc"] = "cython", - pass_configs: dict = None, + pass_configs: dict | None = None, compile_flags: list[str] | str | None = None, - func: Callable = None, + func: Callable | None = None, verbose: bool = False, ) -> JITKernel | None:Also applies to: 326-335, 71-78
tilelang/autotuner/tuner.py (1)
107-109: Fix parameter key typing: use (args_tuple, sorted_kwargs_tuple) and align setter signature.Matches actual usage in call (key = (args, sorted(kwargs.items()))).
@@ - _kernel_parameters: tuple[str, ...] | None = None - _function_parameters: dict[str, Any] | None = None + _kernel_parameters: tuple[tuple[Any, ...], tuple[tuple[str, Any], ...]] | None = None + _function_parameters: dict[str, Any] | None = None @@ - def set_kernel_parameters(self, k_parameters: tuple[str, ...], f_parameters: dict[str, Any]): + def set_kernel_parameters( + self, + k_parameters: tuple[tuple[Any, ...], tuple[tuple[str, Any], ...]], + f_parameters: dict[str, Any], + ): # for cache key generation self._kernel_parameters = k_parameters self._function_parameters = f_parametersAlso applies to: 227-231
tilelang/jit/adapter/ctypes/adapter.py (2)
7-13: Use KernelParam for params, not TensorType (consistency with JIT/NVRTC).Runtime expects
torch.dtypefromKernelParam.dtype. KeepingTensorTypehere is inconsistent and misleading.-from typing import Callable, Any +from typing import Callable, Any @@ -from tvm.relax import TensorType +from tilelang.engine.param import KernelParam @@ - def __init__(self, - params: list[TensorType], + def __init__(self, + params: list[KernelParam], result_idx: list[int], target: str,Also applies to: 44-55
109-120: Align from_database params type with init.Same rationale; match NVRTC/Cython adapters and JITKernel.
- def from_database(cls, - params: list[TensorType], + def from_database(cls, + params: list[KernelParam], result_idx: list[int], target: str,tilelang/jit/kernel.py (1)
44-55: Fix implicit Optional in init signature (RUF013).
out_idxandtarget_hostdefault to None but omit| None.- out_idx: list[int] | int = None, + out_idx: list[int] | int | None = None, @@ - target_host: str | Target = None, + target_host: str | Target | None = None,
🧹 Nitpick comments (27)
tilelang/layout/layout.py (1)
54-87: Consider updating docstrings for consistency.The docstrings still use
List[int],List[IterVar], andList[PrimExpr]in their Returns sections (lines 54, 65, 76, 87), while the actual type hints now use the built-inlist[...]syntax. Updating these docstrings would improve consistency with the modernized type annotations.Example for line 54:
Returns ------- - PrimExpr or List[PrimExpr] + PrimExpr or list[PrimExpr] The computed forward index expression(s).pyproject.toml (1)
90-100: Minor: Redundant exclude pattern for examples.Line 92 excludes
"examples/deepseek_v32/inference"at the top level, but line 99 uses a per-file-ignore pattern"examples/**.py"to ignore UP and FA rules for all examples. The top-level exclude is stricter (file-system level), while the per-file-ignore is rule-specific. If the intent is to exclude that specific examples directory from all linting rules, the top-level exclude is appropriate. If the intent is only to suppress UP/FA within examples (allowing other rules to apply), the per-file-ignore alone is sufficient. Consider clarifying or consolidating these patterns.tilelang/language/persistent.py (1)
14-26: Optional: Update docstring for consistency.The docstring could be updated to match the modernized signature:
- Line 18:
List[tir.PrimExpr]→list[tir.PrimExpr]- Line 24-25: Document that
group_sizecan beNone(currently only showstir.PrimExpr)Apply this diff to update the docstring:
"""Tools to construct persistent for loop. Parameters ---------- - domain : List[tir.PrimExpr] + domain : list[tir.PrimExpr] The list of dominators. wave_size : int The wave size. index : int The tile index in one wave. - group_size : tir.PrimExpr + group_size : tir.PrimExpr | None The group size. """tilelang/language/utils.py (1)
45-45: Type hint modernization looks good; consider updating docstrings for consistency.The conversion from
List[...]tolist[...]is correct and aligns with the pyupgrade rules enabled in this PR. However, the docstrings still referenceList(lines 51, 102), which creates a minor inconsistency with the updated type hints.Consider updating the docstrings to match:
- extents (List[tir.PrimExpr]): List of expressions defining the region size + extents (list[tir.PrimExpr]): List of expressions defining the region size- List[PrimExpr]: Coordinates for each dimension in the same order as `shape`. + list[PrimExpr]: Coordinates for each dimension in the same order as `shape`.Additionally, verify that any
from typing import Listimport (if present) was removed, as it's no longer needed.Also applies to: 72-72, 91-91
tilelang/language/tir/op.py (1)
1861-1861: Type annotations modernized correctly.The function signatures have been properly updated to use PEP 604 union syntax (
Span | None) instead of the legacyOptional[Span]. With thefrom __future__ import annotationsimport, these annotations will work correctly on Python 3.8+.Optional: Consider updating docstrings for consistency.
The parameter docstrings (lines 1869, 1888, 1910) still reference
Optional[Span]in prose form. While this is semantically correct and doesn't affect functionality, you could optionally update them toSpan | Nonefor consistency with the code annotations.Example for line 1869:
- span : Optional[Span] + span : Span | None The location of this operator in the source code.Also applies to: 1880-1880, 1899-1899
tilelang/language/tir/ir.py (1)
10-12: Consider fixing pre-existing implicit Optional annotations.While not introduced by this PR, the
stopparameter (inserial,parallel,vectorized,unroll,thread_binding) and thethreadparameter (inthread_binding) also have default values ofNonebut don't explicitly includeNonein their type annotations. Now thatfrom __future__ import annotationshas been added, these should ideally be updated toPrimExpr | Noneandstr | Nonerespectively for full PEP 484 compliance.Also applies to: 35-37, 60-62, 85-87, 110-114
tilelang/language/atomic.py (1)
4-4: Type annotations modernized correctly; docstrings should also be updated for consistency.The function signatures (lines 24, 70, 118) have been correctly updated to use PEP 604
str | Nonesyntax. However, the docstring parameter descriptions (lines 34, 82, 129) still referenceOptional[str]instead of the modernized syntax. For full consistency with the PR's modernization goals, update these docstrings to usestr | Noneas well:# Line 34 (and similarly for lines 82, 129): - memory_order (Optional[str]): Optional memory-order name (e.g. "relaxed", "acquire", "seq_cst"). + memory_order (str | None): Optional memory-order name (e.g. "relaxed", "acquire", "seq_cst").tilelang/carver/roller/shape_inference/common.py (1)
47-47: Consider adding type annotation forshapeparameter.The
shapeparameter currently lacks a type annotation. Based on the implementation (lines 50-52), it accepts either a sequence or a dictionary. Adding an annotation would improve type safety and align with this PR's modernization goals.Suggested annotation:
- def infer(self, shape, rstep: dict[str, int] | None = None): + def infer(self, shape: list | tuple | dict[str, list[arith.ConstIntBound]], rstep: dict[str, int] | None = None):tilelang/language/gemm.py (2)
10-21: Type annotations modernized correctly.The function signature has been properly updated to use PEP 604 union syntax (
|). The changes are syntactically correct and maintain compatibility with thefrom __future__ import annotationsimport.Consider updating the docstrings (lines 28-30, 39) to match the new annotation style for consistency:
Args: - A (Union[tir.Buffer, tir.Var]): First input matrix - B (Union[tir.Buffer, tir.Var]): Second input matrix - C (Union[tir.Buffer, tir.Var]): Output matrix for results + A (tir.Buffer | tir.Var): First input matrix + B (tir.Buffer | tir.Var): Second input matrix + C (tir.Buffer | tir.Var): Output matrix for results transpose_A (bool, optional): Whether to transpose matrix A. Defaults to False. transpose_B (bool, optional): Whether to transpose matrix B. Defaults to False. policy (GemmWarpPolicy, optional): Warp execution policy. Defaults to GemmWarpPolicy.Square. clear_accum (bool, optional): Whether to clear accumulator before computation. Defaults to False. k_pack (int, optional): Number of k dimensions packed into a single warp. Defaults to 1. wg_wait (int, optional): Warp group wait count. Defaults to 0. On hopper it is equivalent to `wgmma.wait_group.sync.aligned <wg_wait>` if wg_wait is not -1 On sm100, `wg_wait` can only be 0 or -1. `mbarrier_wait(TCGEN5MMA barrier)` will be appended if wg_wait is 0. - mbar (tir.Buffer, optional): mbarrier for TCGEN5MMA synchronization + mbar (tir.Buffer | None, optional): mbarrier for TCGEN5MMA synchronization
215-225:gemm_v2signature modernized correctly.The function signature follows the same modernization pattern as
gemm, using PEP 604 union syntax appropriately.Similarly, consider updating the docstrings (lines 232-234) for consistency:
Args: - A (Union[tir.Buffer, tir.Var]): First input matrix - B (Union[tir.Buffer, tir.Var]): Second input matrix - C (Union[tir.Buffer, tir.Var]): Output matrix for results + A (tir.Buffer | tir.Var): First input matrix + B (tir.Buffer | tir.Var): Second input matrix + C (tir.Buffer | tir.Var): Output matrix for results transpose_A (bool, optional): Whether to transpose matrix A. Defaults to False.tilelang/carver/matmul_analysis.py (1)
557-558: Consider refining the return type for better type safety.The return annotation
bool | dictis imprecise. Specifying the dict structure would improve type checking.Apply this diff:
- def analysis_tensorcore_tags(sch: tir.Schedule, block: BlockRV, target: Target) -> bool | dict: + def analysis_tensorcore_tags(sch: tir.Schedule, block: BlockRV, target: Target) -> bool | dict[str, list[int] | int]: tags: dict[str, list[int] | int] = {}tilelang/jit/adapter/dlpack.py (1)
11-15: Tighten Callable typing for returned function.Use collections.abc.Callable and annotate the inner function’s return (tensor or list of tensors).
Apply:
@@ -from tilelang.contrib.dlpack import to_pytorch_func +from tilelang.contrib.dlpack import to_pytorch_func +from collections.abc import Callable @@ - def _convert_torch_func(self) -> callable: + def _convert_torch_func(self) -> Callable[..., torch.Tensor | list[torch.Tensor]]: @@ - def func(*ins: list[torch.Tensor]): + def func(*ins: list[torch.Tensor]) -> torch.Tensor | list[torch.Tensor]:tilelang/contrib/cc.py (1)
211-254: Align docstring type with signature (Dict → dict).Minor consistency fix.
Apply:
- symbol_section_map: Dict[str, str] + symbol_section_map: dict[str, str]tilelang/transform/add_bufstore_wrapper.py (1)
72-90: Fix type of buffer indices (they are PrimExpr, not int).Indices from BufferLoad/BufferStore are PrimExprs; current hint is misleading.
Apply:
-from tvm.tir import (BufferStore, For, AttrStmt, ForKind, Var, PrimFunc, BufferLoad, Buffer, IntImm) +from tvm.tir import (BufferStore, For, AttrStmt, ForKind, Var, PrimFunc, BufferLoad, Buffer, IntImm, PrimExpr) @@ - def collect_buffer_indices(statement) -> dict[Buffer, list[int]]: + def collect_buffer_indices(statement) -> dict[Buffer, list[PrimExpr]]:Alternatively (no new import):
- def collect_buffer_indices(statement) -> dict[Buffer, list[int]]: + def collect_buffer_indices(statement) -> dict[Buffer, list["tvm.tir.PrimExpr"]]:tilelang/language/annotations.py (1)
25-25: Consider restoring type parameters for dict annotations.The unparameterized
dictloses type information compared to the originalDict[K, V]. Since the relevant code snippet fromtilelang/language/ast/ir.py(line 429) usesDict[str, Any]forblock_attr, consider usingdict[str, Any]here for consistency and better type safety.Apply this pattern to restore type parameters:
-def annotate_layout(layout_map: dict): +def annotate_layout(layout_map: dict[str, Any]):-def annotate_safe_value(safe_value_map: dict): +def annotate_safe_value(safe_value_map: dict[str, Any]):-def annotate_l2_hit_ratio(l2_hit_ratio_map: dict): +def annotate_l2_hit_ratio(l2_hit_ratio_map: dict[str, Any]):You'll also need to import
Any:-from typing import Callable +from typing import Any, CallableAlso applies to: 39-39, 47-47
tilelang/jit/adapter/cython/adapter.py (1)
330-339: Stream handling in ctypes callIf
streamis -1 (default) or None,ctypes.c_void_p(stream)yields an invalid pointer. If the C++ side treats -1 specially, fine; else map -1/None to 0.- ctypes_args.append(ctypes.c_void_p(stream)) + s = None if stream in (-1, None) else stream + ctypes_args.append(ctypes.c_void_p(0 if s is None else s))tilelang/carver/roller/hint.py (1)
181-189: pass_context typeIf
pass_contextflows into TVM pass configs, prefer a concrete type likedict[str, bool | int | str]to prevent accidental non-serializable values.tilelang/cache/kernel_cache.py (1)
367-369: Be explicit with file mode/encoding when reading sources.Opening text files without mode/encoding can be flaky on non‑UTF8 locales. Minor, but safer to specify.
- with open(wrapped_kernel_path) as f: + with open(wrapped_kernel_path, "r", encoding="utf-8") as f: kernel_global_source = f.read()tilelang/carver/roller/policy/default.py (1)
6-7: Annotate mutable class attributes with ClassVar to avoid shared state warnings.Ruff RUF012: nodes is a mutable class attribute. Mark it ClassVar (or move to init).
-from typing import Iterable +from typing import Iterable, ClassVar @@ - nodes: list[PrimFuncNode] = [] + nodes: ClassVar[list[PrimFuncNode]] = []Also applies to: 26-30
tilelang/autotuner/tuner.py (1)
592-608: configs attribute can be None; annotate Optional to satisfy linters.Minor cleanup to avoid implicit Optional on class attribute and mirror init signature.
- configs: dict | Callable = None + configs: dict | Callable | None = None @@ - self._tuner_cache: dict[tuple, tilelang.JITKernel] = {} # (args, kwargs) -> compiled kernel + self._tuner_cache: dict[tuple, tilelang.JITKernel] = {} # (args, kwargs) -> compiled kernelAlso applies to: 656-658
tilelang/jit/kernel.py (3)
195-197: Broaden type of out_idx parameter to match callers.
_compile_and_create_adapter(..., out_idx: list[int])is too narrow; callers passint | list[int] | None. Tight annotation obscures intent.- def _compile_and_create_adapter(self, tilelang_func: PrimFunc, - out_idx: list[int]) -> BaseKernelAdapter: + def _compile_and_create_adapter(self, tilelang_func: PrimFunc, + out_idx: list[int] | int | None) -> BaseKernelAdapter:
411-434: Docstring return section mismatches implementation.
update_tuner_resultreturnsself, notNone.- Returns - ------- - None + Returns + ------- + JITKernel + Self (for chaining)
299-308: Avoid shadowing/ignoring the target parameter.
_create_adapter_from_databasetakestargetbut immediately overwrites it withself.target. Either use the passed value or drop the parameter to reduce confusion and silence ARG* lints.- def _create_adapter_from_database(self, - params: list[KernelParam], - result_idx: list[int] | int, - target: str | Target, + def _create_adapter_from_database(self, + params: list[KernelParam], + result_idx: list[int] | int, func_or_mod: PrimFunc | tvm.runtime.Module, @@ - target = self.target + target = self.targetAnd update call sites accordingly.
tilelang/jit/adapter/nvrtc/adapter.py (1)
101-105: pass_configs is unused in from_database (ARG003).Either thread it through or acknowledge it’s intentionally unused.
Option A — thread through:
- def from_database(cls, + def from_database(cls, params: list[KernelParam], result_idx: list[int], target: str, func_or_mod: tir.PrimFunc | tvm.IRModule, kernel_global_source: str, kernel_lib_path: str, verbose: bool = False, - pass_configs: dict[str, Any] | None = None, + pass_configs: dict[str, Any] | None = None, compile_flags: list[str] | None = None): @@ - adapter.lib_generator = PyLibraryGenerator(adapter.target, adapter.verbose) + adapter.lib_generator = PyLibraryGenerator(adapter.target, adapter.verbose) + # (Optional) NVRTC path currently ignores pass configs; keep for parity if supported later. + # adapter.lib_generator.assign_pass_configs(pass_configs)Option B — silence lint:
- pass_configs: dict[str, Any] | None = None, + _pass_configs: dict[str, Any] | None = None,tilelang/jit/adapter/ctypes/adapter.py (3)
44-55: Broaden result_idx annotation to match BaseKernelAdapter._legalize_result_idx. Callers may pass int | list[int] | None. Make this explicit.

- result_idx: list[int],
+ result_idx: list[int] | int | None,
289-295: get_kernel_source return type can be Optional[str]. It returns None when kernel_only=True and kernel_global_source is unset.

- def get_kernel_source(self, kernel_only: bool = False):
+ def get_kernel_source(self, kernel_only: bool = False) -> str | None:
40-43: Tighten cached shape typing (nit).
param_shapes: list[list] is vague. Prefer explicit element types for readability.

- param_shapes: list[list] | None = None  # Cache for parameter shapes
+ param_shapes: list[list[int | tir.Var]] | None = None  # Cache for parameter shapes
|
The errors reported by the NVIDIA runner pass locally on H20; the failures should be unrelated to this PR. |
* update rules
* ruff check
* other fixes
* fmt
* do not touch examples
* fmt
and add FA102 for missing from __future__ import annotations. Prevent issues like #959. Or use UP035 (with FA102) only if this is too wide.
This should be sufficient to keep py38 compatibility for a long time.
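As a minimal illustration of what the combination buys (hypothetical function, not code from this repository): RUF013 rejects the implicit-Optional form, and with the FA102-enforced future import the modern spelling still runs on Python 3.8 because the annotations are not evaluated at definition time.

from __future__ import annotations  # enforced by FA102

# Before (flagged by RUF013: implicit Optional):
#   def build(out_idx: list[int] = None) -> list[int]: ...

# After (PEP 604 union; py38-safe thanks to the future import):
def build(out_idx: list[int] | None = None) -> list[int]:
    return list(out_idx or [])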
Summary by CodeRabbit
typing module imports to Python 3.10+ built-in generic syntax for improved code clarity and IDE support.
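A sketch of the conversion pattern this summary refers to, on a made-up helper (the function and types are illustrative only, not part of tilelang's API):

# Before: legacy typing imports
#   from typing import Dict, List, Optional, Union
#   def summarize(configs: Optional[Dict[str, Union[int, str]]] = None) -> List[str]: ...

# After: built-in generics (PEP 585) and unions (PEP 604)
from __future__ import annotations

def summarize(configs: dict[str, int | str] | None = None) -> list[str]:
    return [f"{key}={value}" for key, value in (configs or {}).items()]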