-
Notifications
You must be signed in to change notification settings - Fork 332
[Bugfix] Enhance LetStmt handling in Vectorize Loop Pass #1159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ethod and improve let variable handling
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
WalkthroughThe vectorizer gains a public static entry point Changes
Sequence DiagramsequenceDiagram
participant LoopVec as LoopVectorizer
participant Entry as TLVectorizer::Vectorize()
participant TV as TLVectorizer (state)
participant Scalar as Scalarization
LoopVec->>Entry: Vectorize(var, var_lanes, body)
activate Entry
Entry->>TV: initialize `let_var_map_` and `let_value_binding_`
activate TV
TV->>TV: visit LetStmt / LetExpr\npopulate binding maps
note right of TV #E3F2FD: bindings recorded for\nlater rebinding
TV->>Scalar: perform scalarization\n(detect used let-bound vars)
activate Scalar
note right of Scalar #E8F5E9: create local `scalarized_stmt`
Scalar->>TV: return scalarized_stmt
deactivate Scalar
TV->>TV: wrap scalarized_stmt with\nLetStmt bindings from `let_value_binding_`
TV->>Entry: return vectorized Stmt
deactivate TV
Entry->>LoopVec: result
deactivate Entry
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
testing/python/language/test_tilelang_language_let.py (1)
5-5: Remove unused import.The
map_torch_typeimport is not used anywhere in this test file.Apply this diff to remove the unused import:
-from tilelang.utils.tensor import map_torch_type
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/transform/vectorize_loop.cc(9 hunks)testing/python/language/test_tilelang_language_let.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
testing/python/language/test_tilelang_language_let.py (2)
tilelang/utils/tensor.py (1)
map_torch_type(21-38)tilelang/jit/__init__.py (1)
compile(30-79)
🪛 Ruff (0.14.2)
testing/python/language/test_tilelang_language_let.py
12-12: Loop control variable blockIdx not used within loop body
Rename unused blockIdx to _blockIdx
(B007)
13-13: Loop control variable threadIdx not used within loop body
Rename unused threadIdx to _threadIdx
(B007)
🔇 Additional comments (9)
testing/python/language/test_tilelang_language_let.py (2)
12-13: Thread binding variables: static analysis false positive.The static analysis tool flags
blockIdxandthreadIdxas unused. In TVM/TileLang, thread binding variables establish the execution context and do not require explicit references within the loop body. This is expected behavior.
7-19: Test correctly validates vectorized let-binding code generation.The test verifies that a let-bound vectorized load (
b: T.float32x4) is correctly emitted as a CUDAfloat4declaration in the generated source. The test focuses on code generation correctness, which aligns with the PR's objective to enhance LetStmt handling in the vectorizer.src/transform/vectorize_loop.cc (7)
36-36: LGTM: Include added for new set usage.The
<unordered_set>include is correctly added to support theused_let_bound_varsset introduced in the enhancedScalarizemethod.
212-218: LGTM: Clean static entry point improves API.The new static
Vectorizemethod provides a clean, single-entry API that encapsulates the construction-then-invocation pattern. This improves usability at call sites.
225-235: LGTM: Clearer scalarization flow.Extracting the scalarized statement into a local variable improves readability and makes the control flow more explicit.
414-415: LGTM: Let-binding refactor improves correctness.The split of
let_binding_intolet_var_map_(variable substitution) andlet_value_binding_(value tracking) provides better separation of concerns and enables correct scalarization of let-bound variables. The implementation consistently maintains both maps acrossLetExprandLetStmtvisitors.Also applies to: 544-557, 667-680
704-727: LGTM: Enhanced scalarization correctly handles let-bound variables.The enhanced
Scalarizemethod correctly:
- Identifies all let-bound variables used within the statement
- Substitutes the loop variable with the scalar index
- Re-establishes let bindings by wrapping the scalarized statement with
LetStmtnodes, using the scalarized bound valuesThe key insight is that
Scalarizeoperates on the original (pre-vectorization) statement (line 229), ensuring that variable lookups inlet_value_binding_retrieve the correct unvectorized expressions for substitution.Minor improvement: the variable suffix changed from
".s"to"_s", which is clearer and follows conventional naming.
742-746: LGTM: Member variables properly documented.The member variable declarations are clearly documented and consistent with the refactored let-binding mechanism.
844-844: LGTM: Updated to use new static entry point.The integration correctly uses the new
TLVectorizer::Vectorizestatic method, providing a cleaner call site.
* [Refactor] Enhance TLVectorizer with loop vectorization convenience method and improve let variable handling * lint fix * let test fix * lint fix
This pull request refactors the let-binding logic in the
TLVectorizerclass to improve correctness and maintainability when vectorizing loops, especially those involving let-bound variables. It introduces new data structures to track variable mappings and their bound values, updates substitution and scalarization logic to handle nested let-bindings, and adds a new convenience entry point for vectorization. Additionally, a new test is introduced to verify correct vectorization of let-bound loads.Vectorizer refactor and let-binding improvements
let_binding_map with two new maps:let_var_map_(tracks variable mapping) andlet_value_binding_(tracks mapping from new variables to their bound values), and updated all relevant logic to use these new structures for handling let-bindings in vectorization. [1] [2] [3]Scalarizemethod to substitute let-bound variables correctly in the presence of nested or reused let-bindings, ensuring that all relevant variables are properly scoped and substituted during scalarization.Vectorizemethod toTLVectorizer, simplifying how vectorization is invoked from call sites. [1] [2]Code cleanup and correctness
let_binding_tolet_var_map_throughout the class to reflect the new data structure and its purpose. [1] [2] [3]".s"to"_s"in scalarization index variable names). [1] [2]Testing
test_let_vectorize_loadintesting/python/language/test_tilelang_language_let.pyto verify that let-bound loads are correctly vectorized, ensuring that the generated CUDA code contains the expected vectorized variable declaration.Summary by CodeRabbit
Bug Fixes
New Features
Tests