@MasterJH5574
Contributor

This PR adds support for vectorizing tir.ShuffleNode, which is useful for extracting bits and converting them to float4, since float4 is a sub-byte type.

Prior to this PR, ShuffleNode was not supported in vectorization. This PR allows vectorizing ShuffleNodes that match specific patterns, and still throws an error for ShuffleNodes that don't meet the pattern requirements.
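
For readers unfamiliar with sub-byte types, here is a minimal sketch of what "extracting bits and converting to float4" can look like. It is plain Python, not TVM code and not the implementation in this PR. It assumes the E2M1 layout for float4 (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1) and assumes the low nibble of each byte comes first; `decode_e2m1` and `unpack_float4` are hypothetical names used only for illustration.

```python
from typing import List


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit value, assuming the E2M1 layout (bias 1)."""
    sign = -1.0 if (nibble >> 3) & 1 else 1.0
    exponent = (nibble >> 1) & 0b11
    mantissa = nibble & 1
    if exponent == 0:
        # Subnormal range: no implicit leading one.
        return sign * mantissa * 0.5
    return sign * (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)


def unpack_float4(packed: bytes) -> List[float]:
    """Extract two 4-bit values per byte (low nibble first, an assumed order)."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0xF))         # low 4 bits
        values.append(decode_e2m1((byte >> 4) & 0xF))  # high 4 bits
    return values
```

Because each byte holds two elements, a vectorized version of this extraction has to select lanes out of a wider packed value, which is the kind of access a shuffle expresses.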

@tqchen merged commit 435f641 into apache:main on Mar 15, 2025
15 checks passed
MasterJH5574 added a commit that referenced this pull request May 4, 2025

This PR overrides `PrintVecElemLoad()` and `PrintVecElemStore()`
for the WebGPU backend.

Otherwise, we would generate expressions like `(QK_local[0i].s0)` for
WebGPU, which is not valid WGSL syntax.
After this PR, we generate `(QK_local[0i][0])` instead. `QK_local` here
is an `array<vec4<f32>, 1>`.
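
The difference between the two element-access spellings can be sketched as a small string formatter. This is a hypothetical Python helper for illustration only, not the C++ override added to TVM; the name `vec_elem_load`, the `wgsl` flag, and the hexadecimal lane suffix for the non-WGSL case are all assumptions.

```python
def vec_elem_load(vec: str, index: int, wgsl: bool) -> str:
    """Format an expression that reads lane `index` of the vector expression `vec`."""
    if wgsl:
        # WGSL accepts array-style indexing on vectors, e.g. v[0].
        return f"{vec}[{index}]"
    # OpenCL-style numeric component suffix, e.g. v.s0 (assumed hex lane digit).
    return f"{vec}.s{index:x}"
```

For example, `vec_elem_load("QK_local[0i]", 0, wgsl=True)` yields `QK_local[0i][0]`, while passing `wgsl=False` yields the `QK_local[0i].s0` form that WGSL rejects.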

This issue prevented WebLLM from generating the correct kernel
after #17748.

Co-authored-by: Ruihang Lai <ruihangl@cs.cmu.edu>
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
…che#17917)