@MasterJH5574
Contributor

This PR fixes a bug in the general reduction dlight rule. The bug occurs when there is a trailing spatial block and the reduction axes of the preceding reduction blocks are not the last axes.

In that case, the loop orders of the reduction blocks and the trailing spatial block are inconsistent, but before this fix the dlight rule always treated them as consistent.

As a result, although the function produced by the rule is numerically correct, it may use far more shared memory than necessary (in proportion to the size of the spatial loops). When the spatial dimensions are large, the required shared memory size may exceed the device limit.
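
To give a rough sense of the scaling, here is a minimal back-of-the-envelope sketch in plain Python. It is not TVM code and does not reproduce the rule's actual allocation logic. The 48 KB limit, the `float32` element size, the shapes, and the helper names `shared_bytes` and `fits` are all assumptions chosen for illustration.

```python
# Illustrative model only: not TVM code and not the dlight rule's real logic.
# Assumed per-block shared memory limit; real limits vary by device.
SHARED_LIMIT_BYTES = 48 * 1024


def shared_bytes(elems_per_block: int, dtype_bytes: int = 4) -> int:
    """Bytes needed for a shared-memory buffer with the given element count."""
    return elems_per_block * dtype_bytes


def fits(elems_per_block: int, dtype_bytes: int = 4) -> bool:
    """Whether such a buffer stays within the assumed per-block limit."""
    return shared_bytes(elems_per_block, dtype_bytes) <= SHARED_LIMIT_BYTES


# Hypothetical workload: a reduction axis of length 256 and a spatial extent
# that varies. We assume the buggy case scales the buffer by the spatial extent.
reduce_len = 256
for spatial in (1, 32, 128):
    elems = reduce_len * spatial
    print(spatial, shared_bytes(elems), fits(elems))
```

Under these assumptions, a buffer that grows with the spatial extent stays within the limit for small extents but exceeds it once the spatial extent reaches 128, which matches the failure mode described above.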

This PR fixes this bug and adds a unit test.

@MasterJH5574 force-pushed the tvm-dev/2025-03-16-dlight-sfm branch from 2c257a6 to 64b7520 on March 17, 2025 17:38
@jinhongyii merged commit dafd053 into apache:main on Mar 17, 2025
15 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
apache#17754)
