Issue with non-even division of block count and the else path of an element-wise broadcast kernel? #1785
Comments
As a note, if I make the …
This is an example:
This is a register aliasing problem. Fixing it now.
I would be curious to know which part of the kernel was in error, if you don't mind pointing it out.
To unblock, going for a quicker fix that blanket-disables a wider range of scenarios. Will improve the analysis precision in follow-ups.
🐛 Describe the bug
There seems to be a broadcast issue in our TOT code: the following code produces a max difference of 80 in a given element.
Example Code:
In the NGC 22.05 container, `unsqueeze()` is fused and the blocking is different. For 22.05, the number of blocks corresponds to `T2`'s outer 2 dimensions, 1024 * 192. In TOT, the outer 2 dimensions of `T1`, 1024 * 192, do not divide evenly by the number of blocks, 65535: 1024 * 192 = 196608 = 65536 * 3, so 196608 = 65535 * 3 + 3. I am guessing something is wrong in the else path as you step through the non-vectorized loads of the remainder of `T1`. I didn't see any obvious differences on the `if-then` path.

For 22.05:
Fusion IR:
Launch Params:
Grid(196608, 1, 1) Block(96, 1, 1)
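The block counts quoted in the description can be sanity-checked with plain arithmetic. The Python snippet below only re-derives numbers already stated in the issue (1024 * 192 outer elements, Grid(196608, 1, 1) for 22.05, Grid(1, 65535, 1) for TOT); the variable names are illustrative and nothing is read from nvFuser itself.

```python
# Block-count arithmetic from the issue (values copied from the text and
# launch params; nothing here queries nvFuser).
outer = 1024 * 192      # outer two dimensions of T1 / T2
grid_2205 = 196608      # Grid(196608, 1, 1) in the 22.05 launch
grid_tot = 65535        # Grid(1, 65535, 1) in the TOT launch

assert outer == grid_2205 == 65536 * 3

# 22.05: exactly one outer element per block, no remainder.
per_block_2205, rem_2205 = divmod(outer, grid_2205)

# TOT: 65535 blocks do not divide 196608, so a partial pass is needed.
full_passes, remainder = divmod(outer, grid_tot)
passes_needed = (outer + grid_tot - 1) // grid_tot  # integer ceil division

print(f"22.05: {per_block_2205} per block, remainder {rem_2205}")
print(f"TOT:   {full_passes} full passes + {remainder} leftover -> {passes_needed} passes")
```

One plausible (unconfirmed) reason for the 65535 cap in TOT is that CUDA limits `gridDim.y` to 65535, whereas `gridDim.x` may be as large as 2^31 - 1; the issue itself does not state why the y dimension was chosen.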
Kernel:
For TOT:
graph:
Fusion IR:
Launch Params:
Grid(1, 65535, 1) Block(96, 1, 1)
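To illustrate why the TOT launch needs a predicated remainder, here is a hypothetical Python model of 65535 blocks striding over 196608 outer rows. The row index formula `i * n_blocks + block` and the helper `simulate` are assumptions made for illustration; nvFuser's generated indexing may differ. Under this mapping every block makes 4 passes, and on the last pass only blocks 0 to 2 are in bounds, so the remaining iterations must be masked off.

```python
# Hypothetical model (NOT nvFuser output): n_blocks blocks stride over
# n_rows outer rows; iterations that fall past the end must be skipped.
def simulate(n_rows: int, n_blocks: int) -> tuple[list[int], int]:
    """Return (visit count per row, number of masked-off iterations)."""
    passes = (n_rows + n_blocks - 1) // n_blocks  # ceil division
    visits = [0] * n_rows
    masked = 0
    for block in range(n_blocks):            # plays the role of blockIdx.y
        for i in range(passes):
            row = i * n_blocks + block        # assumed grid-stride index
            if row < n_rows:                  # in-bounds iteration
                visits[row] += 1
            else:                             # out of bounds: predicated off
                masked += 1
    return visits, masked


visits, masked = simulate(1024 * 192, 65535)
print(all(v == 1 for v in visits), masked)
```

With this mapping each row is visited exactly once and 65535 * 4 - 196608 = 65532 iterations are masked; a bug confined to the remainder handling would therefore only corrupt a small subset of elements, which is consistent with a localized max difference but does not by itself identify the cause.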
Kernel:
Versions
TOT