Wrong results where one of the args is assigned to constant inside the kernel #741
Comments
I am tracking down a major bug that was introduced with the shared signature entry point. I suspect this is also related.
Hmmm, https://github.com/openai/triton/pull/742/files was for a bug in the JIT, so it shouldn't affect users of the compilation API, and it was also specific to constexprs that aren't clustered at the end. I'll dig deeper into this case.
JIT produces wrong values in both versions of the kernel; with the compilation API, one kernel is correct and one is wrong, so it still might be related to #742. Let me check.
I did a little digging, and the big difference is that the failing kernel is vectorized, while the working one isn't. This is funny, since 2000 is a multiple of 16.
Yes, but the stores use indirect addressing:
Setting Setting
(i.e., just adding parentheses) also seems to resolve the issue. So my money is on a bug deep down in the alignment analysis pass. The good news is that it is generally preferable to add parentheses around offset math, since it promotes int32 math over int64 pointer arithmetic (though in this particular case tmp0 is int64, so it doesn't matter), so this is something that torchinductor should probably do anyway. The bad news is that I will be away until Sunday, so I won't have time to properly fix this issue until next week.
Thanks! Adding parentheses for indexing math is totally doable on our side!
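To make the vectorization point above concrete, here is a toy sketch in plain Python of how a divisibility-style alignment analysis might reason about an address that includes a runtime-loaded index. This is not Triton's actual pass. The `Div` class, the `can_vectorize` helper, and the divisor values chosen for `out_ptr`, `x_offset`, and `tmp0` are all hypothetical and only illustrate the idea that an indirect offset removes the alignment guarantee, however the additions are grouped.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Div:
    """Hypothetical fact tracked per value: it is a multiple of `divisor` (in elements)."""
    divisor: int

    def __add__(self, other):
        # A sum is only guaranteed to be a multiple of the gcd of both divisors.
        return Div(gcd(self.divisor, other.divisor))


def can_vectorize(addr, width):
    """A store of `width` elements is only safe if the address is a multiple of `width`."""
    return addr.divisor % width == 0


# Assumed facts, for illustration only.
out_ptr = Div(16)   # base pointer known to be 16-aligned
x_offset = Div(16)  # block start offset, a multiple of 16
tmp0 = Div(1)       # index loaded at runtime: nothing is known about it

direct = out_ptr + x_offset            # direct addressing
flat = (out_ptr + x_offset) + tmp0     # indirect addressing, no parentheses
grouped = out_ptr + (x_offset + tmp0)  # indirect addressing, offsets grouped first

print(can_vectorize(direct, 4))   # alignment survives, vectorizing is fine
print(can_vectorize(flat, 4))     # alignment is lost through tmp0
print(can_vectorize(grouped, 4))  # same conclusion as the ungrouped form
```

In this model both groupings of the indirect address reach the same conclusion, which is the behavior one would expect from a correct analysis; a pass whose answer changes when parentheses are added is consistent with the suspicion voiced above.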
This might be related to #714. Repro (comments inside, requires torchdynamo unfortunately). tl;dr: if the kernel contains the assignment xnumel=<const>, where xnumel is also a kernel arg and <const> equals the value of xnumel that is passed to the kernel (so the assignment should be a no-op, and even if it's used for optimization, it shouldn't change results), the kernel produces wrong results. Note that this is using the new runtime; with the old runtime, both versions of the kernel produce wrong results.
I'm happy to provide the generated PTX if needed, or any additional info, given that the repro requires dynamo, although minor changes can be made to disable pre-compilation and drop the dynamo dependency while still getting wrong results.
Output: