[CUDA] [Codegen] Ensuring at least one thread block to handle empty tensor #7273
@kevinthesun @masahi @mbrookhart @zhiics @trevor-m Please review.
Hmm, I think I've already added a fix for such cases here: tvm/python/tvm/tir/ir_builder.py, lines 205 to 206 in 82942fb.
Do you know why it is not working? cc @mbrookhart
Is this because the lines you pointed to are specific to the IR builder, while the failure I see is for an injective schedule? My failures were coming from an injective schedule.
Yeah, I think this change catches it at a lower level. We might not need the ir_builder change after this.
LGTM
LGTM
I see, thanks.
Topk was failing on CUDA when k is a var and its value is 0 at runtime. On closer inspection, I found that the number of thread blocks was 0 at runtime. This PR ensures that there is at least 1 thread block.
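To illustrate the idea (this is not the code from this PR, just a minimal sketch in plain Python): when the block count is derived from a dynamic extent with a ceiling division, an extent of 0 yields 0 blocks, which is the failing case described above. Clamping the result to a minimum of 1 avoids that. The function name `num_thread_blocks` and its parameters are hypothetical and exist only for this example.

```python
def num_thread_blocks(num_elements: int, threads_per_block: int) -> int:
    """Hypothetical helper: block count for a 1-D launch, never less than 1."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    # Ceiling division: enough blocks to cover every element.
    blocks = (num_elements + threads_per_block - 1) // threads_per_block
    # An empty tensor gives blocks == 0; clamp so one block is still launched.
    return max(1, blocks)


# Empty tensor (e.g. k == 0 at runtime): unclamped count would be 0.
print(num_thread_blocks(0, 1024))
# Non-empty extents are unaffected by the clamp.
print(num_thread_blocks(1025, 1024))
```

With the clamp, the single block launched for an empty tensor is expected to do no work, assuming the kernel body guards each access with a bounds check on the element index.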