-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVVMReflect] Force dead branch elimination in NVVMReflect #81189
Conversation
Summary: The `__nvvm_reflect` function is used to guard invalid code that varies between architectures. One problem with this feature is that if it is used without optimizations, it will leave invalid code in the module that will then make it to the backend. The `__nvvm_reflect` pass is already mandatory, so it should do some trivial branch removal to ensure that constants are handled correctly. This dead branch elimination only works in the trivial case of a compare on a branch and does not touch any conditionals that were not realted to the `__nvvm_reflect` call in order to preserve `O0` semantics as much as possible. This should allow the following to work on NVPTX targets ```c int foo() { if (__nvvm_reflect__("__CUDA_ARCH") >= 700) asm("valid;\n"); } ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems very reasonable to me, thanks.
for (User *U : Call->users()) | ||
if (ICmpInst *I = dyn_cast<ICmpInst>(U)) | ||
ToSimplify.push_back(I); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is overly conservative, IMO. E.g. it may not handle something like switch(__nvvm_reflect("__CUDA_ARCH"))
, or if ((__nvvm_reflect("__CUDA_ARCH") / 10) == 70)
I think, ideally, we may want to iterate upwards the use tree, as long as the current subtree evaluates to a constant, until we reach a switch/branch/select where we can now pick the correct branch.
On one hand it may be an overkill, and we could just document that this improvement works for if
only. Even that would implicitly rely on the pass happening very early in the pipeline, before LLVM gets to possibly convert that if
into select
. if-only solution may be OK for now, but if would be great to make sure its behavior is consistent for other forms of conditionals.
I'm OK with the patch in this form, but we should add a TODO outlining the still missing pieces.
Maybe add a few test cases showing where it does not work at the moment.
Seems this broke the CUDA buildbot. I'll revert for now and try to reland it later. |
…81189) Summary: The `__nvvm_reflect` function is used to guard invalid code that varies between architectures. One problem with this feature is that if it is used without optimizations, it will leave invalid code in the module that will then make it to the backend. The `__nvvm_reflect` pass is already mandatory, so it should do some trivial branch removal to ensure that constants are handled correctly. This dead branch elimination only works in the trivial case of a compare on a branch and does not touch any conditionals that were not realted to the `__nvvm_reflect` call in order to preserve `O0` semantics as much as possible. This should allow the following to work on NVPTX targets ```c int foo() { if (__nvvm_reflect("__CUDA_ARCH") >= 700) asm("valid;\n"); } ``` Relanding after fixing a bug.
// false value. Simply replace it with an unconditional branch to the | ||
// appropriate basic block and delete the rest if it is trivially dead. | ||
DenseSet<Instruction *> Removed; | ||
for (BranchInst *Branch : Simplified) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConstantFoldTerminator? I also thought there was an API to just call simplifyCFG on a single block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it's also in Local.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed that, thanks. I'll probably make an updated version that also handles the Switch case since this function makes that trivial.
Side note, how do we handle the ROCm-DL constants? I remember we have some mandatory constant prop + DCE in a similar case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The incompatible subtarget features on each function block inlining. The incompatible functions pass deletes functions. It's a high maintenance system to keep all of those pieces consistent, but in principle incompatible code never exists in the same function
Summary:
The
__nvvm_reflect
function is used to guard invalid code that variesbetween architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The
__nvvm_reflect
pass isalready mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not realted to the
__nvvm_reflect
call inorder to preserve
O0
semantics as much as possible. This should allowthe following to work on NVPTX targets