Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVVMReflect] Force dead branch elimination in NVVMReflect #81189

Merged
merged 3 commits into from
Feb 8, 2024

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Feb 8, 2024

Summary:
The __nvvm_reflect function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The __nvvm_reflect pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not realted to the __nvvm_reflect call in
order to preserve O0 semantics as much as possible. This should allow
the following to work on NVPTX targets

int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}

Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not realted to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets

```c
int foo() {
  if (__nvvm_reflect__("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```
@jhuber6 jhuber6 changed the title [NNVMReflect] Force dead branch elimination in NNVMReflect [NVVMReflect] Force dead branch elimination in NNVMReflect Feb 8, 2024
@jhuber6 jhuber6 changed the title [NVVMReflect] Force dead branch elimination in NNVMReflect [NVVMReflect] Force dead branch elimination in NVVMReflect Feb 8, 2024
Copy link
Member

@jlebar jlebar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems very reasonable to me, thanks.

llvm/lib/Target/NVPTX/NVVMReflect.cpp Outdated Show resolved Hide resolved
Comment on lines +180 to +183
for (User *U : Call->users())
if (ICmpInst *I = dyn_cast<ICmpInst>(U))
ToSimplify.push_back(I);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overly conservative, IMO. E.g. it may not handle something like switch(__nvvm_reflect("__CUDA_ARCH")), or if ((__nvvm_reflect("__CUDA_ARCH") / 10) == 70)

I think, ideally, we may want to iterate upwards the use tree, as long as the current subtree evaluates to a constant, until we reach a switch/branch/select where we can now pick the correct branch.

On one hand it may be an overkill, and we could just document that this improvement works for if only. Even that would implicitly rely on the pass happening very early in the pipeline, before LLVM gets to possibly convert that if into select. if-only solution may be OK for now, but if would be great to make sure its behavior is consistent for other forms of conditionals.

I'm OK with the patch in this form, but we should add a TODO outlining the still missing pieces.

Maybe add a few test cases showing where it does not work at the moment.

@jhuber6 jhuber6 merged commit 9211e67 into llvm:main Feb 8, 2024
4 of 5 checks passed
@jhuber6
Copy link
Contributor Author

jhuber6 commented Feb 8, 2024

Seems this broke the CUDA buildbot. I'll revert for now and try to reland it later.

jhuber6 added a commit that referenced this pull request Feb 8, 2024
…81189)"

This reverts commit 9211e67.

Summary:
This seemed to crash one one of the CUDA math tests. Revert until it can
be fixed.
jhuber6 added a commit that referenced this pull request Feb 9, 2024
…81189)

Summary:
The `__nvvm_reflect` function is used to guard invalid code that varies
between architectures. One problem with this feature is that if it is
used without optimizations, it will leave invalid code in the module
that will then make it to the backend. The `__nvvm_reflect` pass is
already mandatory, so it should do some trivial branch removal to ensure
that constants are handled correctly. This dead branch elimination only
works in the trivial case of a compare on a branch and does not touch
any conditionals that were not realted to the `__nvvm_reflect` call in
order to preserve `O0` semantics as much as possible. This should allow
the following to work on NVPTX targets

```c
int foo() {
  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
    asm("valid;\n");
}
```

Relanding after fixing a bug.
// false value. Simply replace it with an unconditional branch to the
// appropriate basic block and delete the rest if it is trivially dead.
DenseSet<Instruction *> Removed;
for (BranchInst *Branch : Simplified) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConstantFoldTerminator? I also thought there was an API to just call simplifyCFG on a single block

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's also in Local.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that, thanks. I'll probably make an updated version that also handles the Switch case since this function makes that trivial.

Side note, how do we handle the ROCm-DL constants? I remember we have some mandatory constant prop + DCE in a similar case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incompatible subtarget features on each function block inlining. The incompatible functions pass deletes functions. It's a high maintenance system to keep all of those pieces consistent, but in principle incompatible code never exists in the same function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants