Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a target option "merge-functions", and a corresponding -Z flag (works around #57356) #57268

Merged
merged 1 commit into from
Jan 19, 2019

Conversation

peterhj
Copy link
Contributor

@peterhj peterhj commented Jan 2, 2019

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around #57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and extern "ptx-kernel" functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler ptxas). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

Examples/more details

Consider an example Rust lib using extern "ptx-kernel" functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions foo and bar have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with ptxas (I'm on the CUDA 9.2 toolchain), we get an assembler error:

ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the bar function to call foo. However, directly calling an extern "ptx-kernel" function from another extern "ptx-kernel" is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1]

self.merge_functions = sess.opts.optimize == config::OptLevel::Default ||

[2]
add("-mergefunc-use-aliases");

[3] #56358
[4] #49479

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @zackmdavis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 2, 2019
@peterhj
Copy link
Contributor Author

peterhj commented Jan 2, 2019

r? @alexcrichton

@peterhj peterhj force-pushed the peterhj-optmergefunc branch from 7e053a8 to 4aba371 Compare January 2, 2019 15:45
Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

/// Whether the MergeFunctions LLVM pass should run for this target.
/// The MergeFunctions pass is generally useful, but some targets may need
/// to opt out. Defaults to `true`.
pub merge_functions: bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target option could probably be forward looking and allow specifying any of merge-functions = disabled, merge-functions = trampolines and merge-functions = aliases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit adds "merge-functions" options for "aliases", "trampolines", and "disabled", as well as a matching -Z flag.

@nagisa
Copy link
Member

nagisa commented Jan 2, 2019

It would also be nice to gain a -Z flag that controls merge-functions. I wanted it at least once for tests & it seems that it is applicable more widely than I initially expected.

MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and extern "ptx-kernel" functions are currently not compatible with each other.

Does the NVPTX target, perchance, support the non-extern "ptx-kernel" functions at all? In that case this solution is not great as it would disable the optimisation (the trampoline version could still be used, right?) target-wide even for internal functions that use the compatible ABI.

@alexcrichton
Copy link
Member

@bors: r+

This seems like a great start! We can always add more bells and whistles in the future too. I'd arguably say that this is an LLVM bug, but I'm fine fixing it on our end

@bors
Copy link
Contributor

bors commented Jan 2, 2019

📌 Commit 4aba371a6c16784c5938bd53badaa38a45629257 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 2, 2019
@nikic
Copy link
Contributor

nikic commented Jan 2, 2019

Yes, this is definitely an LLVM bug and it would be good to fix it there in the long term.

Would it be possible to do something similar to how weak functions are handled? I.e., when merging functions with the ptx-kernel CC, instead of simply calling one function from the other, we would instead create a new function (with default CC) and make both functions call the new function. Would NVPTX support this kind of scheme?

@nagisa
Copy link
Member

nagisa commented Jan 2, 2019

Yes, this is definitely an LLVM bug

If we are landing this, then at least an LLVM issue should be filled & cross referenced to a corresponding Rust issue.

@peterhj
Copy link
Contributor Author

peterhj commented Jan 3, 2019

It would also be nice to gain a -Z flag that controls merge-functions. I wanted it at least once for tests & it seems that it is applicable more widely than I initially expected.

The latest commit will add a -Z merge-functions as well, and both the -Z flag and the target option have the same options ("disabled", "trampolines", "aliases"). The -Z flag has precedence over the target option.

Does the NVPTX target, perchance, support the non-extern "ptx-kernel" functions at all? In that case this solution is not great as it would disable the optimisation (the trampoline version could still be used, right?) target-wide even for internal functions that use the compatible ABI.
...
Would it be possible to do something similar to how weak functions are handled? I.e., when merging functions with the ptx-kernel CC, instead of simply calling one function from the other, we would instead create a new function (with default CC) and make both functions call the new function. Would NVPTX support this kind of scheme?

Non-extern "ptx-kernel" functions are "device" functions which can call each other and be called by extern "ptx-kernel" functions. My observation though is that the NVPTX backend prefers to inline those functions rather than emit calls, so there should not be much fallout on device functions when disabling MergeFunctions. It definitely warrants some investigation along with the general LLVM issue for MergeFunctions+NVPTX and the potential fix.

@peterhj peterhj changed the title Add a target option "merge-functions" Add a target option "merge-functions", and a corresponding -Z flag Jan 3, 2019
"aliases" => {
add("-mergefunc-use-aliases");
}
k => panic!("unknown merge-functions kind: {}", k),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use bug! instead of panic!.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a MergeFunctions enum now, so no more wildcard arm

@@ -1380,6 +1380,9 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
"whether to use the PLT when calling into shared libraries;
only has effect for PIC code on systems with ELF binaries
(default: PLT is disabled if full relro is enabled)"),
merge_functions: Option<String> = (None, parse_opt_string, [TRACKED],
"control the operation of the MergeFunctions LLVM pass, taking
the same values as the target option of the same name"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause an ICE later when invalid values are provided as an argument, because argument parsing does not verify the valid values. Consider validating this in argument parsing.

fn parse_panic_strategy(slot: &mut Option<PanicStrategy>, v: Option<&str>) -> bool {
match v {
Some("unwind") => *slot = Some(PanicStrategy::Unwind),
Some("abort") => *slot = Some(PanicStrategy::Abort),
_ => return false
}
true
}

is a good example of how arguments should be handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with a MergeFunctions enum

@kennytm kennytm added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 3, 2019
@nagisa
Copy link
Member

nagisa commented Jan 3, 2019

@bors r+ Thanks!

@bors
Copy link
Contributor

bors commented Jan 3, 2019

📌 Commit 3cd40139a11396adc697802c4b6454c248448777 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 3, 2019
@nikic
Copy link
Contributor

nikic commented Jan 5, 2019

I've reported https://bugs.llvm.org/show_bug.cgi?id=40232 upstream to track this issue.

"trampolines", or "aliases (the default)) to allow targets to opt out of
the MergeFunctions LLVM pass. Also add a corresponding -Z option with
the same name and values.

This works around: rust-lang#57356

Motivation:

Basically, the problem is that the MergeFunctions pass, which rustc
currently enables by default at -O2 and -O3, and `extern "ptx-kernel"`
functions (specific to the NVPTX target) are currently not compatible
with each other. If the MergeFunctions pass is allowed to run, rustc can
generate invalid PTX assembly (i.e. a PTX file that is not accepted by
the native PTX assembler ptxas). Therefore we would like a way to opt
out of the MergeFunctions pass, which is what our target option does.

Related work:

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3,
and also to enable the use of function aliases within MergeFunctions.
MergeFunctions both with and without function aliases is incompatible with
the NVPTX target.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to
the MergeFunctions pass, but it is not enabled by default.
@peterhj peterhj force-pushed the peterhj-optmergefunc branch from 3cd4013 to b91d211 Compare January 5, 2019 18:01
@peterhj peterhj changed the title Add a target option "merge-functions", and a corresponding -Z flag Add a target option "merge-functions", and a corresponding -Z flag (works around #57356) Jan 5, 2019
@Centril
Copy link
Contributor

Centril commented Jan 15, 2019

@bors r=nagisa

@bors
Copy link
Contributor

bors commented Jan 15, 2019

📌 Commit b91d211 has been approved by nagisa

Centril added a commit to Centril/rust that referenced this pull request Jan 15, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
bors added a commit that referenced this pull request Jan 16, 2019
Rollup of 6 pull requests

Successful merges:

 - #56884 (rustdoc: overhaul code block lexing errors)
 - #57065 (Optimize try_mark_green and eliminate the lock on dep node colors)
 - #57107 (Add a regression test for mutating a non-mut #[thread_local])
 - #57268 (Add a target option "merge-functions", and a corresponding -Z flag (works around #57356))
 - #57551 (resolve: Add a test for issue #57539)
 - #57598 (Add missing unpretty option help message)

Failed merges:

r? @ghost
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 17, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
Centril added a commit to Centril/rust that referenced this pull request Jan 19, 2019
Add a target option "merge-functions", and a corresponding -Z flag (works around rust-lang#57356)

This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.

This works around rust-lang#57356.

cc @eddyb @japaric @oli-obk @nox @nagisa

Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!

### Motivation

Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.

### Related work

The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.

clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.

### Examples/more details

Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:

    LLVM ERROR: Module has aliases, which NVPTX does not support.

This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.

Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:

    ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error   : Illegal call target, device function expected
    ptxas fatal   : Ptx assembly aborted due to errors

What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.

If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx

[1] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_ssa/back/write.rs#L155
[2] https://github.com/rust-lang/rust/blob/a36b960df626cbb8bea74f01243318b73f0bd201/src/librustc_codegen_llvm/llvm_util.rs#L64
[3] rust-lang#56358
[4] rust-lang#49479
bors added a commit that referenced this pull request Jan 19, 2019
Rollup of 10 pull requests

Successful merges:

 - #57268 (Add a target option "merge-functions", and a corresponding -Z flag (works around #57356))
 - #57476 (Move glob map use to query and get rid of CrateAnalysis)
 - #57501 (High priority resolutions for associated variants)
 - #57573 (Querify `entry_fn`)
 - #57610 (Fix nested `?` matchers)
 - #57634 (Remove an unused function argument)
 - #57653 (Make the contribution doc reference the guide more)
 - #57666 (Generalize `huge-enum.rs` test and expected stderr for more cross platform cases)
 - #57698 (Fix typo bug in DepGraph::try_mark_green().)
 - #57746 (Update README.md)

Failed merges:

r? @ghost
@bors
Copy link
Contributor

bors commented Jan 19, 2019

⌛ Testing commit b91d211 with merge c87144f...

@bors bors merged commit b91d211 into rust-lang:master Jan 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants