Support for disabling PLT for better function call performance #54592

GabrielMajeri · 2018-09-26T16:55:35Z

This PR gives rustc the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

This is a little known feature which is supported by GCC and Clang as -fno-plt (some Linux distros enable it by default for all builds).

Implementation inspired by this patch which adds -fno-plt support to Clang.

Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a clap benchmark:

 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup 
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03 
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03 
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02 
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03 
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03 
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02 
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. This comment suggests that, in some cases, -fno-plt could improve PIC/PIE code performance by 10%.

Security benefits

Bonus: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for retpoline.

Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with CFLAGS=-fno-plt CXXFLAGS=-fno-plt removes them.

rust-highfive · 2018-09-26T16:55:45Z

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

nikomatsakis · 2018-09-26T17:21:40Z

cc @rust-lang/compiler: I'm not an expert on this, but based on the description, it seems like a "no brainer". Is there a catch?

nikomatsakis · 2018-09-26T17:22:22Z

@rfcbot fcp merge

I move that we merge this PR. As I wrote before, I'm not an expert on this stuff; the fact though that some distros enable the flag by default suggests we might as well do it. I'm curious whether anyone knows of any downsides or reasons not to do it.

rfcbot · 2018-09-26T17:22:22Z

nikomatsakis · 2018-09-26T17:24:45Z

cc @cuviper — seems like something you might know about :)

eddyb · 2018-09-26T17:31:36Z

cc @alexcrichton

cuviper · 2018-09-26T17:50:25Z

> AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already enables full relro for security, lazy binding was disabled anyway.

Note that PPC64 is only defaulting to partial relro due to an old ld.so bug in bind-now.
https://github.com/rust-lang/rust/pull/43170/files#diff-b2d51315427bd679ca33d47167e82171R20

There's also an option for -Z relro-level={full,partial,off}. I'll try to see if similar PPC64 issues arise with -fno-plt, but my initial feeling is that we should only enable this in conjunction with relro-level=full.

rust-highfive · 2018-09-26T18:04:14Z

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:55:36] ....................................................................................................
[00:55:39] ..............................................................i.....................................
[00:55:42] ....................................................................................................
[00:55:45] ....................................................................................................
[00:55:48] ...........iiiiiiiii................................................................................
[00:55:53] ....................................................................................................
[00:55:57] ...............................................................................................i....
[00:56:00] ....................................................................................................
[00:56:03] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:04:25] 
[01:04:25] running 107 tests
[01:04:28] i..ii...iii....i...i............iii...........i....Fi....ii...i.i.ii..............i...ii..ii.i....ii
[01:04:28] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:04:28] failures:
[01:04:28] 
[01:04:28] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:04:28] 
[01:04:28] 
[01:04:28] error: verification with 'FileCheck' failed
[01:04:28] status: exit code: 1
[01:04:28] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] stderr:
[01:04:28] stderr:
[01:04:28] ------------------------------------------
[01:04:28] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:04:28] // CHECK: Function Attrs: naked uwtable
[01:04:28]           ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:04:28] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:04:28] ^
[01:04:28] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:04:28] ; Function Attrs: naked nonlazybind uwtable
[01:04:28] 
[01:04:28] ------------------------------------------
[01:04:28] 
[01:04:28] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:04:28] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:04:28] 
[01:04:28] 
[01:04:28] 
[01:04:28] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:04:28] 
[01:04:28] 
[01:04:28] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Build completed unsuccessfully in 0:17:44
[01:04:28] Makefile:58: recipe for target 'check' failed
[01:04:28] make: *** [check] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0d3ba4a0
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:0204f06f:start=1537985051202083688,finish=1537985051358059111,duration=155975423
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:1243b65a
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:23d227fe
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

GabrielMajeri · 2018-09-26T18:23:30Z

I'm not sure if this is the right place in the codegen to enable this attribute, or if we'd better enable it somewhere else.

objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

nagisa · 2018-09-26T18:31:20Z

Please add a flag that controls this behaviour. For now it can be a debug -Z flag, similar to other such flags (e.g. -Zmutable-noalias).

> Also, I'm not sure how to fix the failing test. Is it possible for CHECK: Function Attrs to allow extra attributes in the output, besides the ones being tested?

CHECK lines only check for matching line prefix. That test in particular seems to be testing for naked only, in which case you can probably remove the other attribute from the CHECK line to have it pass. Alternatively, you might have some success with pattern matching syntax.
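The failure in the log above can be illustrated with a rough model. This is only a sketch: `check_matches` is a hypothetical helper that treats a CHECK pattern as a literal substring that must appear somewhere in one line of the input, which is a simplification of what FileCheck actually does (it ignores FileCheck's whitespace handling and its `{{...}}` regex blocks).

```rust
/// Simplified stand-in for a FileCheck `CHECK:` directive: the pattern must
/// occur as a literal substring of at least one input line.
/// Hypothetical helper for illustration, not part of FileCheck or compiletest.
fn check_matches(pattern: &str, input: &str) -> bool {
    input.lines().any(|line| line.contains(pattern))
}

fn main() {
    // The attribute line from the failing log, after `nonlazybind` was added.
    let ir = "; Function Attrs: naked nonlazybind uwtable";

    // The original directive no longer matches, because `nonlazybind`
    // now sits between `naked` and `uwtable`.
    assert!(!check_matches("Function Attrs: naked uwtable", ir));

    // A directive that only tests for `naked` still matches.
    assert!(check_matches("Function Attrs: naked", ir));

    println!("ok");
}
```

Under this model, dropping `uwtable` from the CHECK line is enough for the test to pass again, which matches the suggestion above.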

nagisa · 2018-09-26T18:35:28Z

> objdump says that the number of function calls using the PLT goes down, but a lot of function calls are still using the PLT. So I'm guessing I have to add this in other places too, but I'm not very familiar with the code.

A large number of @PLT symbols likely come from outside the rust ecosystem (e.g. glibc, llvm, etc.). Those might need to be taken care of independently (by changing build system configuration, perhaps?). You might want to submit a similar patch to the cc crate.

(Addressed not to author, but somebody who knows how to do perf runs) I also think a perf run would be great, but not sure how to start it.

varkor · 2018-09-26T18:50:17Z

@bors try

@michaelwoerister

[WIP] Support for disabling PLT for better function call performance
## To do

- [ ] Do a perf run to see the effect this has on the compiler (cc @michaelwoerister), and possibly run benchmarks on some more crates
- [ ] Add a code gen test
- [ ] Should this be always enabled or should it be behind a command line option? If so, what should it be called? `-Z no-plt`? `-Z plt=no`?

bors · 2018-09-26T18:50:29Z

⌛ Trying commit ddf98c1 with merge 5747631...

rust-highfive · 2018-09-26T19:34:11Z

The job x86_64-gnu-llvm-5.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:58:24] ....................................................................................................
[00:58:27] ..............................................................i.....................................
[00:58:30] ....................................................................................................
[00:58:33] ....................................................................................................
[00:58:36] ............iiiiiiiii...............................................................................
[00:58:42] ....................................................................................................
[00:58:46] ...............................................................................................i....
[00:58:49] ....................................................................................................
[00:58:52] .......................................................i.i..ii......................................
---
travis_time:start:test_codegen
Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[01:07:32] 
[01:07:32] running 107 tests
[01:07:35] i..ii...iii....i...i............iii...........i.....iF...ii...i.i.ii..............i...ii..ii.i....ii
[01:07:35] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:496:22
[01:07:35] failures:
[01:07:35] 
[01:07:35] ---- [codegen] codegen/naked-functions.rs stdout ----
[01:07:35] 
[01:07:35] 
[01:07:35] error: verification with 'FileCheck' failed
[01:07:35] status: exit code: 1
[01:07:35] command: "/usr/lib/llvm-5.0/bin/FileCheck" "--input-file" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll" "/checkout/src/test/codegen/naked-functions.rs"
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] stderr:
[01:07:35] stderr:
[01:07:35] ------------------------------------------
[01:07:35] /checkout/src/test/codegen/naked-functions.rs:18:11: error: expected string not found in input
[01:07:35] // CHECK: Function Attrs: naked uwtable
[01:07:35]           ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:1:1: note: scanning from here
[01:07:35] ; ModuleID = 'naked_functions.3a1fbbbh-cgu.0'
[01:07:35] ^
[01:07:35] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/naked-functions/naked-functions.ll:6:3: note: possible intended match here
[01:07:35] ; Function Attrs: naked nonlazybind uwtable
[01:07:35] 
[01:07:35] ------------------------------------------
[01:07:35] 
[01:07:35] thread '[codegen] codegen/naked-functions.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3238:9
---
[01:07:35] test result: FAILED. 77 passed; 1 failed; 29 ignored; 0 measured; 0 filtered out
[01:07:35] 
[01:07:35] 
[01:07:35] 
[01:07:35] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/codegen" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "codegen" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-5.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "5.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:07:35] 
[01:07:35] 
[01:07:35] failed to run: /checkout/obj/build/756 ./src/tools/lldb/www
37080 ./obj/build/x86_64-unknown-linux-gnu/stage0-std/release
---
travis_time:end:0c6f72ba:start=1537990447136677365,finish=1537990447141035866,duration=4358501
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:017849be
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then print

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

bors · 2018-09-26T21:19:21Z

☀️ Test successful - status-travis
State: approved= try=True

varkor · 2018-09-26T21:38:02Z

@rust-timer build 5747631

rust-timer · 2018-09-26T21:38:03Z

Please provide the full 40 character commit hash.

rust-timer · 2018-09-26T22:24:16Z

Success: Queued 5747631 with parent 6846f22, comparison URL.

GabrielMajeri · 2018-09-27T08:31:30Z

Perf results are in, nice improvements on wall time. From what I've seen, the patch currently only removes about 20% of the total PLT calls, there's probably still some more performance to be gained.

@nagisa

> Please add a flag that controls this behaviour.

Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

Also, what should we do if the user specified -Z no-plt, but the flag is then ignored, because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

> symbols likely come from outside the rust ecosystem

Thanks for the tip, but I'm still unable to get rid of the PLT. I've added CFLAGS=-fno-plt to my system, and rebuilt the compiler from source, but rustc still generates lots of calls which use the PLT.

If I build C binaries on my system, the final binary doesn't even have a .plt section; it is completely removed.

EDIT: it seems we need to set some module-level metadata to ensure this also works for intrinsics.

nagisa · 2018-09-27T09:26:07Z

it is fine to have plt disabled by default, i think. it would be a good idea to design the flag so that it would make sense regardless of the default. so something like `-Zplt=on`, `-Zplt=off` instead of -Zno-plt. i recommend looking at -Zmutable-noalias for an implementation example.

cuviper · 2018-09-27T18:18:20Z

> Alright, I've implemented a -Z no-plt flag, but should this option be enabled or disabled by default?

I think the way you've documented it is fine, "(default: PLT is disabled if full relro is enabled)".

> Also, what should we do if the user specified -Z no-plt, but the flag is then ignored, because full relro isn't supported (for example, as cuviper mentioned, linker issues on PowerPC64)?

Don't ignore it. These are advanced options -- if the user asks for plt=off without full relro, let them deal with the implications. So the check is something like plt.unwrap_or(relro != Full).
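A minimal sketch of that check, assuming the flag is parsed into an `Option<bool>` (`None` when the user passed nothing). The `RelroLevel` enum and the `needs_plt` function are illustrative names chosen here, not necessarily what the PR uses.

```rust
/// Relro levels, mirroring `-Z relro-level={full,partial,off}`.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Returns `true` if calls should go through the PLT.
/// `plt` models an explicit user choice; `None` means "use the default".
fn needs_plt(plt: Option<bool>, relro: RelroLevel) -> bool {
    // An explicit choice always wins. Otherwise the PLT is kept unless
    // full relro is enabled (full relro already rules out lazy binding).
    plt.unwrap_or(relro != RelroLevel::Full)
}

fn main() {
    // Defaults: PLT skipped only under full relro.
    assert!(!needs_plt(None, RelroLevel::Full));
    assert!(needs_plt(None, RelroLevel::Partial));
    assert!(needs_plt(None, RelroLevel::Off));

    // Explicit requests are honored even without full relro.
    assert!(!needs_plt(Some(false), RelroLevel::Partial));
    assert!(needs_plt(Some(true), RelroLevel::Full));

    println!("ok");
}
```

The point of `unwrap_or` here is that the relro level only supplies a default; it never vetoes what the user asked for.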

GabrielMajeri · 2018-10-11T03:56:02Z

The way I see it, -Z plt=off is an optimization, we don't guarantee it does anything (it's a best effort kind of thing). For now, I changed the code to unconditionally disable the optimization on gnux32 and always enable the PLT on that target, at least until LLVM gets fixed.

nagisa · 2018-10-11T04:47:39Z

This should still be part of the target specification, because there are targets defined outside of the compiler and we don't test them all.

GabrielMajeri · 2018-10-11T05:38:32Z

@nagisa As far as I understand, even if somebody defines some external target with this ABI, there shouldn't be an issue.

The code checks for the (custom) target's llvm-target attribute to see if it contains gnux32. This is as far as we know the only ABI where LLVM currently has an issue (due to a bug).

For example, Clang accepts this option for all targets and ABIs (even Windows). On targets where it doesn't do anything, it emits the attributes, and LLVM just ignores them (except for this buggy ABI which crashes).

nagisa · 2018-10-11T09:30:35Z

I'm arguing that the author of external target specifications should have a full control over such matters. While gnux32 is the only target we know that crashes, our triplet coverage is incomplete and we can't say for sure that any future triplets llvm may support will be problem free either. In the past platform specific settings and workarounds were always stored within the target files and I see no reason whatsoever for us to change that now. And it is strictly more flexible than checking whether llvm triple string ends in a specific substring. I also don't see much of an issue with allowing people to force arbitrary configurations (i.e. allowing them to specify -Zplt=off) even if we know they might not be working, ignored, irrelevant etc. At worst they'll encounter an ICE. In fact, if they do explicitly request for some behaviour, I believe that should override any workaround or default (which is not what happens with the current iteration of the PR)

pnkfelix · 2018-10-11T14:59:45Z

(@nagisa said at T-compiler meeting that we can un-nominate this)

Disable the PLT where possible to improve performance for indirect calls into shared libraries. This optimization is enabled by default where possible.

- Add the `NonLazyBind` attribute to `rustllvm`: This attribute informs LLVM to skip PLT calls in codegen.
- Disable PLT unconditionally: Apply the `NonLazyBind` attribute on every function.
- Only enable no-plt when full relro is enabled: Ensures we only enable it when we have linker support.
- Add `-Z plt` as a compiler option

GabrielMajeri · 2018-10-11T18:13:21Z

@nagisa ok, I've added a needs_plt target option which can be customized for each target. It is used to help determine a default for the PLT option (and -Z plt always overrides the setting).
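One way the pieces described in this thread could fit together is sketched below. This is an assumption-laden illustration, not the merged code: the struct names, field layout, and the exact way `needs_plt` is combined with the relro level are guesses based on the comments above.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum RelroLevel {
    Full,
    Partial,
    Off,
}

/// Per-target defaults (hypothetical shape of a target specification).
struct TargetOptions {
    /// Target-level opt-out, e.g. for an ABI where skipping the PLT crashes LLVM.
    needs_plt: bool,
    relro_level: RelroLevel,
}

/// User-supplied `-Z` flags (hypothetical shape); `None` means "not specified".
struct DebuggingOptions {
    plt: Option<bool>,
    relro_level: Option<RelroLevel>,
}

/// Returns `true` if calls should go through the PLT.
fn needs_plt(target: &TargetOptions, opts: &DebuggingOptions) -> bool {
    // The user's relro level, if given, replaces the target's default.
    let relro = opts.relro_level.unwrap_or(target.relro_level);
    let full_relro = relro == RelroLevel::Full;
    // `-Z plt` always overrides; otherwise keep the PLT if the target
    // requires it or if full relro is not in effect.
    opts.plt.unwrap_or(target.needs_plt || !full_relro)
}

fn main() {
    let ordinary = TargetOptions { needs_plt: false, relro_level: RelroLevel::Full };
    let fragile = TargetOptions { needs_plt: true, relro_level: RelroLevel::Full };
    let no_flags = DebuggingOptions { plt: None, relro_level: None };

    assert!(!needs_plt(&ordinary, &no_flags)); // PLT skipped by default
    assert!(needs_plt(&fragile, &no_flags)); // target keeps the PLT

    // An explicit `plt=off` overrides the target workaround.
    let force_off = DebuggingOptions { plt: Some(false), relro_level: None };
    assert!(!needs_plt(&fragile, &force_off));

    // Lowering relro restores the PLT by default.
    for level in [RelroLevel::Partial, RelroLevel::Off] {
        let lowered = DebuggingOptions { plt: None, relro_level: Some(level) };
        assert!(needs_plt(&ordinary, &lowered));
    }
    println!("ok");
}
```

Keeping the workaround in the target options rather than matching on the triple string lets authors of out-of-tree targets opt in or out themselves, which is the flexibility asked for in the review.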

nagisa · 2018-10-11T18:24:07Z

Perfect. Thanks!

@bors r+

bors · 2018-10-11T18:24:08Z

📌 Commit 6009da0 has been approved by nagisa

bors · 2018-10-11T19:38:22Z

⌛ Testing commit 6009da0 with merge 77af314...

Support for disabling PLT for better function call performance

This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.

AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](#43170), lazy binding was disabled anyway.

This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).

Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.

## Performance

I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):

```
 name              control ns/iter  no-plt ns/iter  diff ns/iter  diff %  speedup
 build_app_long    11,097           10,733                  -364  -3.28%   x 1.03
 build_app_short   11,089           10,742                  -347  -3.13%   x 1.03
 build_help_long   186,835          182,713               -4,122  -2.21%   x 1.02
 build_help_short  80,949           78,455                -2,494  -3.08%   x 1.03
 parse_clean       12,385           12,044                  -341  -2.75%   x 1.03
 parse_complex     19,438           19,017                  -421  -2.17%   x 1.02
 parse_lots        431,493          421,421              -10,072  -2.33%   x 1.02
```

A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.

## Security benefits

**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).

## Remaining PLT calls

The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.

bors · 2018-10-11T22:21:51Z

☀️ Test successful - status-appveyor, status-travis
Approved by: nagisa
Pushing 77af314 to master...

alexcrichton · 2018-10-23T14:39:58Z

FWIW this looks like it may cause bugs in LLVM on i686-unknown-linux-gnu. I noticed that stdsimd's CI was failing for i686-unknown-linux-gnu because rustc was segfaulting. Some local investigation showed a segfault in LLVM. We compile that target with -C relocation-model=static on CI, and I believe the combination of 32-bit Linux with -C relocation-model=static was causing the issue. I haven't had a chance to dig deeper. I've worked around it with -Z plt=yes

eddyb · 2019-06-05T10:51:37Z

FWIW, this broke pretty badly on certain (older) distro toolchains, but because dylib is largely unused, it took someone (ab)using proc_macro::bridge to run proc macros outside of rustc to trigger it: #61539.

MaskRay · 2023-01-02T21:07:07Z

I think -Z plt=no is not a good default. Sent #106380 to disable it. (Thanks to @GabrielMajeri for mentioning this review as I cannot find it with the commits).

If you see positive benchmark results, it is likely because dynamically linked libc calls dominate. If one statically links libc, -Z plt=no is going to be a pessimization. For many benchmarks where cross-translation-unit functions calls resolve to the same component, -Z plt=no is going to be a pessimization.

Plus, x86-32 requires very new lld (as a maintainer, I just added support for ___tls_get_addr; older lld will create a silently corrupted executable).
For GNU ld, a relatively new one is needed: 2016-06 https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=e2cbcd9156d1606a9f2153aecd93a89fe6e29180 (and a counterpart for x86-32)

rust-highfive assigned nikomatsakis Sep 26, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 26, 2018

GabrielMajeri changed the title ~~Support for disabling PLT for better function call performance~~ [WIP] Support for disabling PLT for better function call performance Sep 26, 2018

nikomatsakis added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Sep 26, 2018

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 26, 2018

nikomatsakis added the I-nominated label Sep 26, 2018

GabrielMajeri mentioned this pull request Sep 27, 2018

Add support for -fno-plt rust-lang/cc-rs#351

Merged

pnkfelix removed the I-nominated label Oct 11, 2018

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 11, 2018

bors merged commit 6009da0 into rust-lang:master Oct 11, 2018

bors mentioned this pull request Oct 11, 2018

Cleanup rustc/session #54963

Merged

GabrielMajeri deleted the no-plt branch October 12, 2018 03:34

rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 14, 2018

nikic mentioned this pull request Dec 15, 2018

crossbeam_epoch::pin perf regression #55317

Closed

This was referenced Jun 5, 2019

Panic inside panic when procedural macro is called with proc_macro::bridge::client #60593

Closed

1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28 or older. #61539

Closed

eddyb mentioned this pull request Dec 9, 2019

For dylib crates, warn about GNU ld <=2.28 #66839

Closed

GabrielMajeri mentioned this pull request Jan 2, 2023

Default to -Z plt=yes #106380

Closed

MaskRay mentioned this pull request Jan 2, 2023

fix _mm_castsi128_pd and _mm_castpd_si128 impls rust-lang/stdarch#581

Merged

pnkfelix mentioned this pull request Jan 26, 2023

Switch PLT default to "yes" for all targets except x86_64. rust-lang/compiler-team#581

Closed

3 tasks
