Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead atomic loads aren't optimized away #50035

Closed
alexcrichton opened this issue Apr 17, 2018 · 12 comments
Closed

Dead atomic loads aren't optimized away #50035

alexcrichton opened this issue Apr 17, 2018 · 12 comments
Labels
A-codegen Area: Code generation A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@alexcrichton
Copy link
Member

https://play.rust-lang.org/?gist=7555908f8176d50b10cd06a4a5dba401&version=nightly&mode=release

Given code like:

#![crate_type = "lib"]

use std::sync::atomic::*;

static A: AtomicUsize = ATOMIC_USIZE_INIT;

pub unsafe fn foo() {
    A.load(Ordering::SeqCst);
}

the generated LLVM IR is:

; ModuleID = 'playground0-99a4292f5ecb26a54a3204a5a67308ec.rs'
source_filename = "playground0-99a4292f5ecb26a54a3204a5a67308ec.rs"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@_ZN10playground1A17h5c8cc46ff3ae8084E = internal unnamed_addr constant <{ [8 x i8] }> zeroinitializer, align 8

; playground::foo
; Function Attrs: norecurse nounwind uwtable
define void @_ZN10playground3foo17h1cb1a21ad3c6e213E() unnamed_addr #0 {
start:
  %0 = load atomic i64, i64* bitcast (<{ [8 x i8] }>* @_ZN10playground1A17h5c8cc46ff3ae8084E to i64*) seq_cst, align 8
  ret void
}

attributes #0 = { norecurse nounwind uwtable "probe-stack"="__rust_probestack" }

and unfortunately our atomic load wasn't optimized away!

I believe this is a regression from stable to beta due to the way that miri defines statics (looks like LLVM doesn't recognize the bitcast) cc @oli-obk

@alexcrichton alexcrichton added A-codegen Area: Code generation regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 17, 2018
@kripken
Copy link

kripken commented Apr 17, 2018

Discussion on irc suggests the same underlying issue might cause LLVM's SimplifyLibCalls to not optimize stuff like printf of a string with no formatting into puts. E.g. on nightly this code doesn't get the printf turned into a puts:

#![crate_type = "lib"]

extern {
    fn printf(s: *const i8, ...);
}

pub unsafe fn foo() {
    printf("foo\n\0".as_ptr() as *const _);
}

(but on stable rust it works ok).

@oli-obk oli-obk added the A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) label Apr 17, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Apr 17, 2018

cc

@alexreg @eddyb

there's a comment about constants in

// FIXME: consider not copying constants through stack. (fixable by translating
which I didn't see before, might be related (since below that line there are casts which don't have that comment, but maybe should do the same optimization)

Maybe we need to do const bitcast instead of bitcast if it's a constant? Not really sure why llvm isn't able to grok a simple bitcast. The docs say bitcasts are fine: https://llvm.org/docs/LangRef.html#constant-expressions

@alexreg
Copy link
Contributor

alexreg commented Apr 18, 2018

@oli-obk Hmm, I think @eddyb will answer this much better than me, but I think this may be a quirk as to how LLVM optimisation works. Perhaps it's even a bug? I really know almost nothing about LLVM optimisation though, even though I agree it looks like it ought to work.

@eddyb
Copy link
Member

eddyb commented Apr 18, 2018

Looks like a duplicate of #48627 and both appear to be outright LLVM (optimization) bugs.
This specific instance happens to be caused by miri, but there are other ways this can be triggered.
cc @rkruppe

@hanna-kruppe
Copy link
Contributor

There is a difference to #48627 -- in this IR, the load is dead, you wouldn't even need to constant fold it to be able to remove it. It appears this issue is specific to atomics and is independent of how the global is represented. Making the load non-atomic allows DCE to remove it in all circumstances.

@hanna-kruppe
Copy link
Contributor

Come to think of it, it's quite difficult to justify removing an atomic load entirely (as opposed to e.g., merging two atomic loads from the same location). The mere act of doing an atomic load can make later (in program order) non-atomic loads happen-before stores in other threads that would have been data races without the atomic load.

@eddyb
Copy link
Member

eddyb commented Apr 19, 2018

@rkruppe Wait, why would LLVM ever have been able to remove the load then?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 19, 2018

Huh, didn't realize it was optimized out previously. Going back to older opt versions on godbolt, -dce never worked, but -O1 works when run again on the current IR (the one posted by @alexcrichton above). Which makes me think of a pass ordering issue?

As for why that is valid at all, I realize now it's because the static is internal, so LLVM can be sure there are no other accesses (indeed in the IR posted by @alexcrichton it's been promoted from global to static edit: constant by some LLVM pass). So I believe we can fix this specific regression, we can't expect such an optimization to be legal in less trivial programs where

  • the atomic is not internal
  • or its address escapes
  • or there are writes to the atomic (at least if they could be executed from a different thread)

@pnkfelix pnkfelix added WG-llvm Working group: LLVM backend code generation P-medium Medium priority I-slow Issue: Problems and improvements with respect to performance of generated code. labels Apr 19, 2018
@pnkfelix
Copy link
Member

visiting for triage. Compiler team agreed that this appears to be "at most" a performance issue, if it is an issue at all (in the sense of whether this optimization can be relied upon). Thus, P-medium.

@pietroalbini pietroalbini added this to the 1.26 milestone Apr 26, 2018
@alexcrichton
Copy link
Member Author

I'm going to categorize this into "Won't fix" under the 1.26 regressions as I think it's pretty reasonable to not worry about this and it seems unlikely we'll otherwise change codegen to fix this in time.

@alexcrichton alexcrichton removed this from the 1.26 milestone May 7, 2018
@alexcrichton alexcrichton added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels May 7, 2018
@XAMPPRocky XAMPPRocky added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Aug 27, 2018
@nikic nikic self-assigned this Dec 13, 2018
@nikic nikic removed their assignment May 23, 2020
@bjorn3
Copy link
Member

bjorn3 commented May 16, 2022

The linked playground compiles to

playground::foo: # @playground::foo
# %bb.0:
	retq
                                        # -- End function

with optimizations enabled since rustc 1.60.0.

@bjorn3 bjorn3 closed this as completed May 16, 2022
@nikic
Copy link
Contributor

nikic commented May 16, 2022

FWIW llvm/llvm-project@aae5f81 should make this happen more reliably. We were previously only dropping such atomic loads inside specific passes, as a side-effect of other optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-const-eval Area: Constant evaluation, covers all const contexts (static, const fn, ...) C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests