-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Room for optimization improvement on reference counting #13018
Comments
I think that this is partly because LLVM doesn't understand that |
The situation is improved with the parameter marked as |
With aliasing out of the way, LLVM still can't know that the refcount isn't already 0 when entering the function, so it can't remove the code. |
Maybe an intrinsic for doing a load with a range assertion would work. |
Range asserts don't help here, they're not used to optimize the icmp away.
|
The reference count can never be 0, unless we're about to drop the data completely. Using the `assume` intrinsic allows us to inform LLVM about that invariant, meaning it can avoid unnecessary drops. --- Before and after IR: https://gist.github.com/Aatch/3786d20df2edaad6a0e8 Generated from the example in rust-lang#13018 Fixes rust-lang#13018
#21418 didn't actually land :( (had to revert) |
Triage: runnable code
I'm not aware of any improvements in this area, nor am I good enough with LLVM IR to tell if things have changed. |
Adding Even with this change, the overflow check in inc_strong might also prevent the optimization. Not sure what to do about that :/ |
@jruderman that is similar to what was tried in #21418, but was found to be a serious compilation time hit (#21418 (comment)). The |
LLVM IR we produce is still as junk as was before, even with MIR. |
It might sense to attempt reapplying the assume stuff. A year and half has passed since first attempt, so LLVM might be able to handle it this time. |
I just notice this issue when I'm reading As @jruderman mentioned, we are using Should we think other solution or close the issue ? |
The current LLVM IR for the snippet in the issue description is: Click to expand LLVM IR; ModuleID = 'playground0-3a317160927575f9dc8b26750a523322.rs'
source_filename = "playground0-3a317160927575f9dc8b26750a523322.rs"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%"unwind::libunwind::_Unwind_Exception" = type { [0 x i64], i64, [0 x i64], void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [0 x i64], [6 x i64], [0 x i64] }
%"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }
; Function Attrs: uwtable
define void @foo(i64** noalias nocapture readonly dereferenceable(8) %t) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
%t.val = load i64*, i64** %t, align 8
%.val.i.i.i = load i64, i64* %t.val, align 8
%0 = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %.val.i.i.i, i64 1) #5
%1 = extractvalue { i64, i1 } %0, 1
br i1 %1, label %bb2.i.i.i, label %"_ZN61_$LT$alloc..rc..Rc$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$5clone17h127f785fcc974269E.exit"
bb2.i.i.i: ; preds = %start
tail call void @llvm.trap() #5
unreachable
"_ZN61_$LT$alloc..rc..Rc$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$5clone17h127f785fcc974269E.exit": ; preds = %start
%2 = extractvalue { i64, i1 } %0, 0
%3 = add i64 %2, -1
store i64 %3, i64* %t.val, align 1
%4 = icmp eq i64 %3, 0
br i1 %4, label %bb4.i.i.i, label %_ZN4core3mem4drop17h222f528d1fa068ceE.exit
bb4.i.i.i: ; preds = %"_ZN61_$LT$alloc..rc..Rc$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$5clone17h127f785fcc974269E.exit"
%5 = getelementptr inbounds i64, i64* %t.val, i64 1
%.val.i.i5.i.i.i = load i64, i64* %5, align 8
%6 = add i64 %.val.i.i5.i.i.i, -1
store i64 %6, i64* %5, align 1
%7 = icmp eq i64 %6, 0
br i1 %7, label %bb9.i.i.i, label %_ZN4core3mem4drop17h222f528d1fa068ceE.exit
bb9.i.i.i: ; preds = %bb4.i.i.i
%8 = bitcast i64* %t.val to i8*
tail call void @__rust_dealloc(i8* nonnull %8, i64 24, i64 8) #5
br label %_ZN4core3mem4drop17h222f528d1fa068ceE.exit
_ZN4core3mem4drop17h222f528d1fa068ceE.exit: ; preds = %"_ZN61_$LT$alloc..rc..Rc$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$5clone17h127f785fcc974269E.exit", %bb4.i.i.i, %bb9.i.i.i
ret void
}
declare i32 @rust_eh_personality(i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1
; Function Attrs: nounwind readnone speculatable
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) #2
; Function Attrs: noreturn nounwind
declare void @llvm.trap() #3
; Function Attrs: nounwind
declare void @__rust_dealloc(i8*, i64, i64) unnamed_addr #4
attributes #0 = { uwtable "probe-stack"="__rust_probestack" }
attributes #1 = { "probe-stack"="__rust_probestack" }
attributes #2 = { nounwind readnone speculatable }
attributes #3 = { noreturn nounwind }
attributes #4 = { nounwind "probe-stack"="__rust_probestack" }
attributes #5 = { nounwind } @eddyb said it's an LLVM bug and then walked off in the distance. Cc @rust-lang/wg-codegen @rust-lang/wg-compiler-performance |
Minimal example, which right now adds & subtracts instead of checking & returning the original #![crate_type = "lib"]
#[no_mangle]
pub fn assert_not_max_usize(x: usize) -> usize {
x.checked_add(1).unwrap().wrapping_sub(1)
} |
It's unclear to me that any optimizing compiler can remove the increment/decrement in this case -- we've specified that if the reference count overflows, we expect the program to abort (panic in eddyb's minimal example). LLVM cannot remove that; it would change the program's behavior. As such, it's not clear to me that this issue can ever be fixed, so I'm going to close it. |
I don't think this issue should be closed, as shown in #13018 (comment) this is not about changing the program behaviour but about making the compiler check whether the value is the maximum value instead of doing the increment-decrement dance. |
Right, but today that code snippet will panic if passed usize::max_value(); if we somehow make the compiler not do that then we've changed program behavior and fixed this bug. Maybe I'm wrong, but my understanding is that compilers aren't generally allowed to change that type of program behavior. |
The optimisation improvement here is not about making the program not panicking (that would indeed be a behaviour change and thus shouldn't be done), but making it not increment and decrement the actual reference count and instead just panic if it is |
To be precise, something in the compiler toolchain could realise that: #![crate_type = "lib"]
#[no_mangle]
pub fn assert_not_max_usize(x: usize) -> usize {
x.checked_add(1).unwrap().wrapping_sub(1)
} can become: #![crate_type = "lib"]
#[no_mangle]
pub fn assert_not_max_usize(x: usize) -> usize {
if x == usize::max_value() {
None.unwrap();
}
x
} |
Hm, okay, so looking at this more it seems that the answer is that LLVM can, we just need to help out a bit; it seems that LLVM can't quite handle the case where the abort is conditional on the add. It almost looks like LLVM doesn't see that the overflow (and hence, abort) only occurs when usize is max_value. Anyway, reopening. I think we can switch the code in liballoc to explicitly check before hand and abort; that should help with this if at least the small test cases are accurate. pub fn check(x: usize) -> usize {
if x == usize::max_value() {
abort();
}
x.checked_add(1).unwrap().wrapping_sub(1)
} produces check:
push rax
cmp rdi, -1
je .LBB0_1
mov rax, rdi
pop rcx
ret
.LBB0_1:
call std::process::abort@PLT
ud2 |
Change `Rc::inc_{weak,strong}` to better hint optimization to LLVM As discussed in #13018, `Rc::inc_strong` and `Rc::inc_weak` are changed to allow compositions of `clone` and `drop` to be better optimized. Almost entirely as in [this comment](#13018 (comment)), except that `abort` on zero is added so that a `drop(t.clone())` does not produce a zero check followed by conditional deallocation. This is different from #21418 in that it doesn't rely on `assume`, avoiding the prohibitive compilation slowdown. [Before and after IR](https://gist.github.com/hermord/266e55451b7fe0bb8caa6e35d17c86e1).
Change `Rc::inc_{weak,strong}` to better hint optimization to LLVM As discussed in #13018, `Rc::inc_strong` and `Rc::inc_weak` are changed to allow compositions of `clone` and `drop` to be better optimized. Almost entirely as in [this comment](#13018 (comment)), except that `abort` on zero is added so that a `drop(t.clone())` does not produce a zero check followed by conditional deallocation. This is different from #21418 in that it doesn't rely on `assume`, avoiding the prohibitive compilation slowdown. [Before and after IR](https://gist.github.com/hermord/266e55451b7fe0bb8caa6e35d17c86e1).
@alexcrichton considering #53080 has been merged, could this issue be closed? |
Technically this is still an issue in that the above function doesn't optimize to nothing, but practically I think it's "fixed enough", so sure! |
Fix `manual_unwrap_or` false positive changelog: [`manual_unwrap_or`]: fix false positive of `if let Option<T>` Closes rust-lang#13018
Playground
I would expect this function to be ideally optimized to a no-op:
However, when invoked with
Rc<int>
, we generate the following optimized IRI would expect that we would be able to do better.
The text was updated successfully, but these errors were encountered: