
println!() prevents optimization by capturing pointers #50519

Open
df5602 opened this issue May 7, 2018 · 14 comments
Labels
A-codegen  Area: Code generation
A-LLVM  Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement  Category: An issue proposing an enhancement or a PR with one.
C-optimization  Category: An issue highlighting optimization opportunities or PRs implementing such
I-slow  Issue: Problems and improvements with respect to performance of generated code.
T-compiler  Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@df5602 commented May 7, 2018

This weekend I ran some benchmarks on some of my code. After making a seemingly insignificant code change, I noticed a small but measurable performance regression. After investigating the generated assembly, I stumbled upon a case where the compiler emits suboptimal code.

This minimal example shows the same behaviour (Playground link):

extern crate rand;

use std::f32;
use rand::Rng;

fn main() {
    let mut list = [0.0; 16];
    let mut rg = rand::thread_rng();

    // Random initialization to prevent the compiler from optimizing the whole example away
    for i in 0..list.len() {
        list[i] = rg.gen_range(0.0, 0.1);
    }

    let mut lowest = f32::INFINITY;

    for i in 0..list.len() {
        lowest = if list[i] < lowest {    // <<<<<<<<<<<<<<<
            list[i]
        } else {
            lowest
        };
    }

    println!("{}", lowest);
}

When compiling with the --release flag, the compiler generates the following instructions for the marked block:

...
minss	%xmm0, %xmm1
movss	88(%rsp), %xmm0
minss	%xmm1, %xmm0
movss	92(%rsp), %xmm1
...

However, if I replace those lines with the following:

if list[i] < lowest {
    lowest = list[i];
}

the compiler emits a strange series of float compare and jump instructions:

.LBB5_38:
	movss	92(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	ja	.LBB5_39
...
.LBB5_42:
	movss	100(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	ja	.LBB5_43
...
.LBB5_39:
	movss	%xmm1, 12(%rsp)
	movaps	%xmm1, %xmm0
	movss	96(%rsp), %xmm1
	ucomiss	%xmm1, %xmm0
	jbe	.LBB5_42

For comparison, both gcc and clang can optimize a similar C++ example:

#include <stdlib.h>
#include <iostream>

using namespace std;

int main() {
    float list[16];
    for(size_t i = 0; i < 16; ++i) {
        list[i] = rand();
    }

    float lowest = 1000.0f;

    for (size_t i = 0; i < 16; ++i) {
        
        /* Variant A: */
        //lowest = list[i] < lowest ? list[i] : lowest;

        /* Variant B: */
        if (list[i] < lowest) {
            lowest = list[i];
        }
    }

    cout << lowest;
}

Both compilers generate minss instructions for both variants.
(Godbolt)

I wasn't sure whether rustc or LLVM was responsible for this behaviour; however, after a quick glance at the generated LLVM IR, I lean towards rustc, since in the first case it emits fcmp and select instructions, while in the latter it generates fcmp and br.

What do you think?

@sanxiyn sanxiyn added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 19, 2018
@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 2, 2018
@nikic (Contributor) commented Dec 15, 2018

In this reduced example, minss has been generated for both cases since Rust 1.25: https://godbolt.org/z/wz8Kmk

Replacing the return with println brings the problem back.

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Dec 15, 2018
@nikic (Contributor) commented Dec 15, 2018

Okay, the relevant difference that println! introduces is that it takes the address of lowest. If you use println!("{}", {lowest}) instead, the issue goes away.

Taking the address prevents the conversion of lowest from an alloca into an SSA value, and that's going to inhibit lots of optimizations (including the select formation desired here).
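
For illustration, a minimal sketch of the two forms at the source level (hypothetical code, not taken from the issue; the exact codegen depends on the compiler version):

fn print_min(list: &[f32; 16]) {
    let mut lowest = f32::INFINITY;
    for &x in list {
        lowest = if x < lowest { x } else { lowest };
    }

    // Direct form: the macro takes `&lowest`, so `lowest` needs a stack slot
    // (an alloca) for the whole function and the loop above is not reduced
    // to a chain of `minss` instructions.
    // println!("{}", lowest);

    // Work-around described above: `{lowest}` is a block expression that
    // copies the value into a temporary, so only the temporary's address is
    // taken and `lowest` itself can stay an SSA value.
    println!("{}", { lowest });
}

fn main() {
    print_min(&[0.25; 16]);
}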

The good news is that this is probably not going to affect real code much, though I am concerned about cases where you have conditional debugging code that includes formatting.

Two ways this could be fixed:

  • On the LLVM side: Perform the calculation on SSA values and only write the result back into the alloca slot when the address is taken. This might not always be profitable and probably isn't even possible with the information LLVM has (e.g. it does not know that we won't modify memory through the pointer).
  • On the Rust side: Avoid taking the reference, at least in cases where the value is small.

@nikic nikic changed the title Missed optimization: Compiler sometimes emits float compare + jump instead of MINSS/MAXSS println!() prevents optimization by capturing pointers Dec 23, 2018
@nikic (Contributor) commented Dec 23, 2018

@rkruppe @nagisa @eddyb Any idea what we can do here? I think it's pretty bad that println!() (or anything else using formatting, such as logging) breaks unrelated optimizations by capturing pointers. It's enough that any part of a structure is formatted to break optimization on all of it: LLVM doesn't know which part of the structure we're actually going to use, it only sees that some pointer into it is stored.
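
A hypothetical sketch of that point (the struct and function here are invented for illustration): formatting a single field still passes a pointer into the struct's memory.

#[derive(Default)]
pub struct Stats {
    pub hits: u64,
    pub misses: u64,
}

pub fn tally(flags: &[bool]) {
    let mut stats = Stats::default();
    for &f in flags {
        if f {
            stats.hits += 1;
        } else {
            stats.misses += 1;
        }
    }
    // Only `hits` is printed, but the macro takes `&stats.hits`, a pointer
    // into the struct's stack slot; LLVM just sees that a pointer into
    // `stats` escapes, so it conservatively keeps the whole struct in memory.
    println!("hits: {}", stats.hits);
}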

It would be great if we could force a copy of the formatted value before taking the pointer, but I'm not sure how to do that on a technical level. We'd only want to do this for specific types (integers and floats), but println! is expanded long before this type information is available.

@hanna-kruppe (Contributor) commented:

If changing how println etc. are expanded is on the table, maybe this is one more reason to consider a more performance-oriented expansion that generates direct inlineable calls, as outlined by @seanmonstar here. If for example the <f32 as Debug>::fmt calls are inlined (at least enough to not need to pass a reference), would that solve this issue?

@nagisa (Member) commented Dec 23, 2018

It would be great if we could force a copy of the formatted value before taking the pointer, but I'm not sure how to do that on a technical level. We'd only want to do this for specific types (integers and floats), but println! is expanded long before this type information is available.

The formatting machinery has been specifically crafted to minimize code size rather than maximize speed (desirable for panics), which will eventually come at some cost somewhere, and that is what we are seeing here. If we can find ways to improve the println! expansion to do better without sacrificing size, that would be great...

@elichai (Contributor) commented Aug 9, 2020

Any news here? I'm seeing similar behaviour where passing expressions to println! prevents optimizations, while assigning to a variable first somewhat helps.
https://godbolt.org/z/v73bG7
Did anyone experiment with inlining the _print functions? Does it really bloat real-world code? Does it even help with the optimization problems?

@tesuji (Contributor) commented Aug 9, 2020

I don't think inlining the _print function helps, as that function depends on print_to:

pub fn _print(args: fmt::Arguments<'_>) {
    print_to(args, &LOCAL_STDOUT, stdout, "stdout");
}

@elichai (Contributor) commented Aug 9, 2020

Sorry, I meant inlining all the way down the call graph.

@oxalica (Contributor) commented Apr 13, 2021

Ran into the same issue.

In the code below, println! prevents a hot variable from being kept in a register and also prevents SIMD vectorization.
The referenced variable count is kept on the stack and written back every time it is modified.
The let count = count; work-around makes this program about 7% faster (142 ms -> 132 ms).

Link: https://godbolt.org/z/4Yxbf1dox

const N: usize = 20_000_000;

pub fn sieve(vis: &mut [bool; N + 1]) {
    let mut i = 2;
    let mut count = 0;
    while i * i <= N {
        if vis[i] {
            count += 1; // This writes to stack every time.
            let mut k = 2 * i;
            while k <= N {
                vis[k] = true;
                k += i;
            }
        }
        i += 1;
    }

    // This loop can be vectorized (SIMD) when using the work-around below.
    while i <= N {
        if vis[i] {
            count += 1;  // This writes to stack every time.
        }
        i += 1;
    }

    // let count = count; // Uncomment this line to work around this issue.
    println!("{}", count);
}
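
A possible driver for the snippet above (hypothetical, not part of the original report): the array is roughly 20 MB, so it needs to live on the heap rather than on the stack.

use std::convert::TryInto;

fn main() {
    // Heap-allocate the ~20 MB flag array, then view the boxed slice as the
    // fixed-size array the function expects.
    let mut vis = vec![false; N + 1].into_boxed_slice();
    let vis: &mut [bool; N + 1] = (&mut *vis).try_into().expect("length matches");
    sieve(vis);
}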

@elichai (Contributor) commented Apr 13, 2021

(FWIW even doing println!("{}", (||count)()); is enough to allow the compiler to optimize correctly)

@nikic (Contributor) commented Apr 13, 2021

@oxalica Interesting case! I would have expected scalar promotion in LICM to sink the store outside the loop for that example. Unfortunately it ends up running afoul of this thread-safety check: https://github.com/llvm/llvm-project/blob/64c24f493e5f4637ee193f10f469cdd2695b4ba6/llvm/lib/Transforms/Scalar/LICM.cpp#L2196-L2198 Because you only store count conditionally, there is no guaranteed write on each iteration, and because println! captures it, the pointer may leak to a different thread.
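
Roughly, the transformation that scalar promotion would like to perform, written out by hand at the source level (a hypothetical sketch, not from the issue):

pub fn count_set(flags: &[bool]) {
    let mut count: u64 = 0;
    {
        // Promoted copy: lives in a register / SSA value, never address-taken.
        let mut tmp = count;
        for &f in flags {
            if f {
                tmp += 1; // conditional update with no store to memory
            }
        }
        // Single, unconditional write-back after the loop. This is exactly
        // the store that is only safe if no other thread can observe `count`
        // during the loop, hence the capture check discussed above.
        count = tmp;
    }
    // The address of `count` escapes to the formatting machinery only here.
    println!("{}", count);
}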

@nikic (Contributor) commented Apr 13, 2021

I think scalar store promotion candidates are sufficiently rare, and the resulting optimization sufficiently valuable, that doing a loop header reachability capture check is probably worth the compile-time cost in this case.

@nikic (Contributor) commented Apr 17, 2021

https://reviews.llvm.org/D100706 for what I mentioned above. This should help any cases that involve a println after a loop.

@nikic (Contributor) commented Apr 19, 2021

Upstream commit: llvm/llvm-project@d440f9a

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023