Slower binaries since some days #42935

leonardo-m · 2017-06-27T17:34:36Z

This could have the same causes of Issue 42808 (i.e. regression due to #42313), or perhaps not.

This small test program solves the Euler problem number 214:

#![feature(step_by)]

fn main() {
    fn e214() -> u64 {
        const N_MAX: usize = 40_000_000;
        const SIZE: usize = N_MAX / 2;
        const LEN: u8 = 25;

        let mut chain_length = vec![2u8; SIZE];
        let mut tot: u64 = 0;

        for i in 1 .. SIZE {
            // if not prime, then try next one
            if chain_length[i] > 2 { continue; }

            // p is prime
            let p = 2 * i + 1;

            // calculate q*2^k = p-1
            let mut q = i;
            let mut k = 1;
            while (q & 1) == 0 {
                q /= 2;
                k += 1;
            }

            // calculate cl = chain_length(p)
            let cl = chain_length[q / 2] + k;

            // add p to tot if the length matches.
            if cl == LEN { tot += p as u64; }

            // loop over powers of p.
            let mut pp = p as u64;
            while pp < N_MAX as u64 {
                // Loop over multiples of pp.
                for j in ((pp / 2) .. SIZE as u64).step_by(pp) {
                    chain_length[j as usize] += cl - 2;
                }
                pp *= p as u64;
            }
        }
        tot
    }
    assert_eq!(e214(), 1_677_366_278_943);
}

On my PC using nightly-2017-06-18-x86_64-pc-windows-gnu it runs in about 0.39 seconds, while using a more recent Nightly, like the current rustc 1.19.0-nightly (78d8416 2017-06-17), it runs in about 0.87 seconds.

The text was updated successfully, but these errors were encountered:

leonardo-m · 2017-06-27T18:12:54Z

I compile with:
rustc -C opt-level=3 -C target-cpu=native -C panic=abort

leonardo-m · 2017-06-27T19:29:30Z

Old asm compared to the new asm:
https://gist.github.com/anonymous/c4708600946f04b87c8b9586b15e3017

(Moved to a gist after suggestion by @pnkfelix).

pnkfelix · 2017-06-27T19:32:39Z

I also observe a time slowdown on when ~~compiling~~ executing this benchmark on a Linux system (using the same flags -C opt-level=3 -C target-cpu=native -C panic=abort as documented in comments above), comparing e00c040 to 1143eb2

`e00c040`	`1143eb2`
real 0m0.265s	real 0m0.690s

(I note this largely to add the data point that this is not just a windows issue.)

pnkfelix · 2017-06-27T19:33:54Z

@leonardo-m it might be good to move those asm dumps to a gist; they're pretty long to scroll through on the thread...

pnkfelix · 2017-06-27T19:36:43Z

(Seems obvious from the asm dumps that one piece of low hanging fruit would be to add some inlining directives to Layout::from_size_align, I think...)

leonardo-m · 2017-06-27T19:39:19Z

I have few other similar small programs that got slower in this timeframe (but their slowdown is smaller. On average I've seen a 10%+ increase of run-time).

I also observe a time slowdown on when compiling this benchmark

Yes, compilation time of my programs is also increased (perhaps because the compiler as well is slower).

pnkfelix · 2017-06-27T19:47:05Z

Yes, compilation time of my programs is also increased (perhaps because the compiler as well is slower).

Sorry, I meant "executing this benchmark when compiled with new Allocator support." (The part I was trying to emphasize was that I duplicated the problem on a non-Windows platform)

Mark-Simulacrum · 2017-06-27T19:52:32Z

Yeah, this was picked up by perf.rlo as well. MIR end regions and then allocators slowed down compilation time in all crates, I believe.

pnkfelix · 2017-06-27T19:54:16Z

(yeah a little embarrassing to have authored both of those PRs and have to deal with the fallout now...)

pnkfelix · 2017-06-28T08:49:33Z

A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

leonardo-m · 2017-06-28T08:51:20Z

A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

Very nice. I'll test your changes when they are in a Nightly. And I'll report if there are other performance regressions.

pnkfelix · 2017-06-28T08:52:31Z

However, the #[inline] directives make the binary size regression for my microbenchmark on #42808 worse

pnkfelix · 2017-06-28T09:21:45Z

Also, it seems like @alexcrichton has independently added many of the same #[inline] directives in PR #42727, as he mentioned here

pnkfelix · 2017-06-28T11:46:11Z

(yes, I just built a rustc atop #42727, and it looks like it will definitely fix the execution time regression described in this issue.)

leonardo-m · 2017-07-09T09:23:42Z

Replacing the step_by feature with iterator_step_by, and using .step_by(pp as usize), now the best-of-three run-time is 0.47 seconds.

pnkfelix · 2017-07-13T17:08:08Z

Hmm indeed, seems like the running times regressed further between (734c83642 2017-07-03) and (720c596ec 2017-07-08) (or maybe I'm not adapting the benchmark correctly...)

arielb1 · 2017-08-09T21:30:42Z

1.20 is now beta

alexcrichton · 2017-08-12T20:33:03Z

Looking into this, some things I've learned:

Technically I don't think this is anything other than a nightly regresion due to the nightly feature usage. Despite this though if you use RUSTC_BOOTSTRAP=1 to test 1.19.0, 1.20.0, and 1.21.0, you can see that performance regressed in 1.20.0 and was fixed in 1.21.0.
Historically this program used Range::step_by.
The Range::step_by function was removed in Delete deprecated & unstable range-specific step_by #43012 (not in 1.20.0)
The new method this program is using is Iterator::step_by, introduced in Stabilize step_by by adding it to Iterator (issue #27741) #41439 (not in 1.20.0)
The StepBy iterator implementation calls the nth method to do iteration.
The nth implementation for ops::Range calls the add_usize method
The add_usize method calls TryFrom
On 1.20.0 the number 2 function in the profile is a TryFrom-related method
This method, which should be inlined, was inlined in improve the TryFrom implementations #43248

tl;dr this is fixed, no need for a backport, closing.

leonardo-m · 2017-08-13T12:05:04Z

Thank you, this specific program is indeed fixed (run-time is now 0.40 s), but other performance regressions are not yet fixed, so I'll open other bug reports.

alexcrichton · 2017-08-13T12:08:06Z

Yes that'd be much appreciated, thanks @leonardo-m!

leonardo-m mentioned this issue Jun 27, 2017

Binary size blowup presumably because of allocator integration #42808

Closed

Mark-Simulacrum added I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jun 27, 2017

pnkfelix self-assigned this Jun 27, 2017

brson added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. P-medium Medium priority labels Jul 13, 2017

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017

arielb1 added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Aug 9, 2017

alexcrichton closed this as completed Aug 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slower binaries since some days #42935

Slower binaries since some days #42935

leonardo-m commented Jun 27, 2017 •

edited by pnkfelix

Loading

leonardo-m commented Jun 27, 2017

leonardo-m commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017

pnkfelix commented Jun 27, 2017 •

edited

Loading

leonardo-m commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

Mark-Simulacrum commented Jun 27, 2017

pnkfelix commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 28, 2017

leonardo-m commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

leonardo-m commented Jul 9, 2017

pnkfelix commented Jul 13, 2017

arielb1 commented Aug 9, 2017

alexcrichton commented Aug 12, 2017 •

edited

Loading

leonardo-m commented Aug 13, 2017

alexcrichton commented Aug 13, 2017

Slower binaries since some days #42935

Slower binaries since some days #42935

Comments

leonardo-m commented Jun 27, 2017 • edited by pnkfelix Loading

leonardo-m commented Jun 27, 2017

leonardo-m commented Jun 27, 2017 • edited Loading

pnkfelix commented Jun 27, 2017 • edited Loading

pnkfelix commented Jun 27, 2017

pnkfelix commented Jun 27, 2017 • edited Loading

leonardo-m commented Jun 27, 2017 • edited Loading

pnkfelix commented Jun 27, 2017 • edited Loading

Mark-Simulacrum commented Jun 27, 2017

pnkfelix commented Jun 27, 2017 • edited Loading

pnkfelix commented Jun 28, 2017

leonardo-m commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

pnkfelix commented Jun 28, 2017

leonardo-m commented Jul 9, 2017

pnkfelix commented Jul 13, 2017

arielb1 commented Aug 9, 2017

alexcrichton commented Aug 12, 2017 • edited Loading

leonardo-m commented Aug 13, 2017

alexcrichton commented Aug 13, 2017

leonardo-m commented Jun 27, 2017 •

edited by pnkfelix

Loading

leonardo-m commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

leonardo-m commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

pnkfelix commented Jun 27, 2017 •

edited

Loading

alexcrichton commented Aug 12, 2017 •

edited

Loading