Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slower binaries since some days #42935

Closed
leonardo-m opened this issue Jun 27, 2017 · 20 comments
Closed

Slower binaries since some days #42935

leonardo-m opened this issue Jun 27, 2017 · 20 comments
Assignees
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@leonardo-m
Copy link

leonardo-m commented Jun 27, 2017

This could have the same causes of Issue 42808 (i.e. regression due to #42313), or perhaps not.

This small test program solves the Euler problem number 214:

#![feature(step_by)]

fn main() {
    fn e214() -> u64 {
        const N_MAX: usize = 40_000_000;
        const SIZE: usize = N_MAX / 2;
        const LEN: u8 = 25;

        let mut chain_length = vec![2u8; SIZE];
        let mut tot: u64 = 0;

        for i in 1 .. SIZE {
            // if not prime, then try next one
            if chain_length[i] > 2 { continue; }

            // p is prime
            let p = 2 * i + 1;

            // calculate q*2^k = p-1
            let mut q = i;
            let mut k = 1;
            while (q & 1) == 0 {
                q /= 2;
                k += 1;
            }

            // calculate cl = chain_length(p)
            let cl = chain_length[q / 2] + k;

            // add p to tot if the length matches.
            if cl == LEN { tot += p as u64; }

            // loop over powers of p.
            let mut pp = p as u64;
            while pp < N_MAX as u64 {
                // Loop over multiples of pp.
                for j in ((pp / 2) .. SIZE as u64).step_by(pp) {
                    chain_length[j as usize] += cl - 2;
                }
                pp *= p as u64;
            }
        }
        tot
    }
    assert_eq!(e214(), 1_677_366_278_943);
}

On my PC using nightly-2017-06-18-x86_64-pc-windows-gnu it runs in about 0.39 seconds, while using a more recent Nightly, like the current rustc 1.19.0-nightly (78d8416 2017-06-17), it runs in about 0.87 seconds.

@leonardo-m
Copy link
Author

I compile with:
rustc -C opt-level=3 -C target-cpu=native -C panic=abort

@Mark-Simulacrum Mark-Simulacrum added I-slow Issue: Problems and improvements with respect to performance of generated code. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jun 27, 2017
@leonardo-m
Copy link
Author

leonardo-m commented Jun 27, 2017

Old asm compared to the new asm:
https://gist.github.com/anonymous/c4708600946f04b87c8b9586b15e3017

(Moved to a gist after suggestion by @pnkfelix).

@pnkfelix
Copy link
Member

pnkfelix commented Jun 27, 2017

I also observe a time slowdown on when compiling executing this benchmark on a Linux system (using the same flags -C opt-level=3 -C target-cpu=native -C panic=abort as documented in comments above), comparing e00c040 to 1143eb2

e00c040 1143eb2
real 0m0.265s real 0m0.690s

(I note this largely to add the data point that this is not just a windows issue.)

@pnkfelix
Copy link
Member

@leonardo-m it might be good to move those asm dumps to a gist; they're pretty long to scroll through on the thread...

@pnkfelix
Copy link
Member

pnkfelix commented Jun 27, 2017

(Seems obvious from the asm dumps that one piece of low hanging fruit would be to add some inlining directives to Layout::from_size_align, I think...)

@leonardo-m
Copy link
Author

leonardo-m commented Jun 27, 2017

I have few other similar small programs that got slower in this timeframe (but their slowdown is smaller. On average I've seen a 10%+ increase of run-time).

I also observe a time slowdown on when compiling this benchmark

Yes, compilation time of my programs is also increased (perhaps because the compiler as well is slower).

@pnkfelix
Copy link
Member

pnkfelix commented Jun 27, 2017

Yes, compilation time of my programs is also increased (perhaps because the compiler as well is slower).

Sorry, I meant "executing this benchmark when compiled with new Allocator support." (The part I was trying to emphasize was that I duplicated the problem on a non-Windows platform)

@pnkfelix pnkfelix self-assigned this Jun 27, 2017
@Mark-Simulacrum
Copy link
Member

Yeah, this was picked up by perf.rlo as well. MIR end regions and then allocators slowed down compilation time in all crates, I believe.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 27, 2017

(yeah a little embarrassing to have authored both of those PRs and have to deal with the fallout now...)

@pnkfelix
Copy link
Member

A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

@leonardo-m
Copy link
Author

A collection of judiciously placed #[inline] attributes seems to completely fix the speed regression here.

Very nice. I'll test your changes when they are in a Nightly. And I'll report if there are other performance regressions.

@pnkfelix
Copy link
Member

However, the #[inline] directives make the binary size regression for my microbenchmark on #42808 worse

@pnkfelix
Copy link
Member

Also, it seems like @alexcrichton has independently added many of the same #[inline] directives in PR #42727, as he mentioned here

@pnkfelix
Copy link
Member

(yes, I just built a rustc atop #42727, and it looks like it will definitely fix the execution time regression described in this issue.)

@leonardo-m
Copy link
Author

Replacing the step_by feature with iterator_step_by, and using .step_by(pp as usize), now the best-of-three run-time is 0.47 seconds.

@brson brson added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. P-medium Medium priority labels Jul 13, 2017
@pnkfelix
Copy link
Member

Hmm indeed, seems like the running times regressed further between (734c83642 2017-07-03) and (720c596ec 2017-07-08) (or maybe I'm not adapting the benchmark correctly...)

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@arielb1
Copy link
Contributor

arielb1 commented Aug 9, 2017

1.20 is now beta

@arielb1 arielb1 added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Aug 9, 2017
@alexcrichton
Copy link
Member

alexcrichton commented Aug 12, 2017

Looking into this, some things I've learned:

tl;dr this is fixed, no need for a backport, closing.

@leonardo-m
Copy link
Author

Thank you, this specific program is indeed fixed (run-time is now 0.40 s), but other performance regressions are not yet fixed, so I'll open other bug reports.

@alexcrichton
Copy link
Member

Yes that'd be much appreciated, thanks @leonardo-m!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. P-medium Medium priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants