Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile-Time Performance Regression #37864

Closed
Mark-Simulacrum opened this issue Nov 18, 2016 · 17 comments
Closed

Compile-Time Performance Regression #37864

Mark-Simulacrum opened this issue Nov 18, 2016 · 17 comments
Assignees
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Mark-Simulacrum
Copy link
Member

#37660 appears to have regressed performance by ~6% on bootstrap, due to a near tripling in time for item-bodies checking (23s to 62s). I'm not sure if that was expected or not, but someone should probably investigate. Let me know if I should open a new issue about that.

See here for a comparison across all crates.

cc @nikomatsakis

@nikomatsakis
Copy link
Contributor

Hmm, yeah, not really expected.

@nikomatsakis
Copy link
Contributor

This is the diff that affected rustc_typeck crate:

https://gist.github.com/nikomatsakis/ca47ebbcd264452539074899b6d09355

Not seeing yet what might have caused such a perturbation.

@nikomatsakis nikomatsakis self-assigned this Nov 18, 2016
@nikomatsakis nikomatsakis added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 18, 2016
@michaelwoerister
Copy link
Member

@eddyb: You also had a look at that PR, any ideas?

@Mark-Simulacrum
Copy link
Member Author

Is it possible the PR increased the amount of code in librustc and hit a pathological case in some way? This shows that most crates had little-to-no difference except for librustc itself, which jumped up by ~30 seconds.

@michaelwoerister
Copy link
Member

The syntex-syntax test case also shows the regression, so there's definitely something to it.

@Mark-Simulacrum
Copy link
Member Author

It looks like I was wrong with my initial assessment about rustc being the only one to show the increase, but this graph shows that it has the largest increase by far out of most rustc crates in that pass.

@eddyb
Copy link
Member

eddyb commented Nov 19, 2016

I'd suggest comparing with callgrind: if the number of calls related to inference, for example, change, well... My suspicion is basically "impl children get checked twice" but then tests couldn't pass because errors would also be doubled? I'm not sure.

@nnethercote
Copy link
Contributor

nearest_common_ancestor is at least part of the problem, according to Cachegrind. For example, from syntex:

--------------------------------------------------------------------------------
             Ir
--------------------------------------------------------------------------------
141,932,296,063  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
6,040,510,628  /home/njn/moz/rust0/src/librustc/middle/region.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor
5,280,761,738  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:_int_malloc
4,376,166,866  /home/njn/moz/rust0/src/librustc/middle/region.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
3,449,453,667  /home/njn/moz/rust0/src/libcollections/vec.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
3,236,809,213  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:_int_free
1,979,912,076  ???:???
1,959,605,584  /build/glibc-Qz8a69/glibc-2.23/string/../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:__memcpy_avx_unaligned
1,899,088,086  /home/njn/moz/rust0/src/rt/miniz.c:tdefl_compress
1,767,561,366  /build/glibc-Qz8a69/glibc-2.23/malloc/malloc.c:malloc
1,540,380,901  /home/njn/moz/rust0/src/libstd/collections/hash/table.rs:<std::collections::hash::set::HashSet<T, S>>::insert
1,377,680,312  /home/njn/moz/rust0/src/liballoc/raw_vec.rs:rustc::middle::region::RegionMaps::nearest_common_ancestor::ancestors_of
1,144,766,653  /home/njn/moz/rust0/src/libstd/collections/hash/table.rs:<std::collections::hash::set::HashSet<T, S>>::get

The 142M instructions executed is up from 124M. I've not seen nearest_common_ancestor in any profiles prior to today.

If the problem can't be found soon I suggest reverting #37660.

@nikomatsakis
Copy link
Contributor

So I did some experimentation with a small test file: it's certainly not as simple as something in typeck happening twice, or at least if it is I didn't figure out what yet.

@nnethercote's samples suggest something about region inference, but we are not running regionck more often, as far as I can tell (nor typeck). Or at least we don't do so on a very simple test case. I will try experimenting with some bigger ones.

If the problem can't be found soon I suggest reverting #37660.

I'm not ready to revert yet. Please consult with me before considering such a thing.

@nikomatsakis
Copy link
Contributor

OK, I may have found the culprit.

@nikomatsakis
Copy link
Contributor

Fix in #37920

@nnethercote
Copy link
Contributor

I'm not ready to revert yet. Please consult with me before considering such a thing.

I wouldn't presume to revert, but my statement was intended to mark the beginning of such a consultation :)

@retep998
Copy link
Member

winapi was totally hit by this.
https://gist.github.com/Arnavion/ddcd1af4dc3393b35fa0f11c5dc3119e

@Arnavion
Copy link

And I confirmed that #37920 fixes winapi build times back to ~35s, the same that it takes with stable.

nikomatsakis added a commit to nikomatsakis/rust that referenced this issue Dec 1, 2016
The `visit_fn` code mutates its surrounding context.  Between *items*,
this was saved/restored, but between impl items it was not. This meant
that we wound up with `CallSiteScope` entries with two parents (or
more!).  As far as I can tell, this is harmless in actual type-checking,
since the regions you interact with are always from at most one of those
branches. But it can slow things down.

Before, the effect was limited, since it only applied to impl items
within an impl. After rust-lang#37660, impl items are visisted all together at
the end, and hence this could create a very messed up
hierarchy. Isolating impl item properly solves both issues.

I cannot come up with a way to unit-test this; for posterity, however,
you can observe the messed up hierarchies with a test as simple as the
following, which would create a callsite scope with two parents both
before and after

```
struct Foo {
}

impl Foo {
    fn bar(&self) -> usize {
        22
    }

    fn baz(&self) -> usize {
        22
    }
}

fn main() { }
```

Fixes rust-lang#37864.
bors added a commit that referenced this issue Dec 2, 2016
in region, treat current (and future) item-likes alike

The `visit_fn` code mutates its surrounding context.  Between *items*,
this was saved/restored, but between impl items it was not. This meant
that we wound up with `CallSiteScope` entries with two parents (or
more!).  As far as I can tell, this is harmless in actual type-checking,
since the regions you interact with are always from at most one of those
branches. But it can slow things down.

Before, the effect was limited, since it only applied to impl items
within an impl. After #37660, impl items are visisted all together at
the end, and hence this could create a very messed up
hierarchy. Isolating impl item properly solves both issues.

I cannot come up with a way to unit-test this; for posterity, however,
you can observe the messed up hierarchies with a test as simple as the
following, which would create a callsite scope with two parents both
before and after

```
struct Foo {
}

impl Foo {
    fn bar(&self) -> usize {
        22
    }

    fn baz(&self) -> usize {
        22
    }
}

fn main() { }
```

Fixes #37864.

r? @michaelwoerister

cc @pnkfelix -- can you think of a way to make a regr test?
bors added a commit that referenced this issue Dec 2, 2016
in region, treat current (and future) item-likes alike

The `visit_fn` code mutates its surrounding context.  Between *items*,
this was saved/restored, but between impl items it was not. This meant
that we wound up with `CallSiteScope` entries with two parents (or
more!).  As far as I can tell, this is harmless in actual type-checking,
since the regions you interact with are always from at most one of those
branches. But it can slow things down.

Before, the effect was limited, since it only applied to impl items
within an impl. After #37660, impl items are visisted all together at
the end, and hence this could create a very messed up
hierarchy. Isolating impl item properly solves both issues.

I cannot come up with a way to unit-test this; for posterity, however,
you can observe the messed up hierarchies with a test as simple as the
following, which would create a callsite scope with two parents both
before and after

```
struct Foo {
}

impl Foo {
    fn bar(&self) -> usize {
        22
    }

    fn baz(&self) -> usize {
        22
    }
}

fn main() { }
```

Fixes #37864.

r? @michaelwoerister

cc @pnkfelix -- can you think of a way to make a regr test?
bors added a commit that referenced this issue Dec 4, 2016
in region, treat current (and future) item-likes alike

The `visit_fn` code mutates its surrounding context.  Between *items*,
this was saved/restored, but between impl items it was not. This meant
that we wound up with `CallSiteScope` entries with two parents (or
more!).  As far as I can tell, this is harmless in actual type-checking,
since the regions you interact with are always from at most one of those
branches. But it can slow things down.

Before, the effect was limited, since it only applied to impl items
within an impl. After #37660, impl items are visisted all together at
the end, and hence this could create a very messed up
hierarchy. Isolating impl item properly solves both issues.

I cannot come up with a way to unit-test this; for posterity, however,
you can observe the messed up hierarchies with a test as simple as the
following, which would create a callsite scope with two parents both
before and after

```
struct Foo {
}

impl Foo {
    fn bar(&self) -> usize {
        22
    }

    fn baz(&self) -> usize {
        22
    }
}

fn main() { }
```

Fixes #37864.

r? @michaelwoerister

cc @pnkfelix -- can you think of a way to make a regr test?
@nnethercote
Copy link
Contributor

I remeasured and I can confirm the regression is fixed. Performance on rustc-benchmarks is basically equivalent to what it was on Nov 14, my last measurement prior to the regression.

@michaelwoerister
Copy link
Member

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants