Performance audit, Spring 2017 #41410

arielb1 · 2017-04-20T00:21:35Z

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.

rust-highfive · 2017-04-20T00:21:46Z

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

KalitaAlexey · 2017-04-20T08:08:53Z

@arielb1,
Are you going to collect data to show improvement?

arielb1 · 2017-04-20T08:51:43Z

@KalitaAlexey

I'm doing measurements locally, but after this is done it'll show up on perf.rust-lang.org (or the mirror https://perf-rlo.herokuapp.com).

michaelwoerister · 2017-04-20T15:23:02Z

this avoids parsing item attributes on each call to `item_attrs`, which takes off 33% (!) of translation time and 50% (!) of trans-item collection time.

WOW! :)

nikomatsakis · 2017-04-20T16:08:29Z

src/librustc/middle/mem_categorization.rs

-            (self.tcx().mk_region(ty::ReStatic),
-             self.tcx().mk_region(ty::ReStatic))
+            (self.tcx().types.re_static,
+             self.tcx().types.re_static)


I just noticed that we don't have these constants pre-interned. This is going to conflict like hell with one of my in-progress branches, but oh well. =)
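Why the diff helps: `mk_region(ty::ReStatic)` hashes the region and probes the interner on every call, while `tcx.types.re_static` is a plain field read of a value interned once. The sketch below is a hypothetical, heavily simplified interner (`Ctxt`, `Region`, `RegionRef`, and `CommonTypes` are made-up names, not rustc's API). It pre-interns its common regions at construction and counts how many hash lookups callers trigger afterwards.

```rust
use std::collections::HashMap;

/// Hypothetical region kinds; a real compiler has many more.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Region {
    Static,
    Erased,
}

/// An interned region: a small index that is cheap to copy and compare.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegionRef(usize);

/// Values interned once, up front, so callers can read them directly.
struct CommonTypes {
    re_static: RegionRef,
    re_erased: RegionRef,
}

struct Ctxt {
    interned: HashMap<Region, RegionRef>,
    arena: Vec<Region>,
    /// Number of hash-map lookups performed after construction.
    lookups: usize,
    types: CommonTypes,
}

impl Ctxt {
    fn new() -> Ctxt {
        let mut cx = Ctxt {
            interned: HashMap::new(),
            arena: Vec::new(),
            lookups: 0,
            // Placeholders, overwritten just below.
            types: CommonTypes { re_static: RegionRef(0), re_erased: RegionRef(0) },
        };
        // Pre-intern the common regions exactly once.
        cx.types.re_static = cx.mk_region(Region::Static);
        cx.types.re_erased = cx.mk_region(Region::Erased);
        cx.lookups = 0; // do not count the set-up work
        cx
    }

    /// General path: hash `r` and look it up, inserting it if it is new.
    fn mk_region(&mut self, r: Region) -> RegionRef {
        self.lookups += 1;
        if let Some(&id) = self.interned.get(&r) {
            return id;
        }
        let id = RegionRef(self.arena.len());
        self.arena.push(r);
        self.interned.insert(r, id);
        id
    }

    fn region(&self, id: RegionRef) -> Region {
        self.arena[id.0]
    }
}
```

Reading `cx.types.re_static` yields the same handle as `cx.mk_region(Region::Static)` without touching the hash map at all.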

nikomatsakis

r=me on the stuff so far

eddyb · 2017-04-20T16:50:08Z

src/librustc/ty/fold.rs

@@ -573,7 +589,7 @@ pub fn shift_regions<'a, 'gcx, 'tcx, T>(tcx: TyCtxt<'a, 'gcx, 'tcx>,
           value, amount);

    value.fold_with(&mut RegionFolder::new(tcx, &mut false, &mut |region, _current_depth| {
-        tcx.mk_region(shift_region(*region, amount))


Oh wow. By-value shift_region may have been used by elision or something. Can we just kill it?

eddyb · 2017-04-20T16:51:07Z

src/librustc/ty/layout.rs

@@ -1942,6 +1940,6 @@ impl<'a, 'tcx> TyLayout<'tcx> {
    }

    pub fn field<C: LayoutTyper<'tcx>>(&self, cx: C, i: usize) -> C::TyLayout {
-        cx.layout_of(self.field_type(cx, i))
+        cx.layout_of(cx.normalize_associated_type(self.field_type(cx, i)))


Why do you need to do it here, when `layout_of` does it at the start?

Now it doesn't. Types in trans are always normalized.

eddyb · 2017-04-20T16:55:37Z

src/librustc_trans/context.rs

+        if let Some(&layout) = self.tcx().layout_cache.borrow().get(&ty) {
+            return TyLayout { ty: ty, layout: layout, variant_index: None };
+        }
+
        self.tcx().infer_ctxt((), traits::Reveal::All).enter(|infcx| {


Is creating the `infer_ctxt` costly?

Can you (trans-)normalize before checking the cache above? Would that solve the problem?

I'd still like to get rid of the `normalize_associated_type` method. Can't `field_of` rely on `layout_of` normalizing before checking the cache? `if !ty.has_projection_types() {` is really fast, right?

Except when there are projection types (nested binders) etc. This method is hot.

Is `has_projection_types` anything other than a flag check? I'm not sure I understand what's going on. Can the cache be hit with the unnormalized type if `has_projection_types` returns true?
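The question above is essentially whether normalization can be made pay-as-you-go. A minimal sketch, assuming each type caches a `flags` bitset computed when it is built (`Ty`, `TyKind`, `LayoutCx`, `HAS_PROJECTION`, and `resolve_projections` are hypothetical stand-ins, and the "normalizer" simply rewrites every projection to `u64`): `has_projection_types` is then a single bit test, and `layout_of` normalizes only when that bit is set, so the cache is always keyed by normalized types.

```rust
use std::borrow::Cow;
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Bit set in `Ty::flags` when the type contains a projection anywhere.
const HAS_PROJECTION: u32 = 1 << 0;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum TyKind {
    U8,
    U64,
    Tuple(Vec<Ty>),
    /// Stand-in for an unnormalized projection such as `<T as Trait>::Out`.
    Projection,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Ty {
    kind: TyKind,
    /// Computed once, when the type is constructed.
    flags: u32,
}

impl Ty {
    fn new(kind: TyKind) -> Ty {
        let flags = match &kind {
            TyKind::Projection => HAS_PROJECTION,
            TyKind::Tuple(fields) => fields.iter().fold(0, |acc, f| acc | f.flags),
            TyKind::U8 | TyKind::U64 => 0,
        };
        Ty { kind, flags }
    }

    /// A single bit test; no walk over the type is needed.
    fn has_projection_types(&self) -> bool {
        (self.flags & HAS_PROJECTION) != 0
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
}

/// Hypothetical normalizer: resolves every projection to `u64`.
fn resolve_projections(ty: &Ty) -> Ty {
    match &ty.kind {
        TyKind::Projection => Ty::new(TyKind::U64),
        TyKind::Tuple(fields) => {
            Ty::new(TyKind::Tuple(fields.iter().map(resolve_projections).collect()))
        }
        _ => ty.clone(),
    }
}

struct LayoutCx {
    cache: RefCell<HashMap<Ty, Layout>>,
    /// How many times the (expensive) normalizer ran.
    normalizations: Cell<usize>,
}

impl LayoutCx {
    fn new() -> LayoutCx {
        LayoutCx { cache: RefCell::new(HashMap::new()), normalizations: Cell::new(0) }
    }

    fn layout_of(&self, ty: &Ty) -> Layout {
        // Normalize only when the flag says there is something to normalize,
        // so the cache below is always keyed by normalized types.
        let ty: Cow<Ty> = if ty.has_projection_types() {
            self.normalizations.set(self.normalizations.get() + 1);
            Cow::Owned(resolve_projections(ty))
        } else {
            Cow::Borrowed(ty)
        };
        if let Some(&layout) = self.cache.borrow().get(&*ty) {
            return layout;
        }
        let layout = match &ty.kind {
            TyKind::U8 => Layout { size: 1 },
            TyKind::U64 => Layout { size: 8 },
            // Padding is ignored to keep the sketch short.
            TyKind::Tuple(fields) => Layout { size: fields.iter().map(|f| self.layout_of(f).size).sum() },
            TyKind::Projection => unreachable!("projections are normalized above"),
        };
        self.cache.borrow_mut().insert(ty.into_owned(), layout);
        layout
    }
}
```

With this shape, types without projections cost one flag check plus a hash lookup; only projection-carrying types pay for normalization before the cache is consulted, and an unnormalized type can never be used as a cache key.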

eddyb · 2017-04-20T18:01:11Z

src/librustc/ty/mod.rs

-                ref item => bug!("trait_impl_polarity: {:?} not an impl", item)
-            }
-        } else {
-            self.sess.cstore.impl_polarity(id)


Can you remove the CrateStore method? They keep piling up.

I'm not sure this commit does what it's supposed to.

alexcrichton · 2017-04-20T18:54:58Z

src/liballoc/rc.rs

@@ -438,6 +437,38 @@ impl Rc<str> {
    }
 }

+impl<T> Rc<[T]> {


Out of curiosity, for the purposes of the compiler, does `Rc<[T]>` provide a measurable improvement over `Rc<Vec<T>>`?

This may also be more easily implementable by consuming a `Vec<T>`, as you've got to copy the data anyway. With `Vec<T>`, `box_free` also doesn't need to be exposed: after copying the elements out you can just `.set_len(0)`, so dropping the `Vec` frees its buffer without dropping the elements a second time.

Didn't bother checking. But `Rc<Vec<T>>` is too ugly for me.
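A hedged sketch of the `set_len(0)` idea above, using only the standard library. `move_into_boxed_slice` is a hypothetical helper, not the code from this PR: it moves the elements bitwise into a fresh allocation and then sets the source length to zero, so dropping the source `Vec` frees its buffer without dropping the moved-out elements again. Later Rust versions ship `impl<T> From<Vec<T>> for Rc<[T]>`, so `Rc::from(vec)` now covers the `Rc` case directly.

```rust
use std::ptr;
use std::rc::Rc;

/// Moves every element of `src` into a new allocation and returns it as a
/// boxed slice. The elements are copied bitwise (a move), and `src.set_len(0)`
/// then tells `src` it owns no elements, so dropping it frees only its buffer.
fn move_into_boxed_slice<T>(mut src: Vec<T>) -> Box<[T]> {
    let len = src.len();
    let mut dst: Vec<T> = Vec::with_capacity(len);
    unsafe {
        // SAFETY: `dst` has room for `len` elements and does not overlap `src`.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), len);
        // The elements now logically live in `dst`; make `src` forget them
        // so they are not dropped twice.
        src.set_len(0);
        dst.set_len(len);
    }
    dst.into_boxed_slice() // `src` is dropped here, freeing only its buffer
}

/// The modern, safe way to get an `Rc<[T]>` from a `Vec<T>`.
fn to_rc_slice<T>(v: Vec<T>) -> Rc<[T]> {
    Rc::from(v)
}
```

On the size question: `Rc<[T]>` is a fat pointer (data pointer plus length) to a single allocation, while `Rc<Vec<T>>` is a thin pointer to an allocation that itself points at a second one, so every element access goes through an extra indirection.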

arielb1 · 2017-04-20T19:16:28Z

I think this is enough for one PR.

That method is *incredibly* hot, so this ends up saving 10% of trans time. BTW, we really should be doing dependency tracking there - and possibly be taking the respective perf hit (got to find a way to make DTMs fast), but `layout_cache` is a non-dep-tracking map.

eddyb · 2017-04-20T20:28:20Z

@bors r=nikomatsakis,eddyb

bors · 2017-04-20T20:28:21Z

📌 Commit f964da5 has been approved by nikomatsakis,eddyb

eddyb · 2017-04-20T21:25:46Z

@bors r- Build failed.

eddyb · 2017-04-20T21:27:23Z

src/liballoc/heap.rs

 #[inline]
-unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
+pub(crate) unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
    let size = size_of_val(&*ptr);
    let align = min_align_of_val(&*ptr);


These two functions are not imported when building for tests, so this fails to compile in that configuration.

arielb1 · 2017-04-20T23:14:20Z

@bors r=eddyb

bors · 2017-04-20T23:14:21Z

📌 Commit dae49f1 has been approved by eddyb

arielb1 · 2017-04-22T12:10:23Z

So now I solved the "specialization caching" problem for real.

In some cases (e.g. `<[int-var] as Add<[int-var]>>`), selection can turn up a large number of candidates. Bailing out early avoids O(n^2) behavior. This improves item-type checking time by quite a bit, saving ~2% of total time-to-typeck.
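The early bail-out can be modeled generically. This is an illustrative sketch, not rustc's selection code: `winnow` and `Selection` are hypothetical, and `dominated_by(a, b)` stands in for whatever rule says candidate `a` can be dropped in favor of `b`. Without the early return, n mutually incomparable candidates cost a full n² pairwise pass before ambiguity is reported; with it, the function returns as soon as two survivors are known, after roughly 2n comparisons.

```rust
/// Outcome of choosing among candidates.
#[derive(Debug, PartialEq)]
enum Selection<C> {
    NoCandidate,
    Unique(C),
    Ambiguous,
}

/// Drops every candidate dominated by another one, but gives up with
/// `Ambiguous` as soon as two candidates are known to survive.
fn winnow<C, F>(mut candidates: Vec<C>, dominated_by: F) -> Selection<C>
where
    F: Fn(&C, &C) -> bool,
{
    let mut i = 0;
    while i < candidates.len() {
        // Compare against *all* remaining candidates, so a survivor cannot
        // become dominated later just because other candidates are removed.
        let is_dup = (0..candidates.len())
            .filter(|&j| j != i)
            .any(|j| dominated_by(&candidates[i], &candidates[j]));
        if is_dup {
            candidates.swap_remove(i);
        } else {
            i += 1;
            // Two undominated candidates: the answer is already "ambiguous",
            // so skip the rest of the quadratic pass.
            if i > 1 {
                return Selection::Ambiguous;
            }
        }
    }
    // At most one candidate is left here.
    match candidates.pop() {
        Some(c) => Selection::Unique(c),
        None => Selection::NoCandidate,
    }
}
```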

eddyb · 2017-04-22T12:20:48Z

@bors r=nikomatsakis,eddyb

bors · 2017-04-22T12:20:49Z

📌 Commit 1b207ca has been approved by nikomatsakis,eddyb

bors · 2017-04-22T16:46:58Z

🔒 Merge conflict

bors · 2017-04-22T16:49:28Z

☔ The latest upstream changes (presumably #41464) made this pull request unmergeable. Please resolve the merge conflicts.

arielb1 · 2017-04-22T18:03:28Z

Moving the branch over to arielb1/rust.

rust-highfive assigned nikomatsakis Apr 20, 2017

arielb1 changed the title ~~Performance audit, Spring 2017~~ [WIP] Performance audit, Spring 2017 Apr 20, 2017

Ariel Ben-Yehuda added 2 commits April 20, 2017 15:14

remove cleanup branches to the resume block

64c6978

This recovers the 10% of LLVM performance that was lost during the shimmir transition.

avoid calling mk_region unnecessarily

02ad572

this improves typeck & trans performance by 1%. This looked hotter on callgrind than it is on a CPU.

arielb1 force-pushed the rustc-spring-cleaning branch from ce14cf8 to c8fe505 April 20, 2017 12:14

arielb1 force-pushed the rustc-spring-cleaning branch from c8fe505 to 71d3270 April 20, 2017 15:54

nikomatsakis reviewed Apr 20, 2017

nikomatsakis approved these changes Apr 20, 2017

eddyb reviewed Apr 20, 2017

alexcrichton reviewed Apr 20, 2017

arielb1 force-pushed the rustc-spring-cleaning branch from 82092ca to c357feb April 20, 2017 19:15

arielb1 changed the title ~~[WIP] Performance audit, Spring 2017~~ Performance audit, Spring 2017 Apr 20, 2017

arielb1 force-pushed the rustc-spring-cleaning branch from c357feb to f964da5 April 20, 2017 19:56

frewsxcv mentioned this pull request Apr 20, 2017

Rollup of 4 pull requests #41428 (closed)

eddyb reviewed Apr 20, 2017

Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13

cache attributes of items from foreign crates

22328a5

this avoids parsing item attributes on each call to `item_attrs`, which takes off 33% (!) of translation time and 50% (!) of trans-item collection time.
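One plausible shape for such a cache, as a sketch only: `CrateMetadata`, `DefId`, `Attribute`, and the `;`/`=` text encoding below are hypothetical simplifications, not rustc's metadata format. The first `item_attrs` call for an item decodes the attributes and stores them as an `Rc<[Attribute]>`; later calls return a clone of that `Rc`, which is a reference-count bump rather than a re-parse.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId(u32);

#[derive(Clone, Debug, PartialEq)]
struct Attribute {
    name: String,
    value: Option<String>,
}

struct CrateMetadata {
    /// Hypothetical serialized form, e.g. "inline;lang=panic_fmt".
    raw: HashMap<DefId, String>,
    attr_cache: RefCell<HashMap<DefId, Rc<[Attribute]>>>,
    /// How many times attributes were actually decoded.
    decodes: Cell<usize>,
}

impl CrateMetadata {
    /// The expensive part: parse the serialized attributes of one item.
    fn decode_attrs(&self, id: DefId) -> Vec<Attribute> {
        self.decodes.set(self.decodes.get() + 1);
        let raw = self.raw.get(&id).map(String::as_str).unwrap_or("");
        raw.split(';')
            .filter(|s| !s.is_empty())
            .map(|s| {
                let mut parts = s.splitn(2, '=');
                Attribute {
                    name: parts.next().unwrap().to_string(),
                    value: parts.next().map(str::to_string),
                }
            })
            .collect()
    }

    /// Decode once per item, then serve every later call from the cache.
    fn item_attrs(&self, id: DefId) -> Rc<[Attribute]> {
        if let Some(attrs) = self.attr_cache.borrow().get(&id) {
            return attrs.clone(); // refcount bump, no re-parsing
        }
        let attrs: Rc<[Attribute]> = Rc::from(self.decode_attrs(id));
        self.attr_cache.borrow_mut().insert(id, attrs.clone());
        attrs
    }
}
```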

weak_lang_items: check for lang attribute before calling value_str

7aaf841

improves trans performance by *another* 10%.
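The reordering in this commit can be illustrated with two versions of an extraction loop. Everything here is a hypothetical simplification (the `Attribute` type, and `value_str` taking a call counter), assuming only that the name check is cheap and `value_str` is comparatively expensive. The slow version extracts the value of every attribute before looking at its name; the fast one filters on the name first and returns the same answer.

```rust
use std::cell::Cell;

/// Hypothetical, simplified attribute: `#[name = "value"]` or `#[name]`.
struct Attribute {
    name: &'static str,
    value: Option<&'static str>,
}

impl Attribute {
    /// Cheap: a single string comparison.
    fn check_name(&self, name: &str) -> bool {
        self.name == name
    }

    /// Stand-in for the costlier `value_str`; `calls` counts invocations.
    fn value_str(&self, calls: &Cell<usize>) -> Option<String> {
        calls.set(calls.get() + 1);
        self.value.map(|v| v.to_string())
    }
}

/// Before: extract the value of every attribute, then check the name.
fn extract_slow(attrs: &[Attribute], calls: &Cell<usize>) -> Option<String> {
    for attr in attrs {
        match attr.value_str(calls) {
            Some(value) if attr.check_name("lang") => return Some(value),
            _ => {}
        }
    }
    None
}

/// After: only pay for `value_str` on attributes named `lang`.
fn extract_fast(attrs: &[Attribute], calls: &Cell<usize>) -> Option<String> {
    attrs
        .iter()
        .filter(|attr| attr.check_name("lang"))
        .find_map(|attr| attr.value_str(calls))
}
```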

Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13

allocate less strings in symbol_names

32ca8c5

this improves trans performance by *another* 10%.
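As an illustration of the general technique (not the actual `symbol_names` code), the sketch below builds a simplified Itanium-style mangled name (`_ZN`, then each path component as `<len><ident>`, then `E`) in two ways. `mangle_allocating` creates one temporary `String` per component before concatenating; `mangle_single_buffer` writes every component into one pre-sized buffer. Both function names are hypothetical.

```rust
use std::fmt::Write;

/// Before: one temporary `String` per path component, plus the final result.
fn mangle_allocating(path: &[&str]) -> String {
    let parts: Vec<String> = path.iter().map(|c| format!("{}{}", c.len(), c)).collect();
    format!("_ZN{}E", parts.concat())
}

/// After: a single buffer that every component is written into.
fn mangle_single_buffer(path: &[&str]) -> String {
    // Rough size estimate: prefix/suffix plus each component and its length digits.
    let capacity = 4 + path.iter().map(|c| c.len() + 3).sum::<usize>();
    let mut out = String::with_capacity(capacity);
    out.push_str("_ZN");
    for c in path {
        // Writing into a `String` cannot fail, so `unwrap` never panics here.
        write!(out, "{}{}", c.len(), c).unwrap();
    }
    out.push('E');
    out
}
```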

fix specialization caching

dae49f1

this is another one of these things that looks *much* worse on valgrind.

arielb1 force-pushed the rustc-spring-cleaning branch from f964da5 to dae49f1 April 20, 2017 23:14

TimNN mentioned this pull request Apr 21, 2017

Rollup of 8 pull requests #41440 (closed)

shepmaster added the S-waiting-on-bors label Apr 21, 2017

arielb1 force-pushed the rustc-spring-cleaning branch from 9cab5fd to 461bee5 April 22, 2017 12:10

arielb1 force-pushed the rustc-spring-cleaning branch from 461bee5 to 1b207ca April 22, 2017 12:15

arielb1 closed this Apr 22, 2017

arielb1 mentioned this pull request Apr 22, 2017

Performance audit, Spring 2017 #41469 (merged)

arielb1 deleted the rustc-spring-cleaning branch April 22, 2017 18:06
