Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance audit, Spring 2017 #41410

Closed
wants to merge 8 commits into from
Closed

Performance audit, Spring 2017 #41410

wants to merge 8 commits into from

Conversation

arielb1
Copy link
Contributor

@arielb1 arielb1 commented Apr 20, 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.

@arielb1 arielb1 changed the title Performance audit, Spring 2017 [WIP] Performance audit, Spring 2017 Apr 20, 2017
@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@KalitaAlexey
Copy link
Contributor

@arielb1,
Are you going to collect data to show improvement?

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

@KalitaAlexey

I'm doing measurements locally, but after this is done it'll show up on perf.rust-lang.org (or the mirror https://perf-rlo.herokuapp.com).

Ariel Ben-Yehuda added 2 commits April 20, 2017 15:14
This improves LLVM performance by 10% lost during the shimmir transition.
this improves typeck & trans performance by 1%. This looked hotter on
callgrind than it is on a CPU.
@michaelwoerister
Copy link
Member

this avoids parsing item attributes on each call to item_attrs, which takes
off 33% (!) of translation time and 50% (!) of trans-item collection time.

WOW! :)

(self.tcx().mk_region(ty::ReStatic),
self.tcx().mk_region(ty::ReStatic))
(self.tcx().types.re_static,
self.tcx().types.re_static)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that we don't have these constants pre-interned. This is going to conflict like hell with one of my in-progress branches, but oh well. =)

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me on the stuff so far

@@ -573,7 +589,7 @@ pub fn shift_regions<'a, 'gcx, 'tcx, T>(tcx: TyCtxt<'a, 'gcx, 'tcx>,
value, amount);

value.fold_with(&mut RegionFolder::new(tcx, &mut false, &mut |region, _current_depth| {
tcx.mk_region(shift_region(*region, amount))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. By-value shift_region may have been used by elision or something. Can we just kill it?

@@ -1942,6 +1940,6 @@ impl<'a, 'tcx> TyLayout<'tcx> {
}

pub fn field<C: LayoutTyper<'tcx>>(&self, cx: C, i: usize) -> C::TyLayout {
cx.layout_of(self.field_type(cx, i))
cx.layout_of(cx.normalize_associated_type(self.field_type(cx, i)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to do it here, when layout_of does it at the start?

Copy link
Contributor Author

@arielb1 arielb1 Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it doesn't. types in trans are always normalized.

if let Some(&layout) = self.tcx().layout_cache.borrow().get(&ty) {
return TyLayout { ty: ty, layout: layout, variant_index: None };
}

self.tcx().infer_ctxt((), traits::Reveal::All).enter(|infcx| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is creating the infer_ctxt costly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

@eddyb eddyb Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you (trans-)normalize before checking the cache above? Would that solve the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to get rid of the normalize_associated_type method - can't field_of rely on layout_of normalizing before checking the cache? if !ty.has_projection_types() { is really fast, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except when there are projection types (nested binders) etc. This method is hot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is has_projection_types anything other than a flag check? I'm not sure I understand what's going on. Can the cache be hit with the unnormalized type if has_projection_types returns true?

ref item => bug!("trait_impl_polarity: {:?} not an impl", item)
}
} else {
self.sess.cstore.impl_polarity(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the CrateStore method? They keep piling up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this commit does what it's supposed to.

@@ -438,6 +437,38 @@ impl Rc<str> {
}
}

impl<T> Rc<[T]> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, for the purposes of the compiler, does Rc<[T]> provide a measurable improvement over Rc<Vec<T>>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also be more easily implementable by consuming Vec<T> as you've got to copy data anyway. With Vec<T> the box_free also doesn't need to be exposed as you can just .set_len(0) to drop all the elements.

Copy link
Contributor Author

@arielb1 arielb1 Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't bother checking. But Rc<Vec<T>> is too ugly to me.

@arielb1 arielb1 changed the title [WIP] Performance audit, Spring 2017 Performance audit, Spring 2017 Apr 20, 2017
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

I think this is enough for one PR.

That method is *incredibly* hot, so this ends up saving 10% of trans
time.

BTW, we really should be doing dependency tracking there - and possibly be
taking the respective perf hit (got to find a way to make DTMs fast), but
`layout_cache` is a non-dep-tracking map.
@eddyb
Copy link
Member

eddyb commented Apr 20, 2017

@bors r=nikomatsakis,eddyb

@bors
Copy link
Contributor

bors commented Apr 20, 2017

📌 Commit f964da5 has been approved by nikomatsakis,eddyb

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 20, 2017
…nikomatsakis,eddyb

Performance audit, Spring 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.
@eddyb
Copy link
Member

eddyb commented Apr 20, 2017

@bors r- Build failed.

#[inline]
unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
pub(crate) unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
let size = size_of_val(&*ptr);
let align = min_align_of_val(&*ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions are not imported during testing (so this fails to compile then).

Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13
this avoids parsing item attributes on each call to `item_attrs`, which takes
off 33% (!) of translation time and 50% (!) of trans-item collection time.
improves trans performance by *another* 10%.
Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13
this improves trans performance by *another* 10%.
this is another one of these things that looks *much* worse on valgrind.
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

@bors r=eddyb

@bors
Copy link
Contributor

bors commented Apr 20, 2017

📌 Commit dae49f1 has been approved by eddyb

TimNN added a commit to TimNN/rust that referenced this pull request Apr 21, 2017
…eddyb

Performance audit, Spring 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.
@shepmaster shepmaster added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 21, 2017
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 22, 2017

So now I solved the "specialization caching" problem for real.

In some cases (e.g. <[int-var] as Add<[int-var]>>), selection can turn up
a large number of candidates. Bailing out early avoids O(n^2) performance.

This improves item-type checking time by quite a bit, resulting in ~2% of total
time-to-typeck.
@eddyb
Copy link
Member

eddyb commented Apr 22, 2017

@bors r=nikomatsakis,eddyb

@bors
Copy link
Contributor

bors commented Apr 22, 2017

📌 Commit 1b207ca has been approved by nikomatsakis,eddyb

@bors
Copy link
Contributor

bors commented Apr 22, 2017

🔒 Merge conflict

@bors
Copy link
Contributor

bors commented Apr 22, 2017

☔ The latest upstream changes (presumably #41464) made this pull request unmergeable. Please resolve the merge conflicts.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 22, 2017

Moving the branch over to arielb1/rust.

@arielb1 arielb1 closed this Apr 22, 2017
@arielb1 arielb1 deleted the rustc-spring-cleaning branch April 22, 2017 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants