-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More ObligationForest
improvements
#64545
Conversation
`Node` has an optional parent and a list of other descendents. Most of the time the parent is treated the same as the other descendents -- error-handling is the exception -- and chaining them together for iteration has a non-trivial cost. This commit changes the representation. There is now a single list of descendants, and a boolean flag that indicates if there is a parent (in which case it is first descendent). This representation encodes the same information, in a way that is less idiomatic but cheaper to iterate over for the common case where the parent doesn't need special treatment. As a result, some benchmark workloads are up to 2% faster.
The size of the indices doesn't matter much here, and having a `newtype_index!` index type without also using `IndexVec` requires lots of conversions. So this commit removes `NodeIndex` in favour of uniform use of `usize` as the index type. As well as making the code slightly more concise, it also slightly speeds things up.
Now that all indices have type `usize`, it makes sense to be more consistent about their naming. This commit removes all uses of `i` in favour of `index`.
Some local measurements:
Let's see how a CI perf run compares. @bors try |
More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis
☀️ Try build successful - checks-azure |
@rust-timer build 2ae267a |
Queued 2ae267a with parent 5670d04, future comparison URL. |
Finished benchmarking try commit 2ae267a, comparison URL. |
Perf results on CI are similar to what I saw locally: big wins on I think the lesson from this PR (especially commits 1, 4 and 5) is "don't rely on the compiler to fully optimize higher-level constructs in the hottest code". |
PR rust-lang#64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in rust-lang#64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.
Simplify some `Iterator` methods. PR #64545 got a big speed-up by replacing a hot call to `all()` with explicit iteration. This is because the implementation of `all()` is excessively complex: it wraps the given predicate in a closure that returns a `LoopState`, passes that closure to `try_for_each()`, which wraps the first closure in a second closure, passes that second closure to `try_fold()`, which does the actual iteration using the second closure. A sufficient smart compiler could optimize all this away; rustc is currently not sufficiently smart. This commit does the following. - Changes the implementations of `all()`, `any()`, `find()` and `find_map()` to use the simplest possible code, rather than using `try_for_each()`. (I am reminded of "The Evolution of a Haskell Programmer".) These are both shorter and faster than the current implementations, and will permit the undoing of the `all()` removal in #64545. - Changes `ResultShunt::next()` so it doesn't call `self.find()`, because that was causing infinite recursion with the new implementation of `find()`, which itself calls `self.next()`. (I honestly don't know how the old implementation of `ResultShunt::next()` didn't cause an infinite loop, given that it also called `self.next()`, albeit via `try_for_each()` and `try_fold()`.) - Changes `nth()` to use `self.next()` in a while loop rather than `for x in self`, because using self-iteration within an iterator method seems dubious, and `self.next()` is used in all the other iterator methods.
r=me with comment, thanks @nnethercote! |
Amazingly enough, this is a 3.5% instruction count win on `keccak`.
The super-hot call site of `inlined_shallow_resolve()` basically does `r.inlined_shallow_resolve(ty) != ty`. This commit introduces a version of that function specialized for that particular call pattern, `shallow_resolve_changed()`. Incredibly, this reduces the instruction count for `keccak` by 5%. The commit also renames `inlined_shallow_resolve()` as `shallow_resolve()` and removes the `inline(always)` annotation, as it's no longer nearly so hot.
b261edf
to
3b85597
Compare
@bors r=nmatsakis |
📌 Commit 3b85597 has been approved by |
Because this has a perf impact, let's do: |
Since we've already run perf there's probably no reason for that? command is rollup=never btw |
My preference is for perf-affecting PRs to land on their own. If you look at perf.rust-lang.org and see an improvement or regression, and find it's a roll-up, it can be a real pain to work out which PR caused the change. @bors rollup=never |
More `ObligationForest` improvements Following on from #64500, these commits alsomake the code both nicer and faster. r? @nikomatsakis
☀️ Test successful - checks-azure |
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like rust-lang#64545).
Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)
Even more `ObligationForest` improvements Following on from #64545, more speed and readability improvements. r? @nikomatsakis
Remove manual unrolling from slice::Iter(Mut)::try_fold While this definitely helps sometimes (particularly for trivial closures), it's also a pessimization sometimes, so it's better to leave this to (hypothetical) future LLVM improvements instead of forcing this on everyone. I think it's better for the advice to be that sometimes you need to unroll manually than you sometimes need to not-unroll manually (like #64545). --- For context see #64572 (comment)
Following on from #64500, these commits alsomake the code both nicer and faster.
r? @nikomatsakis