Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocating before we have checked if it is in the cache + don't rehash subtree #59

Closed
wants to merge 9 commits into from

Conversation

simonvandel
Copy link

@simonvandel simonvandel commented May 17, 2020

We now check if the node/token is in the cache. If it is not in the cache, then we allocate.
In addition, avoid hashing the complete sub-tree.

This speeds up analysis-stats on rust-analyser by ~5% on my local machine.

Benchmark
➜ rust-analyzer-base git:(master) hyperfine --warmup 1 --min-runs=3 '/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .'
Benchmark #1: /home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .
Time (mean ± σ): 46.027 s ± 0.419 s [User: 45.120 s, System: 0.785 s]
Range (min … max): 45.551 s … 46.343 s 3 runs

Benchmark #2: /home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .
Time (mean ± σ): 48.445 s ± 0.457 s [User: 47.657 s, System: 0.715 s]
Range (min … max): 47.978 s … 48.892 s 3 runs

Summary
'/home/simon/Documents/rust-analyzer/target/release/rust-analyzer analysis-stats .' ran
1.05 ± 0.01 times faster than '/home/simon/Documents/rust-analyzer-base/target/release/rust-analyzer analysis-stats .'

src/green/token.rs Outdated Show resolved Hide resolved
@simonvandel simonvandel mentioned this pull request May 17, 2020
Copy link
Collaborator

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever, especially since the builder API already has the limit of 3 children for what it deduplicates. I should probably port this optimization over to sorbus as well.

src/green/builder.rs Outdated Show resolved Hide resolved
src/green/builder.rs Outdated Show resolved Hide resolved
@CAD97
Copy link
Collaborator

CAD97 commented May 18, 2020

@simonvandel tangentially related: would you mind testing #58 to see what impact it has? I'm working on an experimental(ish) rewrite of rowan's green trees and don't have the experience to actually measure the impact properly here on Windows.

@simonvandel
Copy link
Author

@CAD97 for benchmarking I simply ran hyperfine with a release build with and without the changes applied. It’s probably easier if you just run that locally yourself.

@simonvandel simonvandel changed the title Avoid allocating before we have checked if it is in the cache Avoid allocating before we have checked if it is in the cache + don't rehash subtree May 19, 2020
@simonvandel
Copy link
Author

simonvandel commented May 19, 2020

The newest revision avoids the api change, simplifies the hashing and does not hash the subtree. ~8% improvement locally.
@CAD97, @matklad this is ready for review

Copy link
Collaborator

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to fix the cache to deal with hash collisions properly, but otherwise this looks like a clear positive on all other factors.

src/green/builder.rs Show resolved Hide resolved
@simonvandel
Copy link
Author

Hi @CAD97
The latest commit should resolve the hash collision handling. Would you like to take a look again?
With the latest changes, the performance dropped a bit, but is still 5% faster than without this PR. I'll update the description with the new results.

I propose landing this in the current state. As a follow-up PR, I would like to experiment with eliminating the need for the SmallVec collection. Essentially I would like to compute the hash of kind + children pointers by just iterating through the input children iterator. That probably requires using the raw_entry api to avoid the hashmap key needing to be owned. Is it possible to iterate the children iterator twice? One time for building the hash, and then potentially a second time in case the node did not exist in the cache, and therefore needs to be allocated.

@CAD97
Copy link
Collaborator

CAD97 commented May 25, 2020

As I mentioned on zulip, I'm concerned now that this as currently implemented will noticeably lower the number of cache hits, as two nodes are only considered deduplicatable by the current test if all children were deduplicated (and thus are the same identity).

Obviously, skipping the heap allocation in the common case is beneficial, as this patch is still positive. But before we do anything other than that one small step, I really would like to see some measurement on heuristics for what nodes are cachable as discussed on zulip.

@CAD97
Copy link
Collaborator

CAD97 commented May 25, 2020

Is it possible to iterate the children iterator twice?

Not while we take [Into]Iterator.

(Apologies: I'm going to use sorbus's names for things below because I'm looking at sorbus right now and don't remember exactly how sorbus's builder names map back into rowan's.)

The tricky part is that a key part of how TreeBuilder functions is by calling Builder::node with an argument of type Drain. The only other argument type I see being used for this API is Vec (and maybe eventually [_; N] once it has a by-value into_iter).

The thing is, all of these types offer a fn as_slice(&self) -> &[T], which would be very useful for the "iter once by ref then again by val" that we would really like to do here to avoid a temporary collection... but no generic way to talk about that functionality.

Off I go to try to make a crate to define an iterator trait for "iterables that can be iterated non-destructively by-ref and then destructively by-val". It won't be perfect because I think the optimal definition wants GAT, but I think I might be able to make something work.


EDIT: wasn't able to figure anything out, and vec::Drain::as_slice is unstable anyway.

@CAD97
Copy link
Collaborator

CAD97 commented May 25, 2020

@simonvandel
Copy link
Author

Hmm, I don't think we can land this as an improvement before we can reuse the allocation of the children iterator passed into NodeCache::node. I tried removing the check for <= 3 children, but that regresses performance. Presumably because of the need to allocate the children all the time into a vector.

@simonvandel simonvandel closed this Jun 5, 2020
bors bot added a commit to CAD97/sorbus that referenced this pull request Jun 23, 2020
45: Remove allocation for builder cache hits r=CAD97 a=CAD97

Credit to @simonvandel [over at the main rowan repository](rust-analyzer/rowan#59) for pointing out how this is possible. Previously: #40 

Now that the necessary APIs for this are riding the trains to stable, here's an actual decent implementation of alloc-free cache hits!

Also adds a bit more explicit typing on pointers, as that led to a bug in the construction of this patch.

Co-authored-by: CAD97 <cad97@cad97.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants