Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid cloning the typechecker in elaborate_statement #543

Merged
merged 2 commits into from
Aug 19, 2024

Conversation

GuerricChupin
Copy link
Contributor

Running numbat with cargo flamegraph, I found that a significant portion of the startup time was spent in the DefineFunction branch of elaborate_statement in the typechecker just cloning TypeChecker (line 1313 of numbat/src/typechecker/mod.rs). I think this is only due to the need to save and restore the environment, value_namespace and type_namespace members of TypeChecker after we have finished checking the function definition to remove any local names before continuing.

To avoid this, I experimented with replacing the HashMaps in Environment and NameSpace with a stack of HashMaps (a MapStack for lack of a better name) that can be used to save the current state of the map (when we start checking a definition) and restore it (when we are done) but otherwise behaves like a hash map. This seems to give good results to reduce startup time with little impact on the typechecker's code itself. Running numbat on the files in the example/ directory with hyperfine, I observe a ~20-25% reduction in time. I've attached the results as Markdown files for master and this branch as well as the output of cargo flamegraph for master.

bench-master.md
bench-avoid-cloning-the-typechecker.md

flamegraph.svg

#[derive(Debug, Clone)]
pub(crate) struct MapStack<K, V> {
top: HashMap<K, V>,
stack: Vec<HashMap<K, V>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably only use two layers at the moment, but it's great to have this more general stack, since we're going to need it once we add nested functions, I guess.

/// which one can then restore with `restore`.
#[derive(Debug, Clone)]
pub(crate) struct MapStack<K, V> {
top: HashMap<K, V>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why you chose to keep this as a separate element in the struct instead of keeping it in the Vec? Doing so might simplify the implementation of some functions below, but probably has other disadvantages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it was a bit more convenient to have it this way, because it saved some unwrap/except in a bunch of places. However now that the data structure is in its own file, it's not that different. I've changed the implementation to just use a stack with the function restore in charge of ensuring its not empty and insert panicking if it can't find anything at the top of the stack (which should be impossible because restore will panic before).

Also that allowed me to see a bug in my implementation: the stack vector should be traversed in reverse order, I've fixed iter_dict and iter_dict_mut.

}

pub(crate) fn insert(&mut self, key: K, value: V) -> Option<V> {
self.top.insert(key, value)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this yield proper semantics? HashMap::insert returns Some(…) if the map previously contained the value that is being inserted. But you are not checking the other maps on the stack here.

It might not be problematic for our current usage of the map (maybe we never check the return value of insert?). But then we should maybe change the return type to ()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to (), it makes more sense indeed

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent analysis, great idea & execution and very nice results!

I can reproduce your results using cargo bench (almost 30% speedup). We have a "micro" benchmark to measure the time it takes to load the "prelude", which is basically equivalent to the startup time of Numbat:

master:

Import prelude          time:   [71.882 ms 72.311 ms 72.914 ms]

this branch:

Import prelude          time:   [51.336 ms 51.584 ms 51.917 ms]
                        change: [-29.391% -28.663% -28.060%] (p = 0.00 < 0.05)
                        Performance has improved.

I only have two minor questions.

This datastructure is a stack of hash maps. It behaves
like a normal hash map with regards to get and insert,
except that one can save and restore the state of the
map.
@GuerricChupin GuerricChupin force-pushed the avoid-cloning-the-typechecker branch from 3d06430 to 6068c6e Compare August 19, 2024 18:37
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@sharkdp sharkdp merged commit 47d90c1 into sharkdp:master Aug 19, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants