Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More Strings replaced with &strs #579

Merged
merged 17 commits into from
Sep 26, 2024
Merged

Conversation

rben01
Copy link
Contributor

@rben01 rben01 commented Sep 26, 2024

A decent heuristic is that anything only used during parsing and not stored long term in the Environment can borrow from the input.

Unfortunately due to borrow checker limitations, this required moving `input` fields out of both `Parser` and `Tokenizer`, as with the immutable borrow in place, there is no way to tell Rust that a mutable borrow won't touch the input
So, the input is now an (immutable ref) argument to all the methods that used to refer to `&self.input`
And there were a lot...
But now `Token` doesn't carry an owned `String`
Also shrunk `Tokenizer` by doing all math in terms of byte offsets into its input (using existing `SourceCodePosition` fields) instead of storing a separate `Vec<char>` with char indices
… used on a newline and would've had no effect, it is semantically correct to keep the original `+= 1`)
…self.context.lock()` as the lock must be dropped to allow `self` to be borrowed in the interm)
…ut where possible

Note that this generally means the input must actually be longer lived than the evaluated stuff (not a problem except in testing, evidently)
Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this! I'm currently away from home but I plan to do some benchmarks soon to see if your changes result in performance improvements. They definitely result in memory improvements and save a ton of allocations which certainly also improves performance, especially on platforms with slow allocators (wasm?).

Comment on lines 287 to 291
let statements = statements.into_iter();
let mut ans = Vec::with_capacity(statements.size_hint().1.unwrap_or(0));
for s in statements {
ans.push(self.transform_statement(s)?);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a performance improvement or was it somehow necessary to rewrite this iterator expression into a loop for another reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a vestige of me trying to fix a borrow checker error. Reverted.

@@ -144,7 +144,7 @@ pub enum TokenKind {
Eof,
}

#[derive(Debug, Clone, PartialEq, Eq)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Even with your recent improvements, Token is still > 24 byte large(?). Maybe it's better to keep clones of this explicit, by not deriving Copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a vestige of me trying to fix a borrow checker error. Reverted.

Copy link
Owner

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates!

@sharkdp sharkdp merged commit 1d4324f into sharkdp:master Sep 26, 2024
15 checks passed
@rben01
Copy link
Contributor Author

rben01 commented Oct 2, 2024

I plan to do some benchmarks soon to see if your changes result in performance improvements

According to hyperfine, master d9e0e2c3e4ab6347ed34b3ab9bce4468b75abe75 is about 1.43 faster at evaluating the prelude than tag 1.13.0 (installed with homebrew). I do not know the various contributions of all the commits that have been made since then but I imagine reducing allocations is a big contributor.

Benchmark 1: numbat -e ""
  Time (mean ± σ):      58.4 ms ±   0.2 ms    [User: 47.5 ms, System: 10.7 ms]
  Range (min … max):    58.0 ms …  58.9 ms    49 runs
 
Benchmark 2: ./target/release/numbat -e ""
  Time (mean ± σ):      40.8 ms ±   0.2 ms    [User: 30.3 ms, System: 10.2 ms]
  Range (min … max):    40.4 ms …  41.3 ms    70 runs
 
Summary
  ./target/release/numbat -e "" ran
    1.43 ± 0.01 times faster than numbat -e ""

@sharkdp
Copy link
Owner

sharkdp commented Oct 2, 2024

According to hyperfine, master d9e0e2c3e4ab6347ed34b3ab9bce4468b75abe75 is about 1.43 faster at evaluating the prelude than tag 1.13.0 (installed with homebrew).

Yes. This is mostly due to the performance improvements documented in #525.

I do not know the various contributions of all the commits that have been made since then but I imagine reducing allocations is a big contributor.

I don't think so. I did a few experiments, and it seems like your allocation-reducing PRs had no measurable runtime effect on x86. This does not mean that they were useless. As I said before, I could imagine that they help on other platforms (WASM). But I'm not sure how to measure performance in the browser. And your changes definitely help with memory consumption, which is an interesting metric in itself. Especially on non-desktop devices. Or when Numbat is used as a library in other applications.

@sharkdp
Copy link
Owner

sharkdp commented Oct 2, 2024

Ok, I'll take this back. There is a small performance improvement. I just checked out 29a99e9 from Aug 29, before any of your changes:

commit 29a99e9e235e3371777fc6eaa0a69c8f9786443b
Author: David Peter <mail@david-peter.de>
Date:   Thu Aug 29 20:38:10 2024 +0200

    Add regression test for #534

and compared it with master (@ d9e0e2c).

The results are here:

▶ hyperfine --warmup=10 --runs=300 './numbat-master -e ""' './numbat-29a99e9 -e ""' --export-json results.json

Benchmark 1: ./numbat-master -e ""
  Time (mean ± σ):      65.0 ms ±   1.2 ms    [User: 55.5 ms, System: 8.8 ms]
  Range (min … max):    62.4 ms …  75.6 ms    300 runs
 
Benchmark 2: ./numbat-29a99e9 -e ""
  Time (mean ± σ):      68.1 ms ±   3.7 ms    [User: 58.5 ms, System: 9.0 ms]
  Range (min … max):    65.1 ms … 111.7 ms    300 runs
 
Summary
  ./numbat-master -e "" ran
    1.05 ± 0.06 times faster than ./numbat-29a99e9 -e ""

image

There seems to be a statistically-significant 5% performance improvement 👍

I ran both versions with the prelude from 29a99e9 for a fair comparison.

@sharkdp
Copy link
Owner

sharkdp commented Oct 2, 2024

I ran both versions with the prelude from 29a99e9 for a fair comparison.

I think this is the key thing that I did wrong previously. If I measure numbat-master against the prelude on master (instead of the prelude @ 29a99e9), it takes 69 ms, i.e. it seemed to be slightly slower than the old Numbat version. But that comparison is not fair, as the prelude on master is different, and slightly larger:

▶ git checkout 29a99e9
▶ fd -e nbt . numbat/modules/ -X wc -l
…
2508
▶ git checkout master
▶ fd -e nbt . numbat/modules/ -X wc -l
…
2688

Some of this increase is from a new extra::color module with 44 lines which is not included by default. So the increase in lines is (2688 - 44) / 2508 = 1.054, i.e. about 5%. That seems to explain it perfectly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants