Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak LEB128 reading some more. #69157

Closed

Conversation

nnethercote
Copy link
Contributor

PR #69050 changed LEB128 reading and writing. After it landed I did some
double-checking and found that the writing changes were universally a
speed-up, but the reading changes were not. I'm not exactly sure why,
perhaps there was a quirk of inlining in the particular revision I was
originally working from.

This commit reverts some of the reading changes, while still avoiding
unsafe code. I have checked it on multiple revisions and the speed-ups
seem to be robust.

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2020
@nnethercote
Copy link
Contributor Author

Local check results:

clap-rs-check
        avg: -0.9%      min: -1.9%      max: 0.0%
packed-simd-check
        avg: -0.3%      min: -0.8%      max: 0.0%
issue-46449-check
        avg: 0.4%       min: 0.3%       max: 0.6%
tuple-stress-check
        avg: -0.1%      min: -0.5%      max: 0.0%
wg-grammar-check
        avg: -0.2%      min: -0.5%      max: 0.0%
helloworld-check
        avg: 0.2%       min: -0.2%      max: 0.5%
keccak-check
        avg: -0.1%      min: -0.4%      max: 0.0%
webrender-check
        avg: -0.2%      min: -0.4%      max: 0.0%
regex-check
        avg: -0.2%      min: -0.4%      max: 0.1%
ripgrep-check
        avg: -0.1%      min: -0.4%      max: 0.1%
piston-image-check
        avg: -0.2%      min: -0.4%      max: 0.1%
unify-linearly-check
        avg: 0.1%       min: -0.3%      max: 0.4%
serde-check
        avg: -0.1%      min: -0.4%      max: 0.0%
cranelift-codegen-check
        avg: -0.1%      min: -0.4%      max: 0.0%
script-servo-check
        avg: -0.2%      min: -0.4%      max: 0.0%
style-servo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
wf-projection-stress-65510-che...
        avg: -0.1%      min: -0.3%      max: 0.0%
coercions-check
        avg: 0.1%?      min: 0.0%?      max: 0.3%?
cargo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
trait-stress-check
        avg: 0.1%       min: -0.0%      max: 0.3%
unused-warnings-check
        avg: -0.1%      min: -0.3%      max: 0.0%
deeply-nested-check
        avg: 0.1%       min: -0.3%      max: 0.3%
futures-check
        avg: -0.1%      min: -0.3%      max: 0.1%
tokio-webpush-simple-check
        avg: 0.1%       min: -0.3%      max: 0.3%
syn-check
        avg: -0.1%      min: -0.3%      max: 0.1%
hyper-2-check
        avg: -0.1%      min: -0.3%      max: 0.1%
webrender-wrench-check
        avg: -0.1%      min: -0.3%      max: 0.2%
await-call-tree-check
        avg: 0.1%       min: -0.3%      max: 0.2%
serde-serde_derive-check
        avg: -0.1%      min: -0.2%      max: 0.0%
encoding-check
        avg: -0.1%      min: -0.2%      max: 0.1%
unicode_normalization-check
        avg: -0.1%      min: -0.2%      max: 0.0%
html5ever-check
        avg: -0.1%      min: -0.2%      max: 0.1%
ucd-check
        avg: -0.1%      min: -0.2%      max: 0.0%
inflate-check
        avg: -0.0%      min: -0.2%      max: 0.0%
regression-31157-check
        avg: 0.0%       min: -0.1%      max: 0.2%
deep-vector-check
        avg: -0.0%      min: -0.1%      max: 0.0%
ctfe-stress-4-check
        avg: -0.0%?     min: -0.1%?     max: 0.1%?
token-stream-stress-check
        avg: -0.0%      min: -0.1%      max: 0.0%

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Feb 14, 2020

⌛ Trying commit 6cc131cd42435647d537bf73c1b1b5e16d54f14e with merge 3c3657919929d11ff9535e70167778577db99f0a...

@bors
Copy link
Contributor

bors commented Feb 14, 2020

☀️ Try build successful - checks-azure
Build commit: 3c3657919929d11ff9535e70167778577db99f0a (3c3657919929d11ff9535e70167778577db99f0a)

@rust-timer
Copy link
Collaborator

Queued 3c3657919929d11ff9535e70167778577db99f0a with parent 21ed505, future comparison URL.

PR rust-lang#69050 changed LEB128 reading and writing. After it landed I did some
double-checking and found that the writing changes were universally a
speed-up, but the reading changes were not. I'm not exactly sure why,
perhaps there was a quirk of inlining in the particular revision I was
originally working from.

This commit reverts some of the reading changes, while still avoiding
`unsafe` code. I have checked it on multiple revisions and the speed-ups
seem to be robust.
@nnethercote
Copy link
Contributor Author

The CI results show a clear regression, in contrast to my local results, hmm. I have rebased against a more recent revision. Let's try doing another perf CI run, just for interest's sake.

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Feb 16, 2020

⌛ Trying commit e25bd1f with merge 921540b...

bors added a commit that referenced this pull request Feb 16, 2020
Tweak LEB128 reading some more.

PR #69050 changed LEB128 reading and writing. After it landed I did some
double-checking and found that the writing changes were universally a
speed-up, but the reading changes were not. I'm not exactly sure why,
perhaps there was a quirk of inlining in the particular revision I was
originally working from.

This commit reverts some of the reading changes, while still avoiding
`unsafe` code. I have checked it on multiple revisions and the speed-ups
seem to be robust.

r? @michaelwoerister
@bors
Copy link
Contributor

bors commented Feb 17, 2020

☀️ Try build successful - checks-azure
Build commit: 921540b (921540bfb9cb9af5713af43fc21417559eb5d218)

@rust-timer
Copy link
Collaborator

Queued 921540b with parent 5e7af46, future comparison URL.

@nnethercote
Copy link
Contributor Author

So, we have two quite different sets of results when the same change is measured on top of different revisions. I'm going to abandon this PR for the following reasons.

  • The instruction counts regression from the first run is a bit worse than the instruction count improvement from the second run.
  • Both runs look like regressions if you look at the cycle counts.
  • This PR makes the code uglier.

@nnethercote nnethercote deleted the tweak-LEB128-reading branch February 18, 2020 21:26
d-e-s-o added a commit to d-e-s-o/blazesym that referenced this pull request Jun 5, 2024
As it turns out, the Rust compiler uses variable length LEB128 encoded
integers internally. It so happens that they spent a fair amount of
effort micro-optimizing the decoding functionality [0] [1], as it's in
the hot path.
With this change we replace our decoding routines with these optimized
ones. To make that happen more easily (and to gain some base line speed
up), also remove the "shift" return from the respective methods. As a
result of these changes, we see a respective speed up:

Before:
  test util::tests::bench_u64_leb128_reading  ... bench:  128 ns/iter (+/- 10)

After:
  test util::tests::bench_u64_leb128_reading  ... bench:  103 ns/iter (+/- 5)

Gsym decoding, which uses these routines, improved as follows:
  main/symbolize_gsym_multi_no_setup
    time:   [146.26 µs 146.69 µs 147.18 µs]
    change: [−7.2075% −5.7106% −4.4870%] (p = 0.00 < 0.02)
    Performance has improved.

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>
d-e-s-o added a commit to d-e-s-o/blazesym that referenced this pull request Jun 5, 2024
As it turns out, the Rust compiler uses variable length LEB128 encoded
integers internally. It so happens that they spent a fair amount of
effort micro-optimizing the decoding functionality [0] [1], as it's in
the hot path.
With this change we replace our decoding routines with these optimized
ones. To make that happen more easily (and to gain some base line speed
up), also remove the "shift" return from the respective methods. As a
result of these changes, we see a respectable speed up:

Before:
  test util::tests::bench_u64_leb128_reading  ... bench:  128 ns/iter (+/- 10)

After:
  test util::tests::bench_u64_leb128_reading  ... bench:  103 ns/iter (+/- 5)

Gsym decoding, which uses these routines, improved as follows:
  main/symbolize_gsym_multi_no_setup
    time:   [146.26 µs 146.69 µs 147.18 µs]
    change: [−7.2075% −5.7106% −4.4870%] (p = 0.00 < 0.02)
    Performance has improved.

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>
d-e-s-o added a commit to libbpf/blazesym that referenced this pull request Jun 5, 2024
As it turns out, the Rust compiler uses variable length LEB128 encoded
integers internally. It so happens that they spent a fair amount of
effort micro-optimizing the decoding functionality [0] [1], as it's in
the hot path.
With this change we replace our decoding routines with these optimized
ones. To make that happen more easily (and to gain some base line speed
up), also remove the "shift" return from the respective methods. As a
result of these changes, we see a respectable speed up:

Before:
  test util::tests::bench_u64_leb128_reading  ... bench:  128 ns/iter (+/- 10)

After:
  test util::tests::bench_u64_leb128_reading  ... bench:  103 ns/iter (+/- 5)

Gsym decoding, which uses these routines, improved as follows:
  main/symbolize_gsym_multi_no_setup
    time:   [146.26 µs 146.69 µs 147.18 µs]
    change: [−7.2075% −5.7106% −4.4870%] (p = 0.00 < 0.02)
    Performance has improved.

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants