Tweak LEB128 reading some more. #69157

nnethercote · 2020-02-14T05:58:10Z

PR #69050 changed LEB128 reading and writing. After it landed I did some
double-checking and found that the writing changes were universally a
speed-up, but the reading changes were not. I'm not exactly sure why;
perhaps there was a quirk of inlining in the particular revision I was
originally working from.

This commit reverts some of the reading changes, while still avoiding
unsafe code. I have checked it on multiple revisions and the speed-ups
seem to be robust.
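For readers unfamiliar with the encoding, here is a minimal sketch of reading an unsigned LEB128 value in safe Rust. It is an illustration only, not the exact code from this PR or from #69050; the function name and the `(value, bytes_read)` return shape are assumptions chosen for the example. Each byte contributes its low 7 bits, and a clear high bit marks the last byte.

```rust
/// Illustrative only: decode one unsigned LEB128 value from the start of
/// `slice`, returning the value and the number of bytes consumed.
///
/// Assumes well-formed input of at most 10 bytes. Bounds-checked indexing
/// panics if the input ends in the middle of a value, so no `unsafe` is used.
pub fn read_u64_leb128(slice: &[u8]) -> (u64, usize) {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    let mut position = 0;
    loop {
        let byte = slice[position];
        position += 1;
        // The low 7 bits carry payload, least significant group first.
        result |= u64::from(byte & 0x7F) << shift;
        // A clear continuation bit (0x80) means this was the last byte.
        if byte & 0x80 == 0 {
            return (result, position);
        }
        shift += 7;
    }
}
```

For example, the three bytes `0xE5 0x8E 0x26` decode to 624485 with 3 bytes consumed.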

r? @michaelwoerister

nnethercote · 2020-02-14T05:59:21Z

Local check results:

clap-rs-check
        avg: -0.9%      min: -1.9%      max: 0.0%
packed-simd-check
        avg: -0.3%      min: -0.8%      max: 0.0%
issue-46449-check
        avg: 0.4%       min: 0.3%       max: 0.6%
tuple-stress-check
        avg: -0.1%      min: -0.5%      max: 0.0%
wg-grammar-check
        avg: -0.2%      min: -0.5%      max: 0.0%
helloworld-check
        avg: 0.2%       min: -0.2%      max: 0.5%
keccak-check
        avg: -0.1%      min: -0.4%      max: 0.0%
webrender-check
        avg: -0.2%      min: -0.4%      max: 0.0%
regex-check
        avg: -0.2%      min: -0.4%      max: 0.1%
ripgrep-check
        avg: -0.1%      min: -0.4%      max: 0.1%
piston-image-check
        avg: -0.2%      min: -0.4%      max: 0.1%
unify-linearly-check
        avg: 0.1%       min: -0.3%      max: 0.4%
serde-check
        avg: -0.1%      min: -0.4%      max: 0.0%
cranelift-codegen-check
        avg: -0.1%      min: -0.4%      max: 0.0%
script-servo-check
        avg: -0.2%      min: -0.4%      max: 0.0%
style-servo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
wf-projection-stress-65510-che...
        avg: -0.1%      min: -0.3%      max: 0.0%
coercions-check
        avg: 0.1%?      min: 0.0%?      max: 0.3%?
cargo-check
        avg: -0.1%      min: -0.3%      max: 0.0%
trait-stress-check
        avg: 0.1%       min: -0.0%      max: 0.3%
unused-warnings-check
        avg: -0.1%      min: -0.3%      max: 0.0%
deeply-nested-check
        avg: 0.1%       min: -0.3%      max: 0.3%
futures-check
        avg: -0.1%      min: -0.3%      max: 0.1%
tokio-webpush-simple-check
        avg: 0.1%       min: -0.3%      max: 0.3%
syn-check
        avg: -0.1%      min: -0.3%      max: 0.1%
hyper-2-check
        avg: -0.1%      min: -0.3%      max: 0.1%
webrender-wrench-check
        avg: -0.1%      min: -0.3%      max: 0.2%
await-call-tree-check
        avg: 0.1%       min: -0.3%      max: 0.2%
serde-serde_derive-check
        avg: -0.1%      min: -0.2%      max: 0.0%
encoding-check
        avg: -0.1%      min: -0.2%      max: 0.1%
unicode_normalization-check
        avg: -0.1%      min: -0.2%      max: 0.0%
html5ever-check
        avg: -0.1%      min: -0.2%      max: 0.1%
ucd-check
        avg: -0.1%      min: -0.2%      max: 0.0%
inflate-check
        avg: -0.0%      min: -0.2%      max: 0.0%
regression-31157-check
        avg: 0.0%       min: -0.1%      max: 0.2%
deep-vector-check
        avg: -0.0%      min: -0.1%      max: 0.0%
ctfe-stress-4-check
        avg: -0.0%?     min: -0.1%?     max: 0.1%?
token-stream-stress-check
        avg: -0.0%      min: -0.1%      max: 0.0%

nnethercote · 2020-02-14T05:59:30Z

@bors try @rust-timer queue

rust-timer · 2020-02-14T05:59:31Z

Awaiting bors try build completion

bors · 2020-02-14T05:59:46Z

⌛ Trying commit 6cc131cd42435647d537bf73c1b1b5e16d54f14e with merge 3c3657919929d11ff9535e70167778577db99f0a...

bors · 2020-02-14T08:46:17Z

☀️ Try build successful - checks-azure
Build commit: 3c3657919929d11ff9535e70167778577db99f0a (3c3657919929d11ff9535e70167778577db99f0a)

rust-timer · 2020-02-14T08:46:19Z

Queued 3c3657919929d11ff9535e70167778577db99f0a with parent 21ed505, future comparison URL.

nnethercote · 2020-02-16T23:04:48Z

The CI results show a clear regression, in contrast to my local results, hmm. I have rebased against a more recent revision. Let's try doing another perf CI run, just for interest's sake.

@bors try @rust-timer queue

rust-timer · 2020-02-16T23:04:50Z

Awaiting bors try build completion

bors · 2020-02-16T23:04:58Z

⌛ Trying commit e25bd1f with merge 921540b...

bors · 2020-02-17T01:48:27Z

☀️ Try build successful - checks-azure
Build commit: 921540b (921540bfb9cb9af5713af43fc21417559eb5d218)

rust-timer · 2020-02-17T01:48:28Z

Queued 921540b with parent 5e7af46, future comparison URL.

nnethercote · 2020-02-17T21:32:16Z

So, we have two quite different sets of results when the same change is measured on top of different revisions. I'm going to abandon this PR for the following reasons.

- The instruction count regression from the first run is a bit worse than the instruction count improvement from the second run.
- Both runs look like regressions if you look at the cycle counts.
- This PR makes the code uglier.

As it turns out, the Rust compiler uses variable length LEB128 encoded integers internally. It so happens that they spent a fair amount of effort micro-optimizing the decoding functionality [0] [1], as it's in the hot path. With this change we replace our decoding routines with these optimized ones. To make that happen more easily (and to gain some base line speed up), also remove the "shift" return from the respective methods. As a result of these changes, we see a respectable speed up:

Before:
        test util::tests::bench_u64_leb128_reading ... bench: 128 ns/iter (+/- 10)
After:
        test util::tests::bench_u64_leb128_reading ... bench: 103 ns/iter (+/- 5)

Gsym decoding, which uses these routines, improved as follows:

        main/symbolize_gsym_multi_no_setup
        time:   [146.26 µs 146.69 µs 147.18 µs]
        change: [−7.2075% −5.7106% −4.4870%] (p = 0.00 < 0.02)
        Performance has improved.

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>

As it turns out, the Rust compiler uses variable length LEB128 encoded integers internally. It so happens that they spent a fair amount of effort micro-optimizing the decoding functionality [0] [1], as it's in the hot path. With this change we replace our decoding routines with these optimized ones. As a result of these changes, we see a respectable speed up:

System #1:
Before:
        test bench_reading_leb128_unsigned ... bench: 235.83 ns/iter (+/- 32.53)
After:
        test bench_reading_leb128_unsigned ... bench: 157.38 ns/iter (+/- 17.09)

System #2:
Before:
        test bench_reading_leb128_unsigned ... bench: 183.70 ns/iter (+/- 2.72)
After:
        test bench_reading_leb128_unsigned ... bench: 103.83 ns/iter (+/- 3.28)

[0] rust-lang/rust#69050
[1] rust-lang/rust#69157

Signed-off-by: Daniel Müller <deso@posteo.net>
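The commit messages above mention decoding routines that return only the decoded value, without a separate "shift" result. A rough sketch of what such a routine could look like follows. The name `decode_u64_leb128` and the cursor-style `&mut &[u8]` parameter are hypothetical and are not taken from blazesym or gimli; the sketch only illustrates the idea of advancing the input and returning the value alone.

```rust
/// Hypothetical helper (not the blazesym or gimli API): read one unsigned
/// LEB128 value from the front of `data` and advance `data` past it.
///
/// Returns `None` if the input ends in the middle of a value or if the
/// encoding is longer than the 10 bytes a `u64` can need.
pub fn decode_u64_leb128(data: &mut &[u8]) -> Option<u64> {
    let mut result: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        // Take the next byte, or bail out if the input is exhausted.
        let (&byte, rest) = data.split_first()?;
        *data = rest;
        if shift >= 64 {
            // More continuation bytes than a u64 can hold.
            return None;
        }
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            // Only the value is returned; the caller learns how much was
            // consumed from the advanced slice, not from a "shift" result.
            return Some(result);
        }
        shift += 7;
    }
}
```

Because the slice itself is advanced, consecutive calls on the same `&mut &[u8]` read consecutive values, and a `None` signals truncated or over-long input instead of a panic.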

rust-highfive assigned michaelwoerister Feb 14, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2020

nnethercote force-pushed the tweak-LEB128-reading branch from 6cc131c to e25bd1f Compare February 16, 2020 23:03

nnethercote closed this Feb 17, 2020

nnethercote deleted the tweak-LEB128-reading branch February 18, 2020 21:26

d-e-s-o mentioned this pull request Jun 5, 2024

Optimize LEB128 data reading libbpf/blazesym#719

Merged

d-e-s-o mentioned this pull request Sep 10, 2025

Optimize LEB128 data reading gimli-rs/gimli#795

Merged
