Iterate in reverse when comparing Stacks #2214

saethlin · 2022-06-07T21:09:34Z

I was originally going to propose this after I (eventually, sorry) am done with #1935 but @Nilstrieb independently identified this code path as hot and came up with a different optimization, which is basically this: rust-lang/rust#49892 (comment)

Anecdotally, I've found this optimization to be much more effective but of course that is probably workload-dependent.

src/stacked_borrows.rs

RalfJung · 2022-06-07T23:47:34Z

Thanks, this makes sense!

Bummer that we don't have perf tracking infrastructure... I'll just trust that this works.

saethlin · 2022-06-08T00:04:56Z

We could also wait for @Nilstrieb to confirm that this is at least as impactful as they optimization they were considering.

RalfJung · 2022-06-08T00:29:03Z

Well, have you tested this PR and confirmed that it helps on some benchmarks?

Noratrieb · 2022-06-08T16:16:53Z

Hmm, that is interesting. Running the serde1 benchmark with hyperfine results in this:

faster-tag-partial-eq (local)
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      5.644 s ±  0.216 s    [User: 5.559 s, System: 0.078 s]
  Range (min … max):    5.542 s …  6.244 s    10 runs

master
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.160 s ±  0.069 s    [User: 6.078 s, System: 0.076 s]
  Range (min … max):    6.080 s …  6.313 s    10 runs

faster-stack-cmp
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.202 s ±  0.087 s    [User: 6.114 s, System: 0.080 s]
  Range (min … max):    6.042 s …  6.332 s    10 runs

Noratrieb · 2022-06-08T17:24:29Z

Even more interesting, I ran the mse test as well

faster-tag-partial-eq
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     29.984 s ±  0.170 s    [User: 29.650 s, System: 0.314 s]
  Range (min … max):   29.788 s … 30.355 s    10 runs

master
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     34.884 s ±  0.130 s    [User: 34.583 s, System: 0.294 s]
  Range (min … max):   34.743 s … 35.135 s    10 runs

faster-stack-cmp
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     35.406 s ±  0.128 s    [User: 35.090 s, System: 0.311 s]
  Range (min … max):   35.227 s … 35.634 s    10 runs

Optimize `SbTag::eq` The code before generated really bad code with a branch. This nudges LLVM towards being smarter and simply comparing the integers. See #2214 (comment)

RalfJung · 2022-06-08T17:34:44Z

So on its own this seems to make it worse, if it has an effect at all.
Does rebasing on top of #2218 change anything?

Noratrieb · 2022-06-08T17:54:20Z

Rebasing the two and comparing them with master again

both serde1
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      5.260 s ±  0.106 s    [User: 5.158 s, System: 0.079 s]
  Range (min … max):    5.201 s …  5.554 s    10 runs

both mse
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     30.479 s ±  0.226 s    [User: 30.155 s, System: 0.317 s]
  Range (min … max):   30.340 s … 31.095 s    10 runs

master serde1
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.152 s ±  0.062 s    [User: 6.071 s, System: 0.074 s]
  Range (min … max):    6.069 s …  6.268 s    10 runs

master mse
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     35.022 s ±  0.408 s    [User: 34.702 s, System: 0.313 s]
  Range (min … max):   34.658 s … 36.133 s    10 runs

RalfJung · 2022-06-08T17:56:02Z

Those are nice wins, but what I meant is comparing this PR with master after #2218 landed. The wins you are showing might be entirely due to #2218?

saethlin · 2022-06-08T22:57:49Z

Based on some experimentation I just did on my end, I think the thing to do here is land a few of my local test cases as bench-cargo-miri programs as well as land #1935 then re-evaluate micro-optimizations in this area.

saethlin · 2022-07-03T05:32:54Z

I'll re-open this if it still seems interesting after making all the decisions on #2315

saethlin · 2022-07-16T16:15:48Z

Since #2315, the stack comparison code path is now optimized to a memcmp, and in combination with fact that we are comparing less bits makes this code path a lot less interesting. Awesome!

RalfJung reviewed Jun 7, 2022

View reviewed changes

src/stacked_borrows.rs Outdated Show resolved Hide resolved

Iterate in reverse when comparing Stacks

f8c422f

saethlin force-pushed the faster-stack-cmp branch from 6c22b22 to f8c422f Compare June 8, 2022 00:08

Noratrieb mentioned this pull request Jun 8, 2022

Optimize SbTag::eq #2218

Merged

saethlin marked this pull request as draft June 8, 2022 22:57

saethlin closed this Jul 3, 2022

saethlin deleted the faster-stack-cmp branch July 16, 2022 16:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Iterate in reverse when comparing Stacks #2214

Iterate in reverse when comparing Stacks #2214

Uh oh!

saethlin commented Jun 7, 2022

Uh oh!

Uh oh!

RalfJung commented Jun 7, 2022

Uh oh!

saethlin commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

Noratrieb commented Jun 8, 2022 •

edited

Loading

Uh oh!

Noratrieb commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

Noratrieb commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

saethlin commented Jun 8, 2022

Uh oh!

saethlin commented Jul 3, 2022

Uh oh!

saethlin commented Jul 16, 2022

Uh oh!

Uh oh!

Iterate in reverse when comparing Stacks #2214

Iterate in reverse when comparing Stacks #2214

Uh oh!

Conversation

saethlin commented Jun 7, 2022

Uh oh!

Uh oh!

RalfJung commented Jun 7, 2022

Uh oh!

saethlin commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

Noratrieb commented Jun 8, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Noratrieb commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

Noratrieb commented Jun 8, 2022

Uh oh!

RalfJung commented Jun 8, 2022

Uh oh!

saethlin commented Jun 8, 2022

Uh oh!

saethlin commented Jul 3, 2022

Uh oh!

saethlin commented Jul 16, 2022

Uh oh!

Uh oh!

Noratrieb commented Jun 8, 2022 •

edited

Loading