Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterate in reverse when comparing Stacks #2214

Closed
wants to merge 1 commit into from

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Jun 7, 2022

I was originally going to propose this after I (eventually, sorry) am done with #1935 but @Nilstrieb independently identified this code path as hot and came up with a different optimization, which is basically this: rust-lang/rust#49892 (comment)

Anecdotally, I've found this optimization to be much more effective but of course that is probably workload-dependent.

src/stacked_borrows.rs Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member

RalfJung commented Jun 7, 2022

Thanks, this makes sense!

Bummer that we don't have perf tracking infrastructure... I'll just trust that this works.

@saethlin
Copy link
Member Author

saethlin commented Jun 8, 2022

We could also wait for @Nilstrieb to confirm that this is at least as impactful as they optimization they were considering.

@saethlin saethlin force-pushed the faster-stack-cmp branch from 6c22b22 to f8c422f Compare June 8, 2022 00:08
@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2022

Well, have you tested this PR and confirmed that it helps on some benchmarks?

@Noratrieb
Copy link
Member

Noratrieb commented Jun 8, 2022

Hmm, that is interesting. Running the serde1 benchmark with hyperfine results in this:

faster-tag-partial-eq (local)
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      5.644 s ±  0.216 s    [User: 5.559 s, System: 0.078 s]
  Range (min … max):    5.542 s …  6.244 s    10 runs

master
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.160 s ±  0.069 s    [User: 6.078 s, System: 0.076 s]
  Range (min … max):    6.080 s …  6.313 s    10 runs

faster-stack-cmp
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.202 s ±  0.087 s    [User: 6.114 s, System: 0.080 s]
  Range (min … max):    6.042 s …  6.332 s    10 runs

@Noratrieb Noratrieb mentioned this pull request Jun 8, 2022
@Noratrieb
Copy link
Member

Even more interesting, I ran the mse test as well

faster-tag-partial-eq
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     29.984 s ±  0.170 s    [User: 29.650 s, System: 0.314 s]
  Range (min … max):   29.788 s … 30.355 s    10 runs

master
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     34.884 s ±  0.130 s    [User: 34.583 s, System: 0.294 s]
  Range (min … max):   34.743 s … 35.135 s    10 runs

faster-stack-cmp
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     35.406 s ±  0.128 s    [User: 35.090 s, System: 0.311 s]
  Range (min … max):   35.227 s … 35.634 s    10 runs

bors added a commit that referenced this pull request Jun 8, 2022
Optimize `SbTag::eq`

The code before generated really bad code with a branch.
This nudges LLVM towards being smarter and simply comparing
the integers.

See #2214 (comment)
@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2022

So on its own this seems to make it worse, if it has an effect at all.
Does rebasing on top of #2218 change anything?

@Noratrieb
Copy link
Member

Rebasing the two and comparing them with master again

both serde1
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      5.260 s ±  0.106 s    [User: 5.158 s, System: 0.079 s]
  Range (min … max):    5.201 s …  5.554 s    10 runs

both mse
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     30.479 s ±  0.226 s    [User: 30.155 s, System: 0.317 s]
  Range (min … max):   30.340 s … 31.095 s    10 runs

master serde1
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):      6.152 s ±  0.062 s    [User: 6.071 s, System: 0.074 s]
  Range (min … max):    6.069 s …  6.268 s    10 runs

master mse
Benchmark 1: cargo +miri miri run
  Time (mean ± σ):     35.022 s ±  0.408 s    [User: 34.702 s, System: 0.313 s]
  Range (min … max):   34.658 s … 36.133 s    10 runs

@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2022

Those are nice wins, but what I meant is comparing this PR with master after #2218 landed. The wins you are showing might be entirely due to #2218?

@saethlin
Copy link
Member Author

saethlin commented Jun 8, 2022

Based on some experimentation I just did on my end, I think the thing to do here is land a few of my local test cases as bench-cargo-miri programs as well as land #1935 then re-evaluate micro-optimizations in this area.

@saethlin saethlin marked this pull request as draft June 8, 2022 22:57
@saethlin
Copy link
Member Author

saethlin commented Jul 3, 2022

I'll re-open this if it still seems interesting after making all the decisions on #2315

@saethlin saethlin closed this Jul 3, 2022
@saethlin
Copy link
Member Author

Since #2315, the stack comparison code path is now optimized to a memcmp, and in combination with fact that we are comparing less bits makes this code path a lot less interesting. Awesome!

@saethlin saethlin deleted the faster-stack-cmp branch July 16, 2022 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants