Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move checking ptr tracking on item pop into cold helper function #2328

Merged
merged 1 commit into from
Jul 14, 2022

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Jul 4, 2022

Before:

Benchmark 1: cargo miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      6.729 s ±  0.050 s    [User: 6.608 s, System: 0.124 s]
  Range (min … max):    6.665 s …  6.799 s    5 runs

Benchmark 2: cargo miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):     20.923 s ±  0.271 s    [User: 20.386 s, System: 0.537 s]
  Range (min … max):   20.580 s … 21.165 s    5 runs

After:

Benchmark 1: cargo miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      6.562 s ±  0.023 s    [User: 6.430 s, System: 0.135 s]
  Range (min … max):    6.544 s …  6.594 s    5 runs

Benchmark 2: cargo miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):     20.375 s ±  0.228 s    [User: 19.964 s, System: 0.413 s]
  Range (min … max):   20.201 s … 20.736 s    5 runs

Nothing major, but we'll take it I guess. 🤷

Fixes #2132

@RalfJung
Copy link
Member Author

RalfJung commented Jul 5, 2022

@saethlin do you think this makes sense, or should we close the issue without further action?

@saethlin
Copy link
Member

saethlin commented Jul 5, 2022

I think that micro-optimizations like this should wait until #2315 lands, especially because that makes other changes in this function.

@RalfJung RalfJung added the S-blocked Status: blocked on something happening somewhere else label Jul 5, 2022
@bors
Copy link
Contributor

bors commented Jul 13, 2022

☔ The latest upstream changes (presumably #2315) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung RalfJung removed the S-blocked Status: blocked on something happening somewhere else label Jul 13, 2022
@RalfJung
Copy link
Member Author

Rebased after @saethlin 's PR landed, and found another cold path in the same item_popped function. It now contains barely any code itself. :D (which is probably a good thing)

Before:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      9.047 s ±  0.099 s    [User: 8.890 s, System: 0.157 s]
  Range (min … max):    8.957 s …  9.196 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      2.470 s ±  0.017 s    [User: 2.233 s, System: 0.239 s]
  Range (min … max):    2.456 s …  2.498 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.324 s ±  0.047 s    [User: 3.190 s, System: 0.137 s]
  Range (min … max):    3.262 s …  3.387 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      3.299 s ±  0.019 s    [User: 3.172 s, System: 0.130 s]
  Range (min … max):    3.284 s …  3.331 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):      6.613 s ±  0.052 s    [User: 5.982 s, System: 0.633 s]
  Range (min … max):    6.534 s …  6.660 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      4.800 s ±  0.022 s    [User: 4.638 s, System: 0.164 s]
  Range (min … max):    4.768 s …  4.828 s    5 runs

After:

Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/backtraces/Cargo.toml
  Time (mean ± σ):      8.969 s ±  0.062 s    [User: 8.803 s, System: 0.168 s]
  Range (min … max):    8.876 s …  9.049 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/mse/Cargo.toml
  Time (mean ± σ):      2.461 s ±  0.012 s    [User: 2.234 s, System: 0.229 s]
  Range (min … max):    2.442 s …  2.475 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde1/Cargo.toml
  Time (mean ± σ):      3.244 s ±  0.019 s    [User: 3.115 s, System: 0.131 s]
  Range (min … max):    3.217 s …  3.265 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/serde2/Cargo.toml
  Time (mean ± σ):      3.229 s ±  0.044 s    [User: 3.092 s, System: 0.139 s]
  Range (min … max):    3.180 s …  3.281 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/slice-get-unchecked/Cargo.toml
  Time (mean ± σ):      6.248 s ±  0.106 s    [User: 5.618 s, System: 0.628 s]
  Range (min … max):    6.164 s …  6.422 s    5 runs
 
Benchmark 1: cargo +miri miri run --manifest-path bench-cargo-miri/unicode/Cargo.toml
  Time (mean ± σ):      4.862 s ±  0.038 s    [User: 4.671 s, System: 0.191 s]
  Range (min … max):    4.819 s …  4.920 s    5 runs

This is just on the edge of 1-2σ most of the time, but seems fairly clear on slice-get-unchecked (6% faster). Strangely it seems like a 2σ regression on unicode.

So overall my inclination is to land this? I can't see how it could hurt, anyway...

@saethlin
Copy link
Member

Hm, give me a bit to twiddle with this...

@saethlin
Copy link
Member

Oh yeah this is perfect, item_popped inlines now and the hot path through it doesn't call any functions, which is the transformation I was hoping for 👍

I'm not surprised this is ~perf neutral on most functions and shows the most effect on slice-get-unchecked, because that benchmark stresses this code path more than anything else. A 2σ regression could easily be noise due to the scheduler or some minor perf effect we don't have control over.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 14, 2022

item_popped inlines now

Should we add #[inline] or even #[inline(always)] to to ensure this remains the case, or is that a bad idea?

@saethlin
Copy link
Member

I think that #[inline(always)] is a bad idea, and #[inline] doesn't apply because we aren't trying to enable cross-crate inlining. In my experience applying it when a function isn't being inlined almost always results in slower code, and the way to get the inlining you want is to extract cold paths into #[inline(never)] functions as you've done here.

Ideally LLVM could extract cold paths into functions via PGO but I've never seen that happen. And unlike profile data we know what paths are guaranteed to be cold, as opposed to where we just don't have good profiles yet.

@RalfJung
Copy link
Member Author

Okay, fair. I thought #[inline] would also shift some inlining weights, not just help for cross-crate things, but I don't really know.

Thanks for the review!
@bors r+

@bors
Copy link
Contributor

bors commented Jul 14, 2022

📌 Commit 3bd0e8a has been approved by RalfJung

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Jul 14, 2022

⌛ Testing commit 3bd0e8a with merge af2c50f...

@bors
Copy link
Contributor

bors commented Jul 14, 2022

☔ The latest upstream changes (presumably #2369) made this pull request unmergeable. Please resolve the merge conflicts.

@saethlin
Copy link
Member

saethlin commented Jul 14, 2022

It does add an inlining hint, but the hint is small, and some people including me and some people in LLVM keep wanting to get rid of it (most notably on C inline). The fact that both rustc's implementation of #[inline] and C/C++ inline do two things is kind of a problem because you can't get the ODR behavior or the cross-crate inlining without the inline hint as well.

And the fact that it adds a hint I think means that it would be worse to add in a situation like this. What I would really like is a #[rustc::warn_if_not_inlined] or something like that.

@bors
Copy link
Contributor

bors commented Jul 14, 2022

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing af2c50f to master...

@bors bors merged commit af2c50f into rust-lang:master Jul 14, 2022
@RalfJung RalfJung deleted the perf branch July 14, 2022 02:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Perf regression related to cold path code size growth in #2075
3 participants