-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use once_cell
crate instead of custom data structure
#72256
Conversation
@@ -371,13 +371,13 @@ impl<'sess> OnDiskCache<'sess> { | |||
let pos = index.get(&dep_node_index).cloned()?; | |||
|
|||
// Initialize `cnum_map` using the value from the thread that finishes the closure first. | |||
self.cnum_map.init_nonlocking_same(|| Self::compute_cnum_map(tcx, &self.prev_cnums[..])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only use of optimistic initialization strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be replaceable with something like this:
if self.cnum_map.get().is_none() {
self.cnum_map.set(Self::compute_cnum_map(...)));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Do you see a reason why we don't just lock here though? I believe @Zoxc authored this originally, but they are on hiatus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this is called for every single query result we deserialize from disk -- that means likely hundreds of thousands of calls, if not more, and in quite hot code -- I imagine locking may be quite expensive as such. I'm not sure why we're unable to initially the crate number map earlier, though; I would expect there to be a point in between query deserialization starting and having the prev_cnums available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll switch to your suggestion and leave a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, OnceCell
already has a fast path that doesn't take the mutex when the value is already set. The only difference in behavior would be that only a single thread executes compute_cnum_map
while the rest get de-scheduled. This seems like it might actually be an improvement over the status quo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh in that case yeah this is fine to leave as is, I wasn't sure if that optimization was built-in yet. I can't imagine it matters whether we wait on a mutex or not ~N times when N is so low (on powerful hardware today, probably only a hundred or so at most).
The plan is to move OnceCell into std(rust-lang/rfcs#2788), so another option, instead of adding it to the list of approved crates, is to sneaky copy-paste it into std :D |
src/librustc_session/session.rs
Outdated
} | ||
|
||
#[inline] | ||
pub fn crate_types_opt(&self) -> Option<&[CrateType]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave this to a future PR but it seems quite suspect if we're using this API, as that presumes we don't really care about the answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used here:
rust/src/librustc_codegen_llvm/context.rs
Lines 100 to 103 in 31add7e
/// PIE is potentially more effective than PIC, but can only be used in executables. | |
/// If all our outputs are executables, then we can relax PIC to PIE when producing object code. | |
/// If the list of crate types is not yet known we conservatively return `false`. | |
pub fn all_outputs_are_pic_executables(sess: &Session) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at that but wasn't able to quickly tell what all is using it and at what stage of compilation -- it does seem like crate types should be populated by then, but I could be wrong of course :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I see I forgot to respond to this. I did initially try unwrapping crate_types
here, but it wasn't populated at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. Glad we at least tried it - it sounds like there may be something that's not quite working somewhere as a result, maybe good to file a follow-up issue. Feel free to assign to me. Certainly not urgent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -371,13 +371,13 @@ impl<'sess> OnDiskCache<'sess> { | |||
let pos = index.get(&dep_node_index).cloned()?; | |||
|
|||
// Initialize `cnum_map` using the value from the thread that finishes the closure first. | |||
self.cnum_map.init_nonlocking_same(|| Self::compute_cnum_map(tcx, &self.prev_cnums[..])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be replaceable with something like this:
if self.cnum_map.get().is_none() {
self.cnum_map.set(Self::compute_cnum_map(...)));
}
self.features.set(features); | ||
match self.features.set(features) { | ||
Ok(()) => {} | ||
Err(_) => panic!("`features` was initialized twice"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use expect here like in init_crate_types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Features
doesn't implement Debug
, and I suspect a derived one would be quite large. Still want me to do this? I could also just unwrap
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, I didn't realize that was the reason. Seems fine to leave as-is then
once_cell should be fine to add to the whitelist, I'm surprised it isn't in there already to be honest. |
Perf queue is empty at the moment, so I'm going to schedule a run with @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 20e661519d0cfe798345878ca78c140c781b65da with merge 8ad681519630f9fc92d9a689c38a3e33e26fdccc... |
☀️ Try build successful - checks-azure |
Queued 8ad681519630f9fc92d9a689c38a3e33e26fdccc with parent 6163394, future comparison URL. |
Finished benchmarking try commit 8ad681519630f9fc92d9a689c38a3e33e26fdccc, comparison URL. |
Queue is still empty, so I'm trying another perf run with the @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 683a4af6975ac6ca7ed757596b707372a067ba04 with merge 5c9544c6a827baf278ab73b4d724c96f1941467e... |
☀️ Try build successful - checks-azure |
Queued 5c9544c6a827baf278ab73b4d724c96f1941467e with parent 34cce58, future comparison URL. |
Finished benchmarking try commit 5c9544c6a827baf278ab73b4d724c96f1941467e, comparison URL. |
I don't see a problem with adding a dep on |
Interestingly, making the "getter" methods eligible for inlining seems to have made instruction counts slightly worse. I was expecting Are we comfortable with the perf implications of switching to |
For sure. Those perf implications are barely a blip AFAICT - I'm not even sure they go beyond noise margins. |
r=me but would be nice to squash commits down a bit (not necessarily into one, up to you) |
I disagree with this part of your statement. The noise floor for instruction counts of "check" builds is around a tenth of a percent (filter by "-check"). This PR makes That said, I think the switch is still worth it, since |
Hm, yeah, I agree that there's at least plausibly non-noise there but it may well be due to external effects -- e.g. different inlining or code layout -- I doubt it's really because of this change in some sense. It would be great if we had e.g. |
Squashed and rebased. @bors r=Mark-Simulacrum |
🔒 Merge conflict This pull request and the master branch diverged in a way that cannot be automatically merged. Please rebase on top of the latest master branch, and let the reviewer approve again. How do I rebase?Assuming
You may also read Git Rebasing to Resolve Conflicts by Drew Blessing for a short tutorial. Please avoid the "Resolve conflicts" button on GitHub. It uses Sometimes step 4 will complete without asking for resolution. This is usually due to difference between how Error message
|
☔ The latest upstream changes (presumably #72464) made this pull request unmergeable. Please resolve the merge conflicts. |
@bors r+ |
📌 Commit 307153e has been approved by |
🌲 The tree is currently closed for pull requests below priority 1000, this pull request will be tested once the tree is reopened |
📌 Commit 307153e has been approved by |
🌲 The tree is currently closed for pull requests below priority 1000, this pull request will be tested once the tree is reopened |
☀️ Test successful - checks-azure |
Internally, we use the
Once
type for shared data that is initialized exactly once and only read from afterwards.Once
uses aparking_lot::Mutex
when the parallel compiler is enabled and aRefCell
when it is not. This PR switches to theonce_cell
crate, which also uses aparking_lot::Mutex
for itssync
version (because we enable theparking_lot
feature) but has zero overhead for itsunsync
one.This PR adds
once_cell
to the list of whitelisted dependencies. I think this is acceptable because it is already used inrustc_driver
, is owned by a well-known community member (cc @matklad), and has a stable release. cc @rust-lang/compileronce_cell
has a slightly more minimal API thanOnce
, which allows for initialization to be either optimistic (evaluate the initializer and then synchronize) or pessimistic (synchronize and then evaluate the initializer).once_cell
'sget_or_init
is always pessimistic. The optimistic version is only used once in the currentmaster
.r? @Mark-Simulacrum