-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: clean up busy waiting in prefetcher blocking get #8215
refactor: clean up busy waiting in prefetcher blocking get #8215
Conversation
/// This consumes locked mutex guard to make sure to unlock it before | ||
/// notifying the condition variable. | ||
fn notify_slots_update(&self, guard: MutexGuard<SizeTrackedHashMap>) { | ||
std::mem::drop(guard); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually a right thing to do? My intuition is that we want to notify first and unlock after, so that the mutex is fairly distributed along all the waiters on the condvar. If it is unlocked before the notification, some other waiter that’s not in a condvar might grab the mutex, leading to all the condvar waiters block immediately after them being notified (a-la thundering herd?)
The standard library doc examples seem to corroborate with my understanding…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leading to all the condvar waiters block immediately after them being notified (a-la thundering herd?)
With unlocking the mutex before notifying I wanted to address exactly that issue (see stackoverflow). In this particular case I don't think it would matter since this is definitely not a performance bottleneck. I will change this to match the example Condvar
usage from the docs to avoid trying to be "unnecessarly smart" here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I raised a question about this upstream. They have some interesting insights :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I would love some wrapper that does not require to manually keep track of needing to signal a condvar when mutating the protected data, but I think the scope of this is restricted enough that it doesn’t matter much if we keep it as is for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff!
I believe we currently only care about notify in insert_fetched
and clear
because the other wake ups are just going back to sleep. But I actually prefer it this way, notifying on all modifications, as it makes future changes easier.
A wrapper that "automates" the notify could be added, as nagisa mentioned. I imagine it to be a new guard type that notifies on drop
. That could be neat, so if you want to explore this, feel free to do so, it can also be a follow up PR. But I wouldn't say it's necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I really like the Read / Write guard split with the deref impls.
I think we now have even more "unnecessary" notify calls. But it seems like no big issue.
On the plus side, I am now more confident that we haven't missed a case where we needed a notification. So overall, I'm very happy with this wrapper! (And we could even reuse it for other work we want to sync with background threads in the future, such as compilation in the background.)
// result in any flakiness since the test would still pass if we don't | ||
// sleep enough, it just won't verify the the synchronization part of | ||
// `blocking_get`. | ||
std::thread::sleep(Duration::from_micros(1000)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing from_millis
(which you use further down) and from_micros
can be confused on a quick read, I'd probably go with:
std::thread::sleep(Duration::from_micros(1000)); | |
std::thread::sleep(Duration::from_millis(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this couldn't use Barrier
s to synchronize the operations? Timers in tests are a code smell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nagisa Here we want to execute insert_fetched
after blocking_get
was executed for some time, so it performs the initial check for the value and then blocks. As far as I understand Barrier
wouldn't help with that.
I definitely agree with your point regarding relying on timing in tests. I've reimplemented this to avoid using sleep
and removed the timeout as part of the test code since now we have that enforced by nextest
. Could you please take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha yeah, barriers don't work here, I had them flipped during review for some reason.
/// It enables blocking while waiting for the underlying value to be updated. | ||
/// The implementation ensures that any modification results in all blocked | ||
/// threads being notified. | ||
pub(crate) struct Monitor<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saying that rolling out our own implementation is problematic but curious why we didn't use something off the shelf e.g. https://github.com/reem/rust-shared-mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've considered considered using shared-mutex
and actually my Monitor
implementation is inspired by it.
Unfortunately it doesn't provide the API we need here, in particular we want SharedMutexWriteGuard
to notify condvar when dropped (see this comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the explanation.
Part of near#7723 This PR replaces busy waiting loop with update notifications on every underlying map update. It introduces single condition variable which is notified on every map update. One considered alternative is to maintain separate condition variable per slot to have more granular notifications. In practice this doesn't make any difference since additional iterations for irrelevant keys wouldn't cause any noticeable performance overhead, but it results in more complex and harder to reason about code.
Part of #7723
This PR replaces busy waiting loop with update notifications on every underlying map update. It introduces single condition variable which is notified on every map update. One considered alternative is to maintain separate condition variable per slot to have more granular notifications. In practice this doesn't make any difference since additional iterations for irrelevant keys wouldn't cause any noticeable performance overhead, but it results in more complex and harder to reason about code.