refactor: clean up busy waiting in prefetcher blocking get #8215

Merged: 14 commits into near:master, Dec 22, 2022

Conversation

pugachAG (Contributor)

Part of #7723

This PR replaces the busy waiting loop with update notifications on every underlying map update. It introduces a single condition variable which is notified on every map update. One considered alternative is to maintain a separate condition variable per slot for more granular notifications. In practice this doesn't make any difference, since the additional iterations for irrelevant keys wouldn't cause any noticeable performance overhead, while it would result in more complex and harder-to-reason-about code.
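To make the pattern concrete, here is a minimal sketch of the idea, not the PR's actual code; the `Slots` type and its field names are illustrative, with `blocking_get` and `insert_fetched` taken from the discussion below:

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

/// Illustrative stand-in for the prefetcher's size-tracked map.
struct Slots {
    // `None` marks a slot that is still being fetched.
    map: Mutex<HashMap<String, Option<Vec<u8>>>>,
    // Single condition variable, notified on every map update.
    updated: Condvar,
}

impl Slots {
    /// Waits for the value under `key` without busy waiting: instead of
    /// polling with sleeps, the thread parks on the condvar and re-checks
    /// only after some update has happened.
    fn blocking_get(&self, key: &str) -> Option<Vec<u8>> {
        let mut guard = self.map.lock().unwrap();
        loop {
            match guard.get(key) {
                // Value fetched: return it.
                Some(Some(value)) => return Some(value.clone()),
                // Slot removed (e.g. cleared): stop waiting.
                None => return None,
                // Still pending: block until the next update notification.
                Some(None) => guard = self.updated.wait(guard).unwrap(),
            }
        }
    }

    /// Every mutation notifies all waiters so they can re-check their keys.
    fn insert_fetched(&self, key: String, value: Vec<u8>) {
        let mut guard = self.map.lock().unwrap();
        guard.insert(key, Some(value));
        // Notify while still holding the guard, as in the std docs example.
        self.updated.notify_all();
    }
}
```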

@pugachAG pugachAG requested a review from a team as a code owner December 13, 2022 13:49
/// This consumes the locked mutex guard to make sure it is unlocked before
/// notifying the condition variable.
fn notify_slots_update(&self, guard: MutexGuard<SizeTrackedHashMap>) {
std::mem::drop(guard);
Collaborator

Is this actually the right thing to do? My intuition is that we want to notify first and unlock after, so that the mutex is fairly distributed among all the waiters on the condvar. If it is unlocked before the notification, some other waiter that's not on the condvar might grab the mutex, leading to all the condvar waiters blocking immediately after being notified (a-la thundering herd?)

The standard library doc examples seem to corroborate my understanding…

pugachAG (Contributor, Author), Dec 15, 2022

> leading to all the condvar waiters blocking immediately after being notified (a-la thundering herd?)

By unlocking the mutex before notifying I wanted to address exactly that issue (see stackoverflow). In this particular case I don't think it would matter since this is definitely not a performance bottleneck. I will change this to match the example Condvar usage from the docs to avoid trying to be "unnecessarily smart" here :)
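For reference, the `Condvar` example in the standard library docs notifies while the mutex guard is still held and lets the guard drop afterwards; the sketch below is adapted from that example (the boolean flag is just a placeholder for the map update):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);

    thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        let mut updated = lock.lock().unwrap();
        *updated = true;
        // Notify first; the guard is only released when it goes out of scope.
        cvar.notify_one();
    });

    let (lock, cvar) = &*pair;
    let mut updated = lock.lock().unwrap();
    while !*updated {
        // `wait` atomically releases the mutex and re-acquires it on wake-up.
        updated = cvar.wait(updated).unwrap();
    }
}
```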

Collaborator

FWIW I raised a question about this upstream. They have some interesting insights :)

@pugachAG pugachAG requested a review from nagisa December 15, 2022 14:39
nagisa (Collaborator) left a comment

LGTM. I would love some wrapper that does not require manually keeping track of when to signal the condvar after mutating the protected data, but I think the scope of this is restricted enough that it doesn't matter much if we keep it as is for the time being.

jakmeier (Contributor) left a comment

Awesome stuff!

I believe we currently only care about the notify in insert_fetched and clear, because the other wake-ups just go back to sleep. But I actually prefer it this way, notifying on all modifications, as it makes future changes easier.

A wrapper that "automates" the notify could be added, as nagisa mentioned. I imagine it to be a new guard type that notifies on drop. That could be neat, so if you want to explore this, feel free to do so, it can also be a follow up PR. But I wouldn't say it's necessary.
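A rough sketch of what such a notify-on-drop guard could look like; the type and field names are hypothetical and this is not the code that ended up in the PR:

```rust
use std::ops::{Deref, DerefMut};
use std::sync::{Condvar, MutexGuard};

/// Write guard that notifies the condvar when dropped, so code mutating the
/// protected data cannot forget to wake up the waiters.
struct NotifyingWriteGuard<'a, T> {
    guard: MutexGuard<'a, T>,
    cvar: &'a Condvar,
}

impl<T> Deref for NotifyingWriteGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.guard
    }
}

impl<T> DerefMut for NotifyingWriteGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.guard
    }
}

impl<T> Drop for NotifyingWriteGuard<'_, T> {
    fn drop(&mut self) {
        // Notify while the lock is still held (as in the std docs example);
        // the inner MutexGuard field is released right after this runs.
        self.cvar.notify_all();
    }
}
```

With something like this, the write path just mutates through the guard and lets it fall out of scope; the notification happens automatically.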

pugachAG (Contributor, Author)

@jakmeier @nagisa I've introduced Monitor to wrap mutex and condvar, could you please take another look

jakmeier (Contributor) left a comment

Nice, I really like the Read / Write guard split with the deref impls.

I think we now have even more "unnecessary" notify calls. But it seems like no big issue.

On the plus side, I am now more confident that we haven't missed a case where we needed a notification. So overall, I'm very happy with this wrapper! (And we could even reuse it for other work we want to sync with background threads in the future, such as compilation in the background.)
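Not the PR's actual API, but a self-contained sketch of how such a Monitor-style wrapper with a read guard and a wait-for-condition helper might read; the struct shape and the method names (`lock`, `wait_for`) are assumptions:

```rust
use std::ops::Deref;
use std::sync::{Arc, Condvar, Mutex, MutexGuard};
use std::thread;

/// Assumed shape of a Monitor-style wrapper: a mutex plus a condvar that is
/// notified on every modification of the protected value.
struct Monitor<T> {
    mutex: Mutex<T>,
    cvar: Condvar,
}

/// Read guard exposing the protected value via Deref.
struct ReadGuard<'a, T> {
    guard: MutexGuard<'a, T>,
    cvar: &'a Condvar,
}

impl<T> Monitor<T> {
    fn new(value: T) -> Self {
        Self { mutex: Mutex::new(value), cvar: Condvar::new() }
    }

    fn lock(&self) -> ReadGuard<'_, T> {
        ReadGuard { guard: self.mutex.lock().unwrap(), cvar: &self.cvar }
    }
}

impl<'a, T> ReadGuard<'a, T> {
    /// Blocks until `pred` holds, re-checking after every notification.
    fn wait_for(mut self, pred: impl Fn(&T) -> bool) -> Self {
        while !pred(&*self.guard) {
            self.guard = self.cvar.wait(self.guard).unwrap();
        }
        self
    }
}

impl<T> Deref for ReadGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.guard
    }
}

fn main() {
    // Toy usage: one thread bumps a counter, another blocks until it hits 3.
    // The write side notifies manually here to keep the example short; in a
    // full wrapper it would go through a notify-on-drop write guard like the
    // one sketched earlier.
    let monitor = Arc::new(Monitor::new(0u32));
    let writer = Arc::clone(&monitor);
    thread::spawn(move || {
        for _ in 0..3 {
            *writer.mutex.lock().unwrap() += 1;
            writer.cvar.notify_all();
        }
    });
    let guard = monitor.lock().wait_for(|n| *n >= 3);
    assert_eq!(*guard, 3);
}
```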

// result in any flakiness since the test would still pass if we don't
// sleep enough, it just won't verify the synchronization part of
// `blocking_get`.
std::thread::sleep(Duration::from_micros(1000));
Contributor

Mixing from_millis (which you use further down) and from_micros can be confusing on a quick read; I'd probably go with:

Suggested change
std::thread::sleep(Duration::from_micros(1000));
std::thread::sleep(Duration::from_millis(1));

Collaborator

Is there a reason why this couldn't use Barriers to synchronize the operations? Timers in tests are a code smell.

pugachAG (Contributor, Author), Dec 20, 2022

@nagisa Here we want to execute insert_fetched after blocking_get has been running for some time, so that it performs the initial check for the value and then blocks. As far as I understand, Barrier wouldn't help with that.

I definitely agree with your point regarding relying on timing in tests. I've reimplemented this to avoid using sleep and removed the timeout from the test code, since that is now enforced by nextest. Could you please take another look?
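For illustration, a sleep-free test of this kind can take roughly the following shape (hypothetical, not the PR's actual test; a plain map plus condvar stands in for the prefetcher):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    // The waiter thread blocks until the value shows up; the main thread then
    // inserts it and notifies. Whether the insert happens before or after the
    // waiter's initial check does not affect correctness, and a hang would be
    // caught by the test runner's (nextest) timeout rather than by an
    // in-test timer.
    let state = Arc::new((Mutex::new(HashMap::<String, u32>::new()), Condvar::new()));
    let waiter_state = Arc::clone(&state);

    let waiter = thread::spawn(move || {
        let (map, cvar) = &*waiter_state;
        let mut guard = map.lock().unwrap();
        // Initial check, then block until notified.
        while !guard.contains_key("key") {
            guard = cvar.wait(guard).unwrap();
        }
        *guard.get("key").unwrap()
    });

    let (map, cvar) = &*state;
    map.lock().unwrap().insert("key".to_string(), 42);
    cvar.notify_all();

    assert_eq!(waiter.join().unwrap(), 42);
}
```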

Collaborator

aha yeah, barriers don't work here, I had them flipped during review for some reason.

/// It enables blocking while waiting for the underlying value to be updated.
/// The implementation ensures that any modification results in all blocked
/// threads being notified.
pub(crate) struct Monitor<T> {
Collaborator

Not saying that rolling our own implementation is problematic, but I'm curious why we didn't use something off the shelf, e.g. https://github.com/reem/rust-shared-mutex.

pugachAG (Contributor, Author)

I've considered considered using shared-mutex and actually my Monitor implementation is inspired by it.
Unfortunately it doesn't provide the API we need here, in particular we want SharedMutexWriteGuard to notify condvar when dropped (see this comment).

Collaborator

Got it, thanks for the explanation.

near-bulldozer bot merged commit 915b08b into near:master, Dec 22, 2022
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Dec 23, 2022