Attempt to make the monarch more thread safe. #11189
Why does this need to sync the memory with `Monarch::_handleQuitAll`? Is there a reason `_quitting` is needed? Why would a listener care if it got notified during shutdown?
I don't think it is necessary, but since each peasant calls this on close I don't want them waiting for the `unique_lock` until after `_handleQuitAll` finishes. This would also trigger `_GetNumberOfPeasants` through a couple of event hops, and that would be a bunch of needless churn and lock contention.
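A minimal sketch of that early-out, assuming a hypothetical `SignalClose` entry point (member names other than `_quitting` are illustrative, not the PR's actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <shared_mutex>
#include <unordered_map>

struct Monarch
{
    void SignalClose(uint64_t id)
    {
        // If a quit-all is in flight, don't queue up behind the lock that
        // _handleQuitAll is holding; the bookkeeping is about to be torn
        // down anyway.
        if (_quitting.load(std::memory_order_acquire))
        {
            return;
        }
        std::unique_lock lock{ _peasantsMutex };
        _peasants.erase(id);
    }

    void _handleQuitAll()
    {
        _quitting.store(true, std::memory_order_release);
        std::unique_lock lock{ _peasantsMutex };
        // ... notify every peasant and tear down the maps ...
    }

private:
    std::atomic<bool> _quitting{ false };
    std::shared_mutex _peasantsMutex;
    std::unordered_map<uint64_t, bool> _peasants; // IPeasant elided
};
```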
BTW, this makes me certain that we should create a thread-safe proxy class for the Monarch in the future.
(Basically, wrap the public methods of the class and make them thread-safe. The actual MonarchImpl class can then continue pretending threads don't exist.)
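A sketch of what that proxy could look like; `MonarchImpl`'s members here are stand-ins, not the real interface:

```cpp
#include <cstdint>
#include <mutex>

class MonarchImpl
{
public:
    // The implementation can keep pretending threads don't exist.
    uint64_t GetNumberOfPeasants() const { return _count; }
    void AddPeasant() { ++_count; }

private:
    uint64_t _count = 0;
};

class Monarch
{
public:
    // Every public method funnels through the same lock.
    uint64_t GetNumberOfPeasants() const
    {
        std::lock_guard guard{ _lock };
        return _impl.GetNumberOfPeasants();
    }

    void AddPeasant()
    {
        std::lock_guard guard{ _lock };
        _impl.AddPeasant();
    }

private:
    mutable std::mutex _lock;
    MonarchImpl _impl;
};
```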
There is something to say about the simplicity of just throwing a `std::unique_lock lock{_mutex}` at the top of everything. There is a minor concern about event handlers that call back up to the monarch, but that is a solvable problem too. 🤷
Yeah, if you consider how we got event handlers, it's clear that recursive mutexes are the best solution here. You already have me entirely convinced about that. 😄
In either case I suppose that's something for a different time.
Consider using `til::shared_mutex<std::unordered_map<uint64_t, IPeasant>> _peasants`.
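For readers unfamiliar with the idea: the suggested type couples the mutex with the data it guards, so unlocked access doesn't compile. A rough stand-in (not the actual til implementation) might look like:

```cpp
#include <cstdint>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

// Minimal data-owning mutex: the map can only be reached while a lock
// object is alive.
template<typename T>
class guarded
{
public:
    // Exclusive access: the returned pair keeps the lock held for as long
    // as the reference is used.
    std::pair<std::unique_lock<std::shared_mutex>, T&> lock()
    {
        return { std::unique_lock{ _mutex }, _data };
    }

    // Shared, read-only access.
    std::pair<std::shared_lock<std::shared_mutex>, const T&> lock_shared() const
    {
        return { std::shared_lock{ _mutex }, _data };
    }

private:
    mutable std::shared_mutex _mutex;
    T _data{};
};

// Usage, mirroring the suggestion above:
//   guarded<std::unordered_map<uint64_t, IPeasant>> _peasants;
//   auto [lock, peasants] = _peasants.lock();
//   peasants.erase(id);
```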
I did originally consider it, but disregarded it because the first iteration of this had a single `std::recursive_mutex _mutex` that was used for locking both `_peasants` and `_mruPeasants`. Of course, the current iteration does not have that, and I was able to separate out (and downgrade to) a `std::shared_mutex`. Once we are confident that this locking strategy is what we want, I can take another look at using `til::shared_mutex`, although I don't look forward to updating all of the remoting tests that rely on friend access.
If possible, I would like this refactor to be deferred for now. I took a shot at doing so, but it proved to be a bit more difficult than I hoped.
Is there ever a situation where this mutex is locked recursively?
`_getMostRecentPeasantID` calls (through a couple of hops) `_clearOldMruEntries`.
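In sketch form (the bodies are assumptions; only the method names come from this thread), the re-entrancy looks like:

```cpp
#include <cstdint>
#include <mutex>

struct Monarch
{
    uint64_t _getMostRecentPeasantID()
    {
        std::unique_lock lock{ _mutex }; // first acquisition
        const uint64_t candidate = 1;    // stand-in for the real MRU walk
        _getPeasant(candidate);          // may re-enter the lock below
        return candidate;
    }

    bool _getPeasant(uint64_t id)
    {
        // A dead peasant leaves a stale MRU entry behind, so clean it up...
        _clearOldMruEntries(id);
        return false;
    }

    void _clearOldMruEntries(uint64_t /*id*/)
    {
        // ...which is a second acquisition on the same thread, and why
        // _mutex had to be a recursive_mutex.
        std::unique_lock lock{ _mutex };
    }

    std::recursive_mutex _mutex;
};
```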
Oh right... That's unfortunate; recursive mutexes are really annoying (they use critical sections, brrrr).
I wonder if it would simplify the code if we just slapped a single mutex at the top of all public methods of this class and made all private methods non-thread-safe. It would get rid of the recursion and the atomics at least... The Monarch class has an extremely low "throughput" (API calls per second), right?
Hmm... 🤔
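A sketch of that shape, with hypothetical bodies: lock once at the public boundary, and let every private helper assume the lock is already held:

```cpp
#include <cstdint>
#include <mutex>

class Monarch
{
public:
    uint64_t GetMostRecentPeasantID()
    {
        std::unique_lock lock{ _mutex }; // the only locking site
        return _getMostRecentPeasantID();
    }

private:
    // Private helpers require the caller to hold _mutex, so the
    // _getMostRecentPeasantID -> _clearOldMruEntries chain never relocks
    // and the mutex doesn't need to be recursive.
    uint64_t _getMostRecentPeasantID()
    {
        _clearOldMruEntries();
        return 0; // stand-in
    }

    void _clearOldMruEntries() { /* no lock taken here */ }

    std::mutex _mutex;
};
```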
In normal circumstances, yes. The problem is with loading / quitting where there are potentially "many" (O(#windows)) calls, and of course that is where I found these problems in the first place.
Yeah, I'm not terribly worried about the perf here; there aren't really going to be that many calls coming through here per second, and they're pretty much all user-input driven. Plus, O(N) where N = number of windows? That's gonna be pretty small unless you're actively trying to spawn hundreds of windows per second and close them quickly.
I can take a look and see if there is an easy refactor to swap the recursive mutex for a `shared_mutex`, but I think it is still better to have the more granular locking in general.
Without rewriting `_getMostRecentPeasantID` and `_getPeasant`, the easiest safe thing I can think of is to pass a `std::unique_lock<...>*` to `_getPeasant` and `_clearOldMruEntries` as an argument, and if that is null then lock in `_clearOldMruEntries`. I'm not a huge fan of that approach because it is error-prone, but it is functional.
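A sketch of that lock-pointer shape (illustrative only; as noted, it's error-prone because every caller has to pass the right thing):

```cpp
#include <cstdint>
#include <shared_mutex>

class Monarch
{
public:
    uint64_t _getMostRecentPeasantID()
    {
        std::unique_lock lock{ _mruPeasantsMutex };
        _clearOldMruEntries(&lock); // tell the helper the lock is held
        return 0; // stand-in
    }

    void _clearOldMruEntries(std::unique_lock<std::shared_mutex>* held = nullptr)
    {
        std::unique_lock<std::shared_mutex> local;
        if (!held)
        {
            // No lock was handed in, so take it ourselves.
            local = std::unique_lock{ _mruPeasantsMutex };
        }
        // ... erase stale MRU entries; the lock is held either way ...
    }

private:
    std::shared_mutex _mruPeasantsMutex;
};
```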
I made `_getPeasant` take an optional `clearMruPeasantOnFailure` boolean, which is set to false in `_getMostRecentPeasantID`. I then updated `_getMostRecentPeasantID` to handle its own cleanup, and switched the recursive mutex to a shared mutex.
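Roughly, the shape of that change might be as follows (the signatures are guesses; only `clearMruPeasantOnFailure` and the method names come from this thread):

```cpp
#include <cstdint>
#include <shared_mutex>

class Monarch
{
public:
    bool _getPeasant(uint64_t id, bool clearMruPeasantOnFailure = true)
    {
        const bool alive = false; // stand-in for the real lookup
        if (!alive && clearMruPeasantOnFailure)
        {
            _clearOldMruEntries(id); // safe: the caller holds no MRU lock
        }
        return alive;
    }

    uint64_t _getMostRecentPeasantID()
    {
        std::unique_lock lock{ _mruPeasantsMutex };
        const uint64_t candidate = 1; // stand-in for the real MRU walk
        if (!_getPeasant(candidate, /* clearMruPeasantOnFailure */ false))
        {
            // Handle the stale entry inline instead of re-entering
            // _clearOldMruEntries while _mruPeasantsMutex is held.
        }
        return candidate;
    }

private:
    void _clearOldMruEntries(uint64_t /*id*/)
    {
        std::unique_lock lock{ _mruPeasantsMutex };
        // ... erase stale entries ...
    }

    std::shared_mutex _mruPeasantsMutex;
};
```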
Spent more time trying to make sure we can't deadlock. I am ~90% confident, but have not proved, that the current version of the code cannot deadlock. Currently `_getMostRecentPeasantID` (and, transitively, anything that calls it) is the only method that attempts to hold both locks at the same time. There should not be a case where someone else holds a unique lock on `_peasantsMutex` and is waiting to get a lock on `_mruPeasantsMutex`, which would deadlock with `_getMostRecentPeasantID`.
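The ordering argument, as a sketch with assumed bodies: a two-lock deadlock needs a cycle, so if `_getMostRecentPeasantID` is the only place both mutexes are held, and everything else takes one lock at a time, no cycle can form:

```cpp
#include <shared_mutex>

class Monarch
{
public:
    void _getMostRecentPeasantID()
    {
        // The one place both locks are held, in one fixed order:
        // MRU first, then peasants.
        std::unique_lock mru{ _mruPeasantsMutex };
        std::shared_lock peasants{ _peasantsMutex };
        // ... find the most recent live peasant ...
    }

    void AddPeasant()
    {
        // Everything else takes at most one lock at a time, so it can
        // never be the second edge of a deadlock cycle.
        std::unique_lock peasants{ _peasantsMutex };
        // ... insert ...
    }

private:
    std::shared_mutex _peasantsMutex;
    std::shared_mutex _mruPeasantsMutex;
};
```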
If you hold this lock for the entire duration of the function, you don't need to lock it again below.
I deliberately wanted to separate the read-only vs. the read-write portions of the lock to reduce contention if there was nothing to be removed.
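That split, sketched as a free function over hypothetical state: scan under the cheap shared lock, and only take the exclusive lock when something actually needs removing:

```cpp
#include <cstdint>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

void clearOldEntries(std::shared_mutex& mtx,
                     std::unordered_map<uint64_t, bool>& mru)
{
    std::vector<uint64_t> stale;
    {
        std::shared_lock read{ mtx }; // readers don't block each other
        for (const auto& [id, alive] : mru)
        {
            if (!alive)
            {
                stale.push_back(id);
            }
        }
    }
    if (stale.empty())
    {
        return; // common case: the exclusive lock is never touched
    }
    std::unique_lock write{ mtx };
    for (const auto id : stale)
    {
        mru.erase(id); // erasing by key is harmless if another thread
                       // already removed the entry between the two locks
    }
}
```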
The lock is largely uncontended, and as such there shouldn't be any noticeable advantage to using read-only locking. Personally, I'd just acquire an exclusive lock in all cases. But I don't mind the current code either.