Avoid cross-thread mutex conflicts #6396
Merged
If a continuation does not have a mutex when it is scheduled, the original logic assigns it the mutex of the current net handler thread. For a long-lived, global continuation, holding a thread's net handler mutex causes lock contention that blocks other threads. It also makes it likely that the net handler lock cannot be acquired immediately, as in the scenario described in PR #5950.
This PR changes the logic to allow the continuation mutex to be null during scheduling, on the assumption that the plugin writer handles locking directly in the plugin. It issues a warning so the operations team can review the logs for continuations that were unexpectedly scheduled without a lock. It also changes the lock acquisition in process_event to a weak lock, which allows for a null mutex.
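The behavior described above can be illustrated with a minimal, standalone sketch. It is not the Traffic Server code: `Continuation`, `Dispatch`, `schedule`, and this `process_event` are hypothetical stand-ins built on `std::mutex`, and the sketch assumes that "weak lock" means a try-lock that treats a null mutex as nothing to acquire.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a continuation; not the Traffic Server class.
struct Continuation {
  std::shared_ptr<std::mutex> mutex; // may be null: the plugin does its own locking
  std::function<void()> handler;
};

enum class Dispatch { Ran, Retry };

// Scheduling does not assign a thread-owned mutex to a lockless continuation;
// it only logs a warning. Returns true when a warning was emitted.
bool schedule(const Continuation &c) {
  if (!c.mutex) {
    std::cerr << "warning: scheduling a continuation without a mutex\n";
    return true;
  }
  return false;
}

// Dispatch with a try-lock that tolerates a null mutex.
Dispatch process_event(Continuation &c) {
  if (!c.mutex) {
    c.handler(); // no lock to take; the handler is trusted to synchronize itself
    return Dispatch::Ran;
  }
  std::unique_lock<std::mutex> lock(*c.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return Dispatch::Retry; // contended: the caller should reschedule, not block
  }
  c.handler(); // lock is held for the duration of the handler
  return Dispatch::Ran;
}
```

In this sketch a continuation with a null mutex always runs immediately, while one with a mutex either runs under the lock or reports `Dispatch::Retry` when another thread holds it, so the dispatching thread never blocks on a contended lock.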
We have been running with this change since just after the Christmas break, and it has eliminated the crashes caused by the assert I added to detect high-contention continuation locks (presumably due to the use of net handler mutexes on the continuation).