3. fix(state): prevent watch channel deadlocks in the state #3870
Looks good
076ceb5 to c536e0d
@Mergifyio refresh
✅ Pull request refreshed
Codecov Report
@@            Coverage Diff             @@
##             main    #3870      +/-   ##
==========================================
+ Coverage   78.68%   78.78%   +0.09%
==========================================
  Files         296      297       +1
  Lines       33833    33879      +46
==========================================
+ Hits        26622    26690      +68
+ Misses       7211     7189      -22
Looks ok to me but I kinda would like @jvff to look at it.
c536e0d to 50fcb7d
…zed state" This reverts commit 8870944.
This avoids deadlocks, by holding the read lock for as short a time as possible.
This reduces memory usage.
50fcb7d to 3240332
No comments, since this is High 🔥 let's merge and hopefully @jvff can give a look if they are available
Motivation
This PR makes watch channel deadlocks in the state impossible. (This also avoids time-consuming investigations into deadlock causes.)
It also avoids some livelocks, and some performance issues. (Clones are cheap, lock contention is expensive.)
Specifications
If a watch receiver holds a read lock, the sender then tries to get the write lock, and another receiver then tries to get another read lock, the watch channel can deadlock.
See the "potential deadlock example" in:
https://docs.rs/tokio/latest/tokio/sync/watch/struct.Receiver.html#method.borrow
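The first step of that scenario, a held read lock blocking the writer, can be reproduced without tokio. The following is a minimal sketch that uses `std::sync::RwLock` as a stand-in for the lock inside the watch channel; the function name is hypothetical, and the non-blocking `try_write` is used so the example reports the contention instead of hanging.

```rust
use std::sync::RwLock;

/// Returns `(writer_blocked_while_reading, writer_free_after_drop)`.
///
/// Assumption: the watched value is modelled with std's `RwLock`,
/// not with a real `tokio::sync::watch` channel.
pub fn writer_blocked_by_read_guard() -> (bool, bool) {
    let shared = RwLock::new(0_u32);

    // A receiver-style borrow: the read lock is held while the guard lives.
    let read_guard = shared.read().expect("lock is not poisoned");

    // A sender-style update cannot get the write lock while the guard is alive.
    let blocked = shared.try_write().is_err();

    // Dropping the guard releases the read lock...
    drop(read_guard);

    // ...so the writer can make progress again.
    let free = shared.try_write().is_ok();

    (blocked, free)
}
```

With a blocking write instead of `try_write`, the writer would wait for as long as the read guard is held, which is why long-lived borrows are risky.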
Designs
The WatchReceiver wrapper clones the watched data before allowing access to it, then explicitly drops the read lock as soon as possible. This prevents the deadlock described above, because the read lock is dropped immediately after the clone.
For more details, see the conversation at #3847 (comment)
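As an illustration of the clone-then-drop design, here is a minimal sketch. It is not the PR's code: `CloningReceiver`, `cloned_data`, and `with_data` are hypothetical names, and `Arc<RwLock<T>>` stands in for the tokio watch receiver (where the read lock would be taken by `Receiver::borrow`).

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical wrapper that never hands out a lock guard to callers.
/// Assumption: `Arc<RwLock<T>>` models the watch channel's shared value.
#[derive(Clone)]
pub struct CloningReceiver<T> {
    shared: Arc<RwLock<T>>,
}

impl<T: Clone> CloningReceiver<T> {
    pub fn new(shared: Arc<RwLock<T>>) -> Self {
        Self { shared }
    }

    /// Clones the watched data, then releases the read lock before returning.
    pub fn cloned_data(&self) -> T {
        let guard = self.shared.read().expect("lock is not poisoned");
        let data = (*guard).clone();
        // Explicitly drop the read lock as soon as the clone is done.
        drop(guard);
        data
    }

    /// Runs `f` on a private clone, so no lock is held while `f` runs.
    pub fn with_data<U>(&self, f: impl FnOnce(T) -> U) -> U {
        f(self.cloned_data())
    }
}
```

Because callers only ever see an owned clone, a slow or re-entrant closure cannot extend the lifetime of the read lock. The cost is one clone per access, which is the trade-off the description accepts ("Clones are cheap, lock contention is expensive").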
Solution
Deadlock prevention:
Memory optimisations:
Ergonomics:
Testing
When testing syncing lightwalletd from a zebrad that is also syncing from peers, I saw pauses of up to 3 minutes. These pauses finished when lightwalletd gave up and exited. The pauses seem to be fixed by commit 3240332.
Review
This is part of a high-priority series of state refactor PRs, because they will cause merge conflicts with other PRs.
Reviewer Checklist
Follow Up Work