stop writing to closed channel panic #6763
Conversation
turbo/engineapi/request_list.go
Outdated
@@ -154,7 +154,11 @@ func (rl *RequestList) updateWaiting(val uint32) {
	if val == 1 {
		// something might be waiting to be notified of the waiting state being ready
		for _, c := range rl.waiters {
			c <- struct{}{}
			// ensure the channel is still open before writing to it
Hmm, I did not actually find the place where these channels are closed; I was looking to see if closing happens under the same lock...
Channels are thread-safe, but a mutex will not protect against sending to a closed channel. You need to wait until all producers/consumers have shut down before closing (or have the producer close the channel).
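A tiny standalone demo of this point (not the PR's code): the send below panics with "send on closed channel" even though the sender holds a mutex.

```go
// Standalone demo (not the PR's code): sending on a closed channel panics
// even while a mutex is held, so the lock offers no protection here.
package main

import (
	"fmt"
	"sync"
)

func main() {
	var mu sync.Mutex
	ch := make(chan struct{}, 1)
	close(ch)

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r) // prints: recovered: send on closed channel
		}
	}()

	mu.Lock()
	ch <- struct{}{} // panics despite holding mu
	mu.Unlock()      // never reached
}
```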
I introduced the waiters here, so I can confirm they are only closed in the `stageLoopIsBusy` func in `ethbackend.go`. This scenario could occur when the wait in `stageLoopIsBusy` times out after a second, returning to the caller, and the channel is closed before the loop over the waiters in `request_list.go` is triggered. I didn't anticipate this originally, and tests / hive were happy with the change... until I ran up a node and spotted the panics.

This seems the cleanest solution, although I did consider returning a new channel from the call to `WaitForWaiting` so that channel creation and closing is the sole responsibility of the code in `request_list.go`. The `stageLoopIsBusy` func could then just wait on that channel being closed as a signal, so no code would actually be writing to the channel in the first place. Any thoughts?
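A minimal sketch of that alternative, with hypothetical method bodies (the names `RequestList`, `WaitForWaiting`, and the `stageLoopIsBusy`-style caller mirror the discussion, but this is illustrative, not the actual erigon implementation): the list owns its channels and signals by closing them, so nothing ever sends on a channel.

```go
// Hypothetical sketch (not the actual erigon code) of the close-as-signal idea:
// request_list owns its channels and signals readiness by closing them, so no
// goroutine ever sends on a channel and the "send on closed channel" panic
// cannot occur.
package requestlist

import (
	"sync"
	"time"
)

type RequestList struct {
	mu      sync.Mutex
	waiters []chan struct{}
}

// WaitForWaiting registers and returns a channel that will be closed when the
// waiting state becomes ready. Callers only ever receive from it.
func (rl *RequestList) WaitForWaiting() <-chan struct{} {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	ch := make(chan struct{})
	rl.waiters = append(rl.waiters, ch)
	return ch
}

// notifyWaiters closes every registered waiter channel; a close is observed by
// any number of receivers and can never panic a sender.
func (rl *RequestList) notifyWaiters() {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	for _, ch := range rl.waiters {
		close(ch)
	}
	rl.waiters = nil
}

// stageLoopIsBusy-style caller: wait for the close or give up after a second.
func isBusy(rl *RequestList) bool {
	select {
	case <-rl.WaitForWaiting():
		return false
	case <-time.After(time.Second):
		return true
	}
}
```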
"The stageLoopIsBusy func could just wait on that channel being closed as a signal, so no code would actually be writing to the channel in the first place" - now I understand general idea.
My thoughts:

1. "Create and close the channel in the producer" is the general pattern for avoiding the "write to closed channel" problem (currently you close the channel in the consumer).
2. `WaitForWaiting` doesn't need to acquire the write mutex at the beginning of the func, because the mutex protects only the `rl.waiters` field; all the other logic doesn't need it. You can create getter/setter methods for the `rl.waiters` field so that only those methods use the mutex. "Lock mutex and then write to channel" is a bad pattern, because a channel write takes a non-deterministic amount of time, and "hold the mutex for as short as possible" is a generally good idea.
3. For notifications it's better to use buffered channels like `make(chan struct{}, 1)` instead of `make(chan struct{})`; then the notifier can notify all subscribers without waiting for them. Also, when using buffered channels for notifications with "at least once" guarantees, you can use a non-blocking `select` with `default`, because if the channel is already full there is no difference between sending the notification or dropping it - the subscriber will still be notified at least once (see the sketch after this list).
4. Maybe `WaitForWaiting` can be implemented as a loop with `sync.Cond.Wait`? But then it would be hard to time out such a wait, and actually this is your main problem: `sync.Cond.Wait` is not compatible with channels (with `select`) in Go. So maybe think about not using `sync.Cond`.

Other:

5. "The stageLoopIsBusy func could just wait ..." - the problem sounds similar to "send a request to a server, wait for the response, but limit the server-side work by a timeout". In Go we solve that with `context.WithTimeout()`. I don't have clear advice here, it just sounds similar.
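A small sketch of the buffered-channel notification pattern from point 3, with hypothetical names: the notifier uses a non-blocking `select` with `default`, so it never waits on subscribers, while a subscriber still gets at least one signal.

```go
// Sketch of point 3 with hypothetical names: a buffered channel plus a
// non-blocking select means the notifier never blocks on subscribers, and a
// subscriber still receives at least one signal.
package main

import (
	"fmt"
	"time"
)

type notifier struct {
	ch chan struct{}
}

func newNotifier() *notifier {
	// Buffer of 1: one notification can be parked even with no receiver waiting.
	return &notifier{ch: make(chan struct{}, 1)}
}

// notify never blocks: if a signal is already pending, dropping the new one
// makes no difference for an "at least once" guarantee.
func (n *notifier) notify() {
	select {
	case n.ch <- struct{}{}:
	default:
	}
}

func main() {
	n := newNotifier()
	n.notify()
	n.notify() // dropped: the buffer already holds a signal

	select {
	case <-n.ch:
		fmt.Println("got notification")
	case <-time.After(time.Second):
		fmt.Println("timed out")
	}
}
```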
1. Thanks for the tip. I always had in mind that the code that creates a channel should be responsible for closing it, but the distinction between producer and consumer makes it clearer. In this case, should we just close the channel from the `request_list` side (the producer), since we're only waiting on the signal and it will always be a single consumer?
2. Great idea, I'll get that implemented!
3. Great advice on using a buffered channel; I'd never considered the select-with-default pattern there to avoid waiting.
4. I tried to use `sync.Cond` when first attempting this and ended up with deadlocks, as the call to `Broadcast` could happen before the call to `Wait`, which then stops any processing from happening. The timeout was also really difficult to juggle, as you mentioned. I moved to a channel-based implementation after finding this.
5. I think normally this would make a lot of sense: just wait on the context. But here we're only checking whether some other process is busy; we haven't made a request to it, we're just checking its status, and we want to wait a little while before confirming whether it is ready or not.
- For example, `WaitForWaiting` can create and return a channel, or a channel plus a bool (as an optimisation for when there's no need to wait at all).
- Or accept a timeout as a parameter and just block inside without returning a channel. In this case the timeout can be accepted as a duration or as a context object (which can then also be cancelled by Ctrl+C).
Also I advise use …
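A hypothetical sketch combining the two variants above (names and fields are illustrative, not the real `request_list.go` API): `WaitForWaiting` accepts a context, blocks inside, and returns a bool reporting whether the ready signal arrived before the context was cancelled.

```go
// Hypothetical sketch (not the real request_list.go API): WaitForWaiting
// accepts a context and blocks inside, so the caller controls the timeout via
// context.WithTimeout and can also cancel on Ctrl+C.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type RequestList struct {
	mu      sync.Mutex
	waiters []chan struct{}
}

// WaitForWaiting blocks until a waiter is notified or ctx is done.
func (rl *RequestList) WaitForWaiting(ctx context.Context) bool {
	ch := make(chan struct{}, 1)

	rl.mu.Lock()
	rl.waiters = append(rl.waiters, ch)
	rl.mu.Unlock()

	select {
	case <-ch:
		return true
	case <-ctx.Done():
		return false
	}
}

// notify signals every registered waiter without blocking.
func (rl *RequestList) notify() {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	for _, ch := range rl.waiters {
		select {
		case ch <- struct{}{}:
		default:
		}
	}
	rl.waiters = nil
}

func main() {
	rl := &RequestList{}

	go func() {
		time.Sleep(50 * time.Millisecond)
		rl.notify()
	}()

	// The caller bounds the wait with context.WithTimeout, as suggested above.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println("ready:", rl.WaitForWaiting(ctx))
}
```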
Thanks for the pointers @AlexeyAkhunov & @AskAlexSharov. I'm modifying the code to use …
Force-pushed d269ed4 to 75322e1
Force-pushed edf824e to 8f5fd0d
Force-pushed 8f5fd0d to 3c2dad4
I've pushed another implementation for this that works well locally, and the hive tests are happily passing as well. All waiting and channel creation/closing is now within `request_list.go`. There are a number of locks that might seem odd at first, but I need to take a lock on the …
turbo/engineapi/request_list.go
Outdated
isWaiting := rl.waiting.Load()
if isWaiting == 1 {
	// we are already waiting so just return
	rl.waiterMtx.Unlock()
I advise unlocking the mutex with `defer`, otherwise a `panic()` may cause a deadlock.
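A minimal standalone illustration of that advice (hypothetical code, not the PR's): with `defer`, the mutex is released even if the critical section panics, so a recovered panic cannot leave it locked.

```go
// Minimal illustration: deferring Unlock means the mutex is released even if
// the code after Lock panics, so a recovered panic cannot cause a deadlock.
package main

import (
	"fmt"
	"sync"
)

func mightPanic(mu *sync.Mutex, fail bool) {
	mu.Lock()
	defer mu.Unlock() // runs even if we panic below

	if fail {
		panic("something went wrong while holding the lock")
	}
}

func main() {
	var mu sync.Mutex

	func() {
		defer func() { recover() }() // recover so the program can continue
		mightPanic(&mu, true)
	}()

	// Because Unlock was deferred, this Lock does not deadlock.
	mu.Lock()
	fmt.Println("lock still usable after the panic")
	mu.Unlock()
}
```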
Also: why is a mutex lock needed here? `rl.waiting` is already atomic.
This has been the problem I'm facing: if I check the state of waiting, it can be changed immediately afterwards, whilst the waiter is being created, and the waiter will then never be notified. So we end up waiting the full timeout when we could have waited just 5ms or so. The lock here ensures that, whilst the state of `rl.waiting` is being checked, it can't be updated until we've returned from this function or the waiter is in place to monitor for the update.
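A sketch of that check-and-register-under-one-lock pattern, with hypothetical names and fields (not the actual request_list.go code): the same mutex guards both the read of the waiting flag and the registration of the waiter, so the notifier cannot slip in between the two steps.

```go
// Hypothetical sketch: the same mutex guards the flag check and the waiter
// registration, so a notification cannot be missed between the two steps.
package requestlist

import (
	"sync"
	"sync/atomic"
)

type RequestList struct {
	waiterMtx sync.Mutex
	waiting   atomic.Uint32
	waiters   []chan struct{}
}

// waitChannel returns nil if we are already in the waiting state, otherwise a
// channel that will be signalled when the state changes.
func (rl *RequestList) waitChannel() chan struct{} {
	rl.waiterMtx.Lock()
	defer rl.waiterMtx.Unlock()

	if rl.waiting.Load() == 1 {
		return nil // already waiting, no need to block
	}
	ch := make(chan struct{}, 1)
	rl.waiters = append(rl.waiters, ch)
	return ch
}

// updateWaiting takes the same lock before flipping the flag and signalling
// the registered waiters with non-blocking sends.
func (rl *RequestList) updateWaiting(val uint32) {
	rl.waiterMtx.Lock()
	defer rl.waiterMtx.Unlock()

	rl.waiting.Store(val)
	if val == 1 {
		for _, ch := range rl.waiters {
			select {
			case ch <- struct{}{}:
			default:
			}
		}
		rl.waiters = nil
	}
}
```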
turbo/engineapi/request_list.go
Outdated
}
}()
rl.waiterMtx.Unlock()
wg.Wait()
Seems like you can replace this WaitGroup and goroutine with a `select` on 2 channels right here.
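For illustration, a minimal sketch of what that might look like (hypothetical names): selecting on the notification channel and a timeout channel removes the need for the `sync.WaitGroup` and the extra goroutine.

```go
// Minimal sketch: select directly on the notification channel and a timeout
// channel instead of spawning a goroutine and waiting on a sync.WaitGroup.
package main

import (
	"fmt"
	"time"
)

func waitForSignal(signal <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-signal:
		return true // notified
	case <-time.After(timeout):
		return false // gave up
	}
}

func main() {
	signal := make(chan struct{}, 1)

	go func() {
		time.Sleep(100 * time.Millisecond)
		signal <- struct{}{}
	}()

	fmt.Println("notified:", waitForSignal(signal, time.Second))
}
```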
turbo/engineapi/request_list.go
Outdated
}
rl.waiters = make([]chan struct{}, 0)
this is not thread-safe
The function is only ever called within a locked context, but it's not all that clear that this is the case. You mentioned the other day that only public functions should manage mutex locks, to avoid double-locking scenarios. Is this still desired? If not, I can move the locks closer to this code.
then it's ok
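A short sketch of that convention, with hypothetical names (not the actual PR code): the exported method is the only place the mutex is taken, and the unexported helper documents that it must be called with the lock already held.

```go
// Hypothetical sketch of the "only public functions manage the mutex" rule:
// the exported method owns the lock, the unexported helper assumes it is held.
package requestlist

import "sync"

type RequestList struct {
	waiterMtx sync.Mutex
	waiters   []chan struct{}
}

// StopWaiting is the public entry point; it is the only place the mutex is taken.
func (rl *RequestList) StopWaiting() {
	rl.waiterMtx.Lock()
	defer rl.waiterMtx.Unlock()
	rl.resetWaiters()
}

// resetWaiters must only be called while rl.waiterMtx is held.
func (rl *RequestList) resetWaiters() {
	for _, ch := range rl.waiters {
		close(ch)
	}
	rl.waiters = make([]chan struct{}, 0)
}
```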
Force-pushed 3c2dad4 to 6726732