fix: eliminate deadlocks in the barrier node when delete is used #2181
Modified two different parts of the barrier node to prevent deadlocks.
There were two possible ways to deadlock, one of which was found to be
hit by people using these nodes.
The first was when a point was received: `Point` would be called and a
message would be sent to reset the timer. If the message queue was
backed up, then it was possible for `emitBarrier` to be called and for
it to try and forward a message to itself. But the goroutine that was
reading the points from the edge was attempting to reset the timer, so
the delete group message was never able to be consumed and the timer was
never able to be reset.
This first one was fixed by adding a buffer of size one to that channel
and making the reset non-blocking when the channel was full. This
guarantees that the message will be sent and prevents it from blocking
if the reset message is sent twice in quick succession.
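A minimal sketch of the buffered, non-blocking reset described above. The `resetter` type, `newResetter`, and `requestReset` are hypothetical names used for illustration only; they are not taken from the barrier node's code.

```go
package main

import "fmt"

// resetter is a hypothetical stand-in for the part of the node that asks
// the idle-timer goroutine to reset its timer.
type resetter struct {
	reset chan struct{} // buffer of one: at most one reset is ever pending
}

func newResetter() *resetter {
	return &resetter{reset: make(chan struct{}, 1)}
}

// requestReset never blocks. It reports whether a new signal was queued.
// A false result means a reset is already pending, so dropping this one
// loses nothing: the timer will still be reset once.
func (r *resetter) requestReset() bool {
	select {
	case r.reset <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	r := newResetter()
	fmt.Println(r.requestReset()) // true: the slot was free
	fmt.Println(r.requestReset()) // false: a reset is already pending
	<-r.reset                     // the timer goroutine consumes the signal
	fmt.Println(r.requestReset()) // true: the slot is free again
}
```

With an unbuffered channel the first send would block until the timer goroutine was ready to receive, which is the condition that allowed the sender and receiver to wait on each other.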
The second was when the delete group message was received. This would
call `Stop`. If `emitBarrier` was called in the meantime, then it could
deadlock in the same way as the above, because the delete group message
needs emit to be called and emit is blocked waiting for the idle handler
to exit.
The stop method was also not thread safe. Since it could potentially be
called from a different thread, and it was easy enough to do, it has
been made thread safe.
This is fixed by changing the stop channel to have a buffer of 1 and
making stopping send a message to the goroutine, which avoids a
potential double close of the channel. The stop message is sent
non-blocking to prevent a deadlock, since only one message should be
received anyway. The call to `Stop()` in the `idleHandler` is changed to
a non-blocking version that does not wait for the goroutine to exit,
while the other stop will wait for the goroutine to exit.
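The stop handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual barrier node code: `idleWorker`, `startIdleWorker`, `stopNoWait`, and the two demo helpers are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// idleWorker is a hypothetical stand-in for the idle-timer goroutine.
type idleWorker struct {
	stopC chan struct{} // buffer of one so a stop request never blocks
	done  chan struct{} // closed only by the goroutine itself, exactly once
}

func startIdleWorker(idle time.Duration, onIdle func(w *idleWorker)) *idleWorker {
	w := &idleWorker{stopC: make(chan struct{}, 1), done: make(chan struct{})}
	go func() {
		defer close(w.done)
		timer := time.NewTimer(idle)
		defer timer.Stop()
		for {
			select {
			case <-w.stopC:
				return
			case <-timer.C:
				onIdle(w) // may request a stop from this same goroutine
				timer.Reset(idle)
			}
		}
	}()
	return w
}

// stopNoWait requests a stop without blocking and without waiting for exit,
// so it is safe to call from the worker goroutine itself.
func (w *idleWorker) stopNoWait() {
	select {
	case w.stopC <- struct{}{}:
	default: // a stop is already pending; one message is enough
	}
}

// Stop requests a stop and waits for the goroutine to exit. It sends a
// message instead of closing stopC, so repeated or concurrent calls are safe.
func (w *idleWorker) Stop() {
	w.stopNoWait()
	<-w.done
}

// selfStopExits reports whether a worker that stops itself from its idle
// callback exits without deadlocking.
func selfStopExits() bool {
	w := startIdleWorker(time.Millisecond, func(w *idleWorker) { w.stopNoWait() })
	select {
	case <-w.done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

// externalStopExits reports whether Stop can be called twice from outside.
func externalStopExits() bool {
	w := startIdleWorker(time.Hour, func(w *idleWorker) {})
	w.Stop()
	w.Stop() // second call returns immediately: done is already closed
	return true
}

func main() {
	fmt.Println(selfStopExits(), externalStopExits())
}
```

Sending on a buffered channel rather than closing it is what makes a second stop request harmless: a second `close` would panic, while a second non-blocking send is either queued or dropped.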
Fixes #2144.