-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
p2p: Fix error logging for connection stop #3644
p2p: Fix error logging for connection stop #3644
Conversation
@@ -249,9 +261,6 @@ func (c *MConnection) FlushStop() { | |||
// so we dont race on calling sendSomePacketMsgs | |||
<-c.doneSendRoutine | |||
|
|||
// Send and flush all pending msgs. | |||
// By now, IsRunning == false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted this comment because it is false. In my testing IsRunning
was still true - that is why I needed to add the additional signaling. Maybe a better fix is to identify why IsRunning
is still true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked closer and the reason is that FlushStop is outside of the Stop
base service routine - this the base does not yet know that it is being stopped.
Thoughts on this change? While cosmetic - it does help cleanup log output which is useful for node and validator operators that scrape logs for errors and anomalies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Needs a bit more careful thought on our side. Couple questions here first.
Also related to #3679
p2p/conn/connection.go
Outdated
return true | ||
default: | ||
} | ||
|
||
// inform the recvRouting that we are shutting down | ||
select { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be sent on here or can we just close it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be sent here to inform the receive routine that it needs to stop. The is no mechanism in go (that I have found) to receive socket events therefore you need to roll your own notifications to the go routine for receive that was setup previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can the channel just be closed here? Or even later? It's not immediately clear why we'd need to send on the channel, and then close it shortly there after (note close(ch)
has the effect of unblocking anything receiving on the channel).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - I did not know that about channels. I have removed this select that manually sent a bool in favor of closing the channel (already done lower in the same method). I have changed the channel to a struct{}
type since nothing is sent on it anymore.
} | ||
|
||
// stopServices was invoked and we are shutting down | ||
// receiving is excpected to fail since we will close the connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this happens when err != io.EOF ? What is the error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to handle graceful shutdown via stopServices()
. Go does not have a nice way to see that we have closed the socket - only when we go to read the socket and it fails do we need to see if we expected this failure or not. This logic queries the channel for the quit signal to notify us that we are expecting to exit and do not care about the error.
This changeset fixes two types of false-positive errors occurring during connection shutdown. The first occurs when the process invokes FlushStop() or Stop() on a connection. While the previous behavior did properly wait for the sendRoutine to finish, it did not notify the recvRoutine that the connection was shutting down. This would cause the recvRouting to receive and error when reading and log this error. The changeset fixes this by notifying the recvRoutine that the connection is shutting down. The second occurs when the connection is terminated (gracefully) by the other side. The recvRoutine would get an EOF error during the read, log it, and stop the connection with an error. The changeset detects EOF and gracefully shuts down the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good and isolated enough, that the root cause is with the layering of service abstractions is just another strong signal to fix peer lifecycle managment - another day.
👍 💃
@@ -90,6 +90,9 @@ type MConnection struct { | |||
quitSendRoutine chan struct{} | |||
doneSendRoutine chan struct{} | |||
|
|||
// Closing quitRecvRouting will cause the recvRouting to evnetually quit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evnetually -> eventually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Closing quitRecvRouting will cause the recvRouting to evnetually quit. | |
// Closing quitRecvRouting will cause the recvRouting to eventually quit. |
@@ -249,9 +255,6 @@ func (c *MConnection) FlushStop() { | |||
// so we dont race on calling sendSomePacketMsgs | |||
<-c.doneSendRoutine | |||
|
|||
// Send and flush all pending msgs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove // Send and flush all pending msgs.
? looks like we're still flushing the messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea - I will put this back. The IsRunning
comment is the part that was wrong. I accidentally removed this when I should not have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks
@defunctzombie will you have time to fix #3644 (comment) and #3644 (comment)? @ebuchman ok to merge? |
Replaced with #3824 |
This changeset fixes two types of false-positive errors occurring during
connection shutdown.
The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRouting to receive and error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.
The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.
I am not intimately familiar with idiomatic go practices so if there is a better approach for performing the signaling I did please let me know and I will update the PR. I used a channel buffer size of 1 to ensure the signal is received and non-blocking messaging to avoid breaking existing logic.