Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p: Fix error logging for connection stop #3644

Closed
wants to merge 4 commits into from
Closed

p2p: Fix error logging for connection stop #3644

wants to merge 4 commits into from

Conversation

defunctzombie
Copy link
Contributor

This changeset fixes two types of false-positive errors occurring during
connection shutdown.

The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRouting to receive and error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.

The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.


I am not intimately familiar with idiomatic go practices so if there is a better approach for performing the signaling I did please let me know and I will update the PR. I used a channel buffer size of 1 to ensure the signal is received and non-blocking messaging to avoid breaking existing logic.

  • Updated all relevant documentation in docs (There is no documentation for this)
  • Updated all code comments where relevant
  • Wrote tests (The tests would need to check log output - I could not find a good example of how to do this).
  • Updated CHANGELOG_PENDING.md

p2p/conn/connection.go Outdated Show resolved Hide resolved
@@ -249,9 +261,6 @@ func (c *MConnection) FlushStop() {
// so we dont race on calling sendSomePacketMsgs
<-c.doneSendRoutine

// Send and flush all pending msgs.
// By now, IsRunning == false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted this comment because it is false. In my testing IsRunning was still true - that is why I needed to add the additional signaling. Maybe a better fix is to identify why IsRunning is still true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked closer and the reason is that FlushStop is outside of the Stop base service routine - this the base does not yet know that it is being stopped.

@defunctzombie
Copy link
Contributor Author

Thoughts on this change? While cosmetic - it does help cleanup log output which is useful for node and validator operators that scrape logs for errors and anomalies.

Copy link
Contributor

@ebuchman ebuchman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. Needs a bit more careful thought on our side. Couple questions here first.

Also related to #3679

return true
default:
}

// inform the recvRouting that we are shutting down
select {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be sent on here or can we just close it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be sent here to inform the receive routine that it needs to stop. The is no mechanism in go (that I have found) to receive socket events therefore you need to roll your own notifications to the go routine for receive that was setup previously.

Copy link
Contributor

@ebuchman ebuchman May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can the channel just be closed here? Or even later? It's not immediately clear why we'd need to send on the channel, and then close it shortly there after (note close(ch) has the effect of unblocking anything receiving on the channel).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - I did not know that about channels. I have removed this select that manually sent a bool in favor of closing the channel (already done lower in the same method). I have changed the channel to a struct{} type since nothing is sent on it anymore.

}

// stopServices was invoked and we are shutting down
// receiving is excpected to fail since we will close the connection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this happens when err != io.EOF ? What is the error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to handle graceful shutdown via stopServices(). Go does not have a nice way to see that we have closed the socket - only when we go to read the socket and it fails do we need to see if we expected this failure or not. This logic queries the channel for the quit signal to notify us that we are expecting to exit and do not care about the error.

p2p/conn/connection.go Outdated Show resolved Hide resolved
This changeset fixes two types of false-positive errors occurring during
connection shutdown.

The first occurs when the process invokes FlushStop() or Stop() on a
connection. While the previous behavior did properly wait for the sendRoutine
to finish, it did not notify the recvRoutine that the connection was shutting
down. This would cause the recvRouting to receive and error when reading and
log this error. The changeset fixes this by notifying the recvRoutine that
the connection is shutting down.

The second occurs when the connection is terminated (gracefully) by the other side.
The recvRoutine would get an EOF error during the read, log it, and stop the connection
with an error. The changeset detects EOF and gracefully shuts down the connection.
Copy link
Contributor

@xla xla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good and isolated enough, that the root cause is with the layering of service abstractions is just another strong signal to fix peer lifecycle managment - another day.

👍 :shipit: :octocat: 💃

@@ -90,6 +90,9 @@ type MConnection struct {
quitSendRoutine chan struct{}
doneSendRoutine chan struct{}

// Closing quitRecvRouting will cause the recvRouting to evnetually quit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evnetually -> eventually

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Closing quitRecvRouting will cause the recvRouting to evnetually quit.
// Closing quitRecvRouting will cause the recvRouting to eventually quit.

@@ -249,9 +255,6 @@ func (c *MConnection) FlushStop() {
// so we dont race on calling sendSomePacketMsgs
<-c.doneSendRoutine

// Send and flush all pending msgs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove // Send and flush all pending msgs.? looks like we're still flushing the messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - I will put this back. The IsRunning comment is the part that was wrong. I accidentally removed this when I should not have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks

@xla xla changed the title p2p: fix false-positive error logging when stopping connections p2p: Fix error logging for connection stop Jun 6, 2019
@ebuchman ebuchman added the S:waiting Status: Waiting for response label Jun 21, 2019
@tac0turtle tac0turtle changed the base branch from develop to master June 26, 2019 15:40
@melekes
Copy link
Contributor

melekes commented Jul 4, 2019

@defunctzombie will you have time to fix #3644 (comment) and #3644 (comment)?

@ebuchman ok to merge?

@melekes melekes closed this Jul 22, 2019
@melekes
Copy link
Contributor

melekes commented Jul 22, 2019

Replaced with #3824

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S:waiting Status: Waiting for response
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants