-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Server-side goroutine leak on receive message error #142
Comments
#141 is an attempt to resolve by stopping goroutine A on any receive error since goroutine B stops regardless of the error message it sends to A. |
What is the error you saw? |
I am seeing this error message in my shim logs.
I do not believe ttrpc is the root cause, but as I was exploring the branch of code after message read fails I found the issue above. |
I think it first makes sense to get a change in to fix the unwrapping, that is an obvious bug to fix. The larger change of always returning needs a little more consideration, there could be errors we don't want to immediately return on (such as context cancellation). |
Okay yeah that would be the safer route. Also I'd like to spend some time and brainstorm how to write some unit tests for this edge case. Let me update the PR for just the unwrapping and go from there. |
Hey folks, while tracking down a production error I inadvertently found a rare error in the receive message logic for server connections.
To set some context the receive message logic relies on two goroutines working together.
goroutine A: https://github.com/containerd/ttrpc/blob/main/server.go#L136
goroutine B: https://github.com/containerd/ttrpc/blob/main/server.go#L369-L487
A [built-in Go] error channel is used for communication of errors between goroutines A and B.
https://github.com/containerd/ttrpc/blob/main/server.go#L339
The first thing to note is on message header read error the channel returns the underlying error to goroutine B. Based on the current implementation, these errors can only be from
io.ReadFull
.io.ReadFull
can returnio.EOF
orio.ErrUnexpectedEOF
. Reference: https://pkg.go.dev/io#ReadFullhttps://github.com/containerd/ttrpc/blob/main/channel.go#L73-L76
This is fine because goroutine A has error handling to check for these errors and stop execution if they occur.
https://github.com/containerd/ttrpc/blob/main/server.go#L545-L555
The issue begins when considering errors on message read by goroutine B. In this branch, goroutine B has successfully read a message header for a message with some length > 0; however, when reading the message an error has occurred by
io.ReadFull
eitherio.EOF
orio.ErrUnexpectedEOF
.What I have found is for these errors we wrap the error with more information to clarify the error (opposed to on message header read) occurred on message read.
https://github.com/containerd/ttrpc/blob/main/channel.go#L138
However, goroutine A's error handling logic is not currently setup to check for wrapped errors.
https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554
So the effect is goroutine B is stopped due to a [non-gRPC] receive error, but goroutine A is never stopped resulting in leaked goroutine and server connection.
Stop goroutine B on receive error: https://github.com/containerd/ttrpc/blob/main/server.go#L380-L385
Again (currently) only stop goroutine A on some receive errors: https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554
The text was updated successfully, but these errors were encountered: