Fix issue reconnecting to a cluster client or member. Fixes bug introduced in 1.7.1.alpha-0.4 build #2142
Description
The 1.7.1.alpha-0.4 build fixed some edge cases where thread saturation and lockup could occur, but it also introduced a bug: a connection was not properly disposed in EndpointManager and stayed blocked forever, which blocked any new connections to the same endpoint.
In our environment, a cluster client (run with client = true) would restart because of some unrelated issue, then reconnect and be able to send messages, but would never get anything back. (Note that this is not the client-behind-a-firewall scenario, where the server sends messages back on the same connection it is receiving on. In the proto.remote world this is called a server connection, even though the node is also considered a client with regard to the cluster.)
I then discovered that my changes to EndpointManager had a bug: a variable called endpoint (in OnEndpointTerminated) was being overwritten with null, so the endpoint was never disposed and never unblocked.
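To illustrate this class of bug, here is a minimal, hypothetical sketch in Python. It is not the actual EndpointManager code: the `Endpoint` class, the two lookup tables, and the `system_id` / `address` parameters are all assumptions made for the example. It only shows how two independent `if` branches can overwrite an already-found endpoint with a null value, and how chaining them (`elif`, Python's `else if`) avoids that.

```python
from typing import Dict, Optional


class Endpoint:
    """Hypothetical stand-in for an endpoint that owns a connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disposed = False

    def dispose(self) -> None:
        # In a real system this would close the connection and unblock the endpoint.
        self.disposed = True


def terminate_buggy(
    servers: Dict[str, Endpoint],
    clients: Dict[str, Endpoint],
    system_id: Optional[str],
    address: Optional[str],
) -> Optional[Endpoint]:
    """Two independent ifs: the second lookup can replace a found endpoint with None."""
    endpoint = None
    if system_id is not None:
        endpoint = servers.pop(system_id, None)
    if address is not None:  # assumed bug: runs even if the lookup above succeeded
        endpoint = clients.pop(address, None)  # may overwrite endpoint with None
    if endpoint is not None:
        endpoint.dispose()
    return endpoint


def terminate_fixed(
    servers: Dict[str, Endpoint],
    clients: Dict[str, Endpoint],
    system_id: Optional[str],
    address: Optional[str],
) -> Optional[Endpoint]:
    """Chained branches: the second lookup only runs when the first does not apply."""
    endpoint = None
    if system_id is not None:
        endpoint = servers.pop(system_id, None)
    elif address is not None:
        endpoint = clients.pop(address, None)
    if endpoint is not None:
        endpoint.dispose()
    return endpoint
```

With an endpoint registered only in `servers`, and a termination event that carries both a `system_id` and an `address`, `terminate_buggy` removes the endpoint from the table but never calls `dispose()`, whereas `terminate_fixed` disposes it.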
The fix was simply changing an if to an else if. However, this PR also adds some other logging around endpoint termination, a logging check when a connection is received, and a new line in ServerConnector which cancels the token for the connection (after it should already be closed) to ensure no read/write loops are left running and orphaned.
Purpose
This pull request is a:
Checklist