Fix issue reconnecting to a cluster client or member. Fixes bug introduced in 1.7.1.alpha-0.4 build #2142

benbenwilde · 2024-11-14T09:03:28Z

Description

The 1.7.1.alpha-0.4 build fixed some edge cases in which thread saturation and lockups could occur, but it also introduced a bug: a connection was not properly disposed in EndpointManager, so it stayed blocked forever and prevented any new connections to the same endpoint.

In our environment, a cluster client (run with client = true) restarted because of an unrelated issue. It then reconnected and could send messages, but never received anything back. (Note that this is not the client-behind-a-firewall scenario, where the server sends messages back over the same connection it receives on. So in Proto.Remote terms it is a server connection, even though it is also a client as far as the cluster is concerned.)

I then found that my changes to EndpointManager contained a bug: in OnEndpointTerminated, a variable called endpoint was being overwritten with null, so the endpoint was never disposed and never unblocked.
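
The following Python sketch models the failure mode described above. Everything in it is hypothetical: Proto.Actor is a .NET library, and EndpointManagerSketch, the server_endpoints and client_endpoints maps, the blocked set, and the find_endpoint_* helpers are illustrative assumptions, not the real OnEndpointTerminated code. The sketch shows one way a plain if can overwrite endpoint with None. The first branch removes an entry, which makes the second condition true. As a result, the endpoint is never disposed and its address stays blocked. Turning the second if into an else if (elif) avoids the overwrite.

```python
from typing import Callable, Dict, Optional, Set


class Endpoint:
    """Stand-in for a remote connection (hypothetical)."""

    def __init__(self, address: str) -> None:
        self.address = address
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class EndpointManagerSketch:
    """Hypothetical bookkeeping; not Proto.Actor's actual EndpointManager."""

    def __init__(self) -> None:
        self.server_endpoints: Dict[str, Endpoint] = {}
        self.client_endpoints: Dict[str, Endpoint] = {}
        self.blocked: Set[str] = set()

    def can_connect(self, address: str) -> bool:
        # New connections to a blocked address are refused.
        return address not in self.blocked


def find_endpoint_buggy(mgr: EndpointManagerSketch, address: str) -> Optional[Endpoint]:
    endpoint = None
    if address in mgr.server_endpoints:
        endpoint = mgr.server_endpoints.pop(address)
    # BUG: a plain `if` still runs after the branch above succeeded, and since
    # that branch just removed the address, this condition is now true.
    if address not in mgr.server_endpoints:
        endpoint = mgr.client_endpoints.pop(address, None)  # overwrites with None
    return endpoint


def find_endpoint_fixed(mgr: EndpointManagerSketch, address: str) -> Optional[Endpoint]:
    endpoint = None
    if address in mgr.server_endpoints:
        endpoint = mgr.server_endpoints.pop(address)
    elif address not in mgr.server_endpoints:  # FIX: `else if`
        endpoint = mgr.client_endpoints.pop(address, None)
    return endpoint


def on_endpoint_terminated(
    mgr: EndpointManagerSketch,
    address: str,
    find_endpoint: Callable[[EndpointManagerSketch, str], Optional[Endpoint]],
) -> None:
    mgr.blocked.add(address)  # refuse new connections while tearing down
    endpoint = find_endpoint(mgr, address)
    if endpoint is not None:
        endpoint.dispose()
        mgr.blocked.discard(address)  # unblock only after disposal
    # If endpoint is None, the address stays blocked forever.


for finder in (find_endpoint_buggy, find_endpoint_fixed):
    mgr = EndpointManagerSketch()
    ep = Endpoint("10.0.0.5:4020")
    mgr.server_endpoints[ep.address] = ep
    on_endpoint_terminated(mgr, ep.address, finder)
    print(finder.__name__, ep.disposed, mgr.can_connect(ep.address))
# find_endpoint_buggy False False
# find_endpoint_fixed True True
```

With the plain if, the endpoint is removed from the map but never disposed, and can_connect keeps returning False for that address. This matches the symptom of new connections to the same endpoint being blocked.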

The fix itself was simply changing an if to an else if. This PR also adds more logging around endpoint termination, a logging check when a connection is received, and a new line in ServerConnector that cancels the connection's token (after the connection should already be closed) to ensure no read/write loops are left running and orphaned.
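
The extra cancellation in ServerConnector can be illustrated with a similar hypothetical sketch. Here an asyncio.Event stands in for a .NET CancellationTokenSource, and write_loop and run_connection are invented names. The sketch assumes a loop that waits on an outbound queue rather than on the socket, so closing the connection alone does not make it exit. Setting the token after close makes the loop observe cancellation instead of being left running.

```python
import asyncio
import contextlib


async def write_loop(outbox: asyncio.Queue, token: asyncio.Event) -> None:
    """Hypothetical outbound loop: it waits on a queue, not on the socket,
    so closing the connection alone does not make it exit."""
    while not token.is_set():
        try:
            message = await asyncio.wait_for(outbox.get(), timeout=0.05)
        except asyncio.TimeoutError:
            continue  # wake up periodically to re-check the token
        # A real loop would write `message` to the network stream here.
        outbox.task_done()


async def run_connection(cancel_after_close: bool) -> bool:
    """Returns True if the write loop is still running (orphaned) after close."""
    token = asyncio.Event()  # stands in for a CancellationTokenSource
    outbox: asyncio.Queue = asyncio.Queue()
    writer = asyncio.create_task(write_loop(outbox, token))

    await asyncio.sleep(0.1)  # ... traffic flows, then the connection closes ...
    if cancel_after_close:
        token.set()  # the extra step: cancel the token even though the connection is closed

    await asyncio.sleep(0.3)
    orphaned = not writer.done()

    writer.cancel()  # tidy up the demo so asyncio.run exits cleanly
    with contextlib.suppress(asyncio.CancelledError):
        await writer
    return orphaned


print(asyncio.run(run_connection(cancel_after_close=False)))  # True
print(asyncio.run(run_connection(cancel_after_close=True)))   # False
```

The 50 ms timeout exists only so the sketch can poll the token. In .NET, the token would typically be passed directly to the awaited read or write call instead.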

Purpose

This pull request is a:

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Commit fe73a02: Fix issue reconnecting to a cluster client or member. Bug was introduced in 1.7.1.alpha-0.4 build.

rogeralsing merged commit b01daea into asynkron:dev Nov 14, 2024
18 checks passed

benbenwilde deleted the fix-outbound-client-connections branch November 14, 2024 17:58
