Fix issue reconnecting to a cluster client or member. Fixes bug introduced in 1.7.1.alpha-0.4 build #2142

Merged

benbenwilde
Contributor

Description

The 1.7.1.alpha-0.4 build fixed some edge conditions where thread saturation and lockup could occur, but it also introduced a bug: a connection was not properly disposed in EndpointManager and stayed blocked forever, which in turn blocked any new connections to the same endpoint.

In our environment, a cluster client (run with client = true) would restart because of some unrelated issue, then reconnect and be able to send messages, but it would never get anything back. (Note that this is not the client-behind-a-firewall scenario, where the server sends messages back on the same connection it is receiving on. In Proto.Remote terms this is a server connection, even though the node is considered a client as far as the cluster is concerned.)

I then discovered that my changes to EndpointManager had a bug: in OnEndpointTerminated, a variable called endpoint was being overwritten with null, so the endpoint was never disposed and never unblocked.

The fix is simply changing an if to an else if. This PR also adds some extra logging around endpoint termination, a logging check when a connection is received, and a new line in ServerConnector that cancels the token for the connection (after it should already be closed) to ensure no read/write loops are left running and orphaned.
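
To illustrate why an if versus an else if matters here, below is a minimal Python sketch. It is not the actual C# code in EndpointManager: the two lookup tables, the try_remove helper, and the function names are all hypothetical. It assumes the second lookup writes its result into the same variable even on a miss, the way an out parameter would.

```python
class Endpoint:
    """Hypothetical stand-in for an endpoint that owns a connection."""

    def __init__(self, name):
        self.name = name
        self.disposed = False

    def dispose(self):
        self.disposed = True


def try_remove(table, key):
    """Remove and return table[key]; return None on a miss (like an out parameter)."""
    return table.pop(key, None)


def terminate_buggy(server_endpoints, client_endpoints, address, system_id):
    source = None
    if (endpoint := try_remove(server_endpoints, address)) is not None:
        source = "server"
    # BUG: a plain `if` runs the second lookup even when the first one succeeded,
    # and a miss here overwrites `endpoint` with None.
    if (endpoint := try_remove(client_endpoints, system_id)) is not None:
        source = "client"
    if endpoint is not None:
        endpoint.dispose()
    return endpoint, source


def terminate_fixed(server_endpoints, client_endpoints, address, system_id):
    source = None
    if (endpoint := try_remove(server_endpoints, address)) is not None:
        source = "server"
    # FIX: `elif` only runs the second lookup when the first one missed,
    # so a successful first lookup is never overwritten.
    elif (endpoint := try_remove(client_endpoints, system_id)) is not None:
        source = "client"
    if endpoint is not None:
        endpoint.dispose()
    return endpoint, source
```

In this sketch, terminate_buggy removes an endpoint found in the first table but never disposes it, because the second lookup misses and resets endpoint to None. terminate_fixed disposes it as intended.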

Purpose

This pull request is a:

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

@rogeralsing rogeralsing merged commit b01daea into asynkron:dev Nov 14, 2024
18 checks passed
@benbenwilde benbenwilde deleted the fix-outbound-client-connections branch November 14, 2024 17:58