Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

4.x: Add optional fallback for ControlConnection#reconnect() #341

Merged

Conversation

Bouncheck
Copy link
Collaborator

@Bouncheck Bouncheck commented Sep 10, 2024

Adds additional config option that changes behavior of ControlConnection's reconnection. In case LBP returns an empty query plan we will attempt to connect to the original endpoints regardless of their last observed status.

The option is by default disabled and driver without enabling it should keep behaving the same way as before.
If enabled it should be possible to reconnect to the cluster if all endpoints containing hostnames were lost during normal operation and then cluster was replaced.

@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch 2 times, most recently from 5213e1a to 57245e1 Compare September 12, 2024 13:43
@Bouncheck Bouncheck self-assigned this Sep 12, 2024
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch 2 times, most recently from 3134093 to 1ebd326 Compare September 12, 2024 14:31
@Bouncheck Bouncheck changed the title Draft Add optional fallback for ControlConnection#reconnect() Sep 12, 2024
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch from 1ebd326 to 8b9cd3d Compare September 12, 2024 16:33
@Bouncheck Bouncheck marked this pull request as ready for review September 12, 2024 17:12
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch from 8b9cd3d to 88ed886 Compare September 12, 2024 17:14
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch from 88ed886 to c5bffec Compare September 12, 2024 18:24
@dkropachev
Copy link
Collaborator

dkropachev commented Sep 19, 2024

@Bouncheck , I have been thinking about this solution, more I think less I like it.
This solution to feed hosts to controll connection when it reconnects looks hacky to me.
I think driver should sync initial contact points to the list of nodes on node events (UP/DOWN/ADD/DEL).
Let's discuss it.

@dkropachev dkropachev changed the title Add optional fallback for ControlConnection#reconnect() 4.x: Add optional fallback for ControlConnection#reconnect() Sep 30, 2024
@dkropachev
Copy link
Collaborator

@Bouncheck , I see that you still use newQueryPlan at one particular place, can you please switch to newControlReconnectionQueryPlan

        Queue<Node> nodes = context.getLoadBalancingPolicyWrapper().newQueryPlan();

@dkropachev
Copy link
Collaborator

@Bouncheck , I have been thinking about this solution, more I think less I like it. This solution to feed hosts to controll connection when it reconnects looks hacky to me. I think driver should sync initial contact points to the list of nodes on node events (UP/DOWN/ADD/DEL). Let's discuss it.

Regarding this, I have tried to do it, unfortunately due to the API this fix will be extremely intrusive, let's go ahead with your solution.

@Bouncheck
Copy link
Collaborator Author

@Bouncheck , I see that you still use newQueryPlan at one particular place, can you please switch to newControlReconnectionQueryPlan

        Queue<Node> nodes = context.getLoadBalancingPolicyWrapper().newQueryPlan();

Yes, but it's during the initialization. I believe that part works fine.

@dkropachev
Copy link
Collaborator

@Bouncheck , I see that you still use newQueryPlan at one particular place, can you please switch to newControlReconnectionQueryPlan

        Queue<Node> nodes = context.getLoadBalancingPolicyWrapper().newQueryPlan();

Yes, but it's during the initialization. I believe that part works fine.

It works fine, but creates unnecessary confusion.

Adds a method for testing the issues that surface after cluster
replacements. Due to the variable, sometimes long runtime it is not added
to any of the test groups.
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch 2 times, most recently from b33da6c to be5f5f8 Compare October 8, 2024 11:41
@Bouncheck
Copy link
Collaborator Author

Bouncheck commented Oct 8, 2024

I reverted that 1 call in ControlConnection init back to newQueryPlan(). I forgot that removing it would require changing around 20 test methods which expect that exact api call

@Bouncheck
Copy link
Collaborator Author

I can adjust them, but I don't know if it's worth it

@dkropachev dkropachev force-pushed the scylla-4.x-unresolved-socket-issue-2 branch 2 times, most recently from 5e4ca90 to d2fcb14 Compare October 8, 2024 13:45
@dkropachev
Copy link
Collaborator

I can adjust them, but I don't know if it's worth it

Please check last commit and tell if you are ok with it.

Adds an experimental option to allow `ControlConnection` to try
reconnecting to the original contact points held by `MetadataManager`,
in case of getting empty query plan from the load balancing policy.

In order to separate this logic from query plans of other queries
`LoadBalancingPolicyWrapper#newControlReconnectionQueryPlan()` was introduced
and is called during reconnection in place of `newQueryPlan()`.
@Bouncheck Bouncheck force-pushed the scylla-4.x-unresolved-socket-issue-2 branch from e88d7f1 to 2b280a2 Compare October 8, 2024 14:05
@dkropachev dkropachev merged commit abe2811 into scylladb:scylla-4.x Oct 8, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants