
Nodes list is filtered by rack #749

Closed · wants to merge 1 commit

Conversation

@rukai (Member) commented Aug 18, 2022

As per our design, each shotover instance should proxy to a single Cassandra rack.
However, we haven't yet taken that into account in our implementation.

This PR adds a config field for the user to specify which rack the shotover instance is proxying to.
We then use that configured rack to filter out any nodes outside it.
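As a rough sketch of the approach (the struct and field names below are illustrative stand-ins, not Shotover's actual config or topology types):

```rust
// Hypothetical sketch only: these types stand in for Shotover's real
// config and topology structs.

/// Per-instance config: the Cassandra rack this shotover instance proxies to.
struct CassandraSinkConfig {
    rack: String,
}

/// A node discovered from the cluster topology.
struct CassandraNode {
    address: std::net::SocketAddr,
    rack: String,
}

/// Drop every node outside the configured rack.
fn filter_nodes_by_rack(
    nodes: Vec<CassandraNode>,
    config: &CassandraSinkConfig,
) -> Vec<CassandraNode> {
    nodes
        .into_iter()
        .filter(|node| node.rack == config.rack)
        .collect()
}
```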

@benbromhead (Member) commented

My comment here would be to make this optional (filtering by rack) or a bias (prefer nodes in that rack, but if none are up, use the others).

There are a few reasons why we would want to allow shotover to talk to all nodes:

  • If shotover goes down in one AZ but C* is all 100% still up, the other shotover instances in other AZs can still route requests to all AZs
  • If nodes are down in an AZ, but the shotover instance is still up, it can route to other instances without returning an error to the client and forcing it to retry again (reducing latency)

Of course we do want to be able to make shotover keep most of the traffic in the same AZ; otherwise we could rack up more cross-AZ traffic cost than strictly necessary.

I might get the Instaclustr C* team to provide some thoughts as well.
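A minimal sketch of the bias approach described above, assuming a hypothetical `CassandraNode` with an `is_up` health flag (not Shotover's actual types):

```rust
// Hypothetical sketch of the "bias" policy: prefer the local rack,
// fall back to the rest of the cluster if no local node is up.
struct CassandraNode {
    rack: String,
    is_up: bool,
}

/// Prefer up nodes in the local rack; if none are up, fall back to the
/// rest of the cluster instead of returning an error to the client.
fn candidate_nodes<'a>(
    nodes: &'a [CassandraNode],
    local_rack: &str,
) -> Vec<&'a CassandraNode> {
    let local: Vec<&CassandraNode> = nodes
        .iter()
        .filter(|n| n.is_up && n.rack == local_rack)
        .collect();

    if !local.is_empty() {
        local
    } else {
        // No healthy node in our rack: accept the cross-AZ traffic cost
        // rather than failing the request and forcing the client to retry.
        nodes.iter().filter(|n| n.is_up).collect()
    }
}
```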

@rukai (Member, Author) commented Aug 22, 2022

If shotover goes down in one AZ but C* is all 100% still up, the other shotover instances in other AZs can still route requests to all AZs

I don't think the other AZ would be hit in that case. If each shotover instance always prefers nodes in its own AZ, then it won't suddenly start routing outside its AZ just because the shotover instance proxying to that AZ went down.

However:

If nodes are down in an AZ, but the shotover instance is still up, it can route to other instances without returning an error to the client and forcing it to retry again (reducing latency)

Still makes sense.
So I'm happy to go ahead and rewrite accordingly, although that will quite possibly look like removing the routing changes from this PR so that they can be properly addressed later.

Just to confirm: the hard filter on data_center should remain, right?
That is how the datastax driver behaves, and the reasoning is documented here: https://docs.datastax.com/en/developer/java-driver/4.2/manual/core/load_balancing/
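For illustration, a hedged sketch of how the two policies discussed here could combine: a hard filter on data_center (matching the documented datastax driver behaviour) with only a soft preference for the local rack. The types are again hypothetical, not Shotover's real ones:

```rust
// Hypothetical sketch: hard data_center filter, soft rack preference.
struct CassandraNode {
    data_center: String,
    rack: String,
    is_up: bool,
}

fn routable_nodes<'a>(
    nodes: &'a [CassandraNode],
    local_dc: &str,
    local_rack: &str,
) -> Vec<&'a CassandraNode> {
    // Hard filter: never route outside the configured data center.
    let in_dc: Vec<&CassandraNode> = nodes
        .iter()
        .filter(|n| n.is_up && n.data_center == local_dc)
        .collect();

    // Soft preference: stay in the local rack while any of its nodes are up.
    let in_rack: Vec<&CassandraNode> = in_dc
        .iter()
        .copied()
        .filter(|n| n.rack == local_rack)
        .collect();

    if in_rack.is_empty() {
        in_dc
    } else {
        in_rack
    }
}
```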

@rukai (Member, Author) commented Aug 22, 2022

We can't yet implement this new design because it needs to be implemented alongside token-aware routing.
So I'm going to go ahead and close this PR; #744 can just introduce the rack config field without modifying the routing logic.

@rukai closed this Aug 22, 2022