
Nodes list is filtered by rack #749

Closed · wants to merge 1 commit

Conversation

@rukai (Member) commented Aug 18, 2022

As per our design, each shotover instance should proxy to a single Cassandra rack.
However, we haven't yet taken that into account in our implementation.

This PR adds a config field for the user to specify which rack the shotover instance is proxying to.
We then use that configured rack to filter out any nodes outside it.
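As a rough sketch of the approach (the struct and field names below are illustrative stand-ins, not Shotover's actual config or topology types):

```rust
// Hypothetical sketch only: these types stand in for Shotover's real
// config and topology structs.

/// Per-instance config: the Cassandra rack this shotover instance proxies to.
struct CassandraSinkConfig {
    rack: String,
}

/// A node discovered from the cluster topology.
struct CassandraNode {
    address: std::net::SocketAddr,
    rack: String,
}

/// Drop every node outside the configured rack.
fn filter_nodes_by_rack(
    nodes: Vec<CassandraNode>,
    config: &CassandraSinkConfig,
) -> Vec<CassandraNode> {
    nodes
        .into_iter()
        .filter(|node| node.rack == config.rack)
        .collect()
}
```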

@benbromhead (Member) commented

My comment here would be to make this optional (filtering by rack) or a bias (prefer nodes in that rack, but if none are up, use the others).

There are a few reasons why we would want to allow shotover to talk to all nodes:

  • If shotover goes down in one AZ but C* is all 100% still up, the other shotover instances in other AZs can still route requests to all AZs
  • If nodes are down in an AZ, but the shotover instance is still up, it can route to other instances without returning an error to the client and forcing it to retry again (reducing latency)

Of course we do want to be able to make shotover keep most of the traffic in the same AZ; otherwise we could rack up more cross-AZ traffic cost than strictly necessary.

I might get the Instaclustr C* team to provide some thoughts as well.
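A minimal sketch of the bias approach described above, assuming a hypothetical `CassandraNode` with an `is_up` health flag (not Shotover's actual types):

```rust
// Hypothetical sketch of the "bias" policy: prefer the local rack,
// fall back to the rest of the cluster if no local node is up.
struct CassandraNode {
    rack: String,
    is_up: bool,
}

/// Prefer up nodes in the local rack; if none are up, fall back to the
/// rest of the cluster instead of returning an error to the client.
fn candidate_nodes<'a>(
    nodes: &'a [CassandraNode],
    local_rack: &str,
) -> Vec<&'a CassandraNode> {
    let local: Vec<&CassandraNode> = nodes
        .iter()
        .filter(|n| n.is_up && n.rack == local_rack)
        .collect();

    if !local.is_empty() {
        local
    } else {
        // No healthy node in our rack: accept the cross-AZ traffic cost
        // rather than failing the request and forcing the client to retry.
        nodes.iter().filter(|n| n.is_up).collect()
    }
}
```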

@rukai (Member, Author) commented Aug 22, 2022

If shotover goes down in one AZ but C* is all 100% still up, the other shotover instances in other AZs can still route requests to all AZs

I don't think the other AZ would be hit in that case. If each shotover instance always prefers nodes in its own AZ, then it won't suddenly start routing outside its AZ just because the shotover instance proxying to that AZ went down.

However:

If nodes are down in an AZ, but the shotover instance is still up, it can route to other instances without returning an error to the client and forcing it to retry again (reducing latency)

Still makes sense.
So I'm happy to go ahead and rewrite accordingly, although that will quite possibly look like removing the routing changes from this PR so that they can be properly addressed later.

Just to confirm: the hard filter on data_center should remain, right?
That is how the datastax driver behaves, and the reasoning is documented here: https://docs.datastax.com/en/developer/java-driver/4.2/manual/core/load_balancing/
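For illustration, a hedged sketch of how the two policies discussed here could combine: a hard filter on data_center (matching the documented datastax driver behaviour) with only a soft preference for the local rack. The types are again hypothetical, not Shotover's real ones:

```rust
// Hypothetical sketch: hard data_center filter, soft rack preference.
struct CassandraNode {
    data_center: String,
    rack: String,
    is_up: bool,
}

fn routable_nodes<'a>(
    nodes: &'a [CassandraNode],
    local_dc: &str,
    local_rack: &str,
) -> Vec<&'a CassandraNode> {
    // Hard filter: never route outside the configured data center.
    let in_dc: Vec<&CassandraNode> = nodes
        .iter()
        .filter(|n| n.is_up && n.data_center == local_dc)
        .collect();

    // Soft preference: stay in the local rack while any of its nodes are up.
    let in_rack: Vec<&CassandraNode> = in_dc
        .iter()
        .copied()
        .filter(|n| n.rack == local_rack)
        .collect();

    if in_rack.is_empty() {
        in_dc
    } else {
        in_rack
    }
}
```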

@rukai (Member, Author) commented Aug 22, 2022

We can't yet implement this new design because it needs to be implemented alongside token-aware routing.
So I'm going to go ahead and close this PR; #744 can just introduce the rack config field without modifying the routing logic.

@rukai closed this Aug 22, 2022