
CassandraSinkCluster: Keep node list up to date via a tokio::watcher #831

Merged (1 commit) on Oct 5, 2022

Conversation

@rukai (Member) commented on Sep 28, 2022

Closes #730

This PR ensures that all existing connections receive all changes to the topology task's node list.
Previously a new connection had to be made to observe changes to the node list.
It also simplifies code in a few places that had to rely on loop+sleep to wait for changes to occur; now we can just await on the recv call.
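
For reference, here is a minimal sketch of the tokio::sync::watch pattern this PR relies on; the CassandraNode definition below is a hypothetical stand-in rather than Shotover's actual type:

use tokio::sync::watch;

// Hypothetical stand-in for the real CassandraNode type.
#[derive(Clone, Debug)]
struct CassandraNode {
    address: std::net::SocketAddr,
    is_up: bool,
}

#[tokio::main]
async fn main() {
    // The topology task owns the sender; every transform holds a clone of the receiver.
    let (nodes_tx, mut nodes_rx) = watch::channel(Vec::<CassandraNode>::new());

    // Topology task side: publish a full replacement node list whenever it changes.
    tokio::spawn(async move {
        let new_list = vec![CassandraNode {
            address: "127.0.0.1:9042".parse().unwrap(),
            is_up: true,
        }];
        nodes_tx.send(new_list).ok();
    });

    // Transform side: await the next change instead of looping with a sleep,
    // then copy the latest list into a local nodes field.
    if nodes_rx.changed().await.is_ok() {
        let nodes: Vec<CassandraNode> = nodes_rx.borrow_and_update().clone();
        println!("received {} nodes", nodes.len());
    }
}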

renames:

  • topology_task_nodes -> nodes_rx - we are now using a receiver type, so we follow the standard rx naming; nodes_rx also pairs nicely with the nodes field it is copied into.
  • local_nodes -> nodes - the local part no longer makes sense now that we are using a Receiver; additionally, local could be confused with the local_shotover_node field.

Alternative approach

I originally attempted to implement this with an enum sent from the topology task to the transforms, providing granular updates:

pub enum NodeChange {
    /// Replace the entire node list
    NewList(Vec<CassandraNode>),
    /// A node was added to the cluster
    Add(CassandraNode),
    /// The node at this address was removed from the cluster
    Remove(SocketAddr),
    /// The node at this address went up or down
    Status { address: SocketAddr, is_up: bool },
}

It allows us to avoid copying an entire node list in many cases.
However, those cases are error cases, so our performance doesn't have to be amazing there.
And it comes at the cost of a large increase in complexity on both the topology task side and the transform side.
While I believe we could write a correct implementation with this approach without too much trouble, it is still harder to reason about, harder to quickly verify for correctness, and harder to maintain.

Additionally, the main cost would have been having to recreate connections, and I've added logic to this PR to copy over the connections from the old list.
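
A rough sketch of that connection-copying idea; the field and type names here are illustrative, not Shotover's actual API:

use std::net::SocketAddr;

// Placeholder for an established connection handle.
#[derive(Clone)]
struct Connection;

#[derive(Clone)]
struct CassandraNode {
    address: SocketAddr,
    connection: Option<Connection>,
}

// When a new node list arrives, reuse any connection that is already open to the
// same address so unchanged nodes do not pay the cost of reconnecting.
fn copy_existing_connections(old: &[CassandraNode], new: &mut [CassandraNode]) {
    for node in new.iter_mut() {
        if node.connection.is_none() {
            if let Some(old_node) = old.iter().find(|o| o.address == node.address) {
                node.connection = old_node.connection.clone();
            }
        }
    }
}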

So in the end I decided it wasn't worth it and went with the much simpler approach in this PR.

@conorbros (Member) left a comment

LGTM, but could we merge #824 first? It might be easier to integrate this afterwards. I'm happy to do that work myself if I can avoid a merge conflict on #824.

@rukai (Member, Author) commented on Sep 30, 2022

Being a smaller PR that has already been reviewed, IMO it does make sense to land this PR first.

But I'm not that fussed; if we can get #824 in within a reasonable time then sure.

Successfully merging this pull request may close these issues:

  • Investigate a different lock for sharing Cassandra topology