
CassandraSinkCluster: Keep node list up to date via a tokio::watcher #831

Merged (1 commit) on Oct 5, 2022

Conversation

@rukai (Member) commented on Sep 28, 2022

Closes #730

This PR ensures that all existing connections receive all changes to the topology task's node list.
Previously a new connection had to be made to observe changes to the node list.
It also simplifies code in a few places that had to rely on loop+sleep to wait for changes to occur; now we can just await on the recv call.
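
For reference, here is a minimal sketch of the tokio::sync::watch pattern this PR relies on; the CassandraNode definition below is a hypothetical stand-in rather than Shotover's actual type:

use tokio::sync::watch;

// Hypothetical stand-in for the real CassandraNode type.
#[derive(Clone, Debug)]
struct CassandraNode {
    address: std::net::SocketAddr,
    is_up: bool,
}

#[tokio::main]
async fn main() {
    // The topology task owns the sender; every transform holds a clone of the receiver.
    let (nodes_tx, mut nodes_rx) = watch::channel(Vec::<CassandraNode>::new());

    // Topology task side: publish a full replacement node list whenever it changes.
    tokio::spawn(async move {
        let new_list = vec![CassandraNode {
            address: "127.0.0.1:9042".parse().unwrap(),
            is_up: true,
        }];
        nodes_tx.send(new_list).ok();
    });

    // Transform side: await the next change instead of looping with a sleep,
    // then copy the latest list into a local nodes field.
    if nodes_rx.changed().await.is_ok() {
        let nodes: Vec<CassandraNode> = nodes_rx.borrow_and_update().clone();
        println!("received {} nodes", nodes.len());
    }
}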

renames:

  • topology_task_nodes -> nodes_rx - we are now using a receiver type, so we follow the standard rx naming; nodes_rx also pairs nicely with the nodes field it is copied into.
  • local_nodes -> nodes - the local part no longer makes sense now that we are using a Receiver; additionally, local could be confused with the local_shotover_node field.

Alternative approach

I originally attempted to implement this with an enum sent from the topology task to the transforms, providing granular updates:

pub enum NodeChange {
    /// Replace the entire node list
    NewList(Vec<CassandraNode>),
    /// A node was added to the cluster
    Add(CassandraNode),
    /// The node at this address was removed from the cluster
    Remove(SocketAddr),
    /// The node at this address went up or down
    Status { address: SocketAddr, is_up: bool },
}

It allows us to avoid copying an entire node list in many cases.
However, those cases are error cases, so our performance doesn't have to be amazing there.
And it comes at the cost of a large increase in complexity on both the topology task side and the transform side.
While I believe we could write a correct implementation with this approach without too much trouble, it is still harder to reason about, harder to quickly verify for correctness, and harder to maintain.

Additionally, the main cost would have been having to recreate connections, and I've added logic to this PR to copy over the connections from the old list.
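
A rough sketch of that connection-copying idea; the field and type names here are illustrative, not Shotover's actual API:

use std::net::SocketAddr;

// Placeholder for an established connection handle.
#[derive(Clone)]
struct Connection;

#[derive(Clone)]
struct CassandraNode {
    address: SocketAddr,
    connection: Option<Connection>,
}

// When a new node list arrives, reuse any connection that is already open to the
// same address so unchanged nodes do not pay the cost of reconnecting.
fn copy_existing_connections(old: &[CassandraNode], new: &mut [CassandraNode]) {
    for node in new.iter_mut() {
        if node.connection.is_none() {
            if let Some(old_node) = old.iter().find(|o| o.address == node.address) {
                node.connection = old_node.connection.clone();
            }
        }
    }
}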

So in the end I decided it wasn't worth it and went with the much simpler approach in this PR.

@conorbros (Member) left a comment

LGTM, but could we merge #824 first? It might be easier to integrate this afterwards. I'm happy to do that work myself if I can avoid a merge conflict on #824.

@rukai (Member, Author) commented on Sep 30, 2022

Being a smaller PR that has already been reviewed, IMO it does make sense to land this PR first.

But I'm not that fussed; if we can get #824 in within a reasonable time then sure.

Successfully merging this pull request may close these issues:

  • Investigate a different lock for sharing Cassandra topology