Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkpoint: include and verify etcd cluster ID #192

Closed
jgraettinger opened this issue Aug 16, 2021 · 3 comments
Closed

checkpoint: include and verify etcd cluster ID #192

jgraettinger opened this issue Aug 16, 2021 · 3 comments
Assignees
Labels
enhance New feature or enhancement with UX impact

Comments

@jgraettinger
Copy link
Member

This is a current annoyance when working with flowctl develop locally. For example, a SQLite or PostgreSQL database will be left in place with an old checkpoint in flow_checkpoints. Then the user removes their flowctl_develop directory and starts anew. Flow then recovers the checkpoint from flow_checkpoints, and attempts to read through its encoded offsets (which have no bearing now to forthcoming journal content).

If we encoded and verified the cluster ID, then we could fail fast with a much more informative error message.

@jgraettinger jgraettinger added the enhance New feature or enhancement with UX impact label Aug 16, 2021
@psFried
Copy link
Member

psFried commented Aug 17, 2021

I looked into the cluster ID generation in etcd, and was surprised at how it works. The cluster id is generated just by hashing the sorted list of member ids. Each member id is generated by hashing the list of initial peer urls, along with the --initial-cluster-token argument, if one was provided. So the problem is actually the opposite of what I was worried about! The cluster id might not change when you startup a new etcd cluster, even if the data has been completely wiped out. The new cluster might use the same configuration, and will thus generate the same cluster id.

Given the unforeseen complexity there, I'm starting to feel like it's better to have gazette explicitly generate and store a nonce as a key-value in etcd. It can be guaranteed to be unique, and can also be stable even after restoring etcd from a backup. Maybe just use a uuid and store it at <etcd.prefix>/meta/clusterId?
Also, should we move this issue into the gazette repo? If this becomes a part of a consumer checkpoint, then maybe some gazette component should be responsible for validating it and returning an error if you try to pass a checkpoint with a non-matching cluster id.

@jgraettinger
Copy link
Member Author

jgraettinger commented Aug 18, 2021

Agreed, having an explicit key is making more sense.

As this is appearing Gazette focused, and isn't the most pressing concern, we could put it on ice until the Gazette feature branch is merged up ? Either way IMO.

@psFried
Copy link
Member

psFried commented Aug 18, 2021

Yeah, I'm leaning toward holding off on this one for a bit. I created gazette/core#292 so that the issue has visibility there. I'm leaning toward closing this issue in favor of gazette/core#292.

@psFried psFried closed this as completed Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhance New feature or enhancement with UX impact
Projects
None yet
Development

No branches or pull requests

2 participants