checkpoint: include and verify etcd cluster ID #192

jgraettinger · 2021-08-16T14:49:10Z

This is a current annoyance when working with flowctl develop locally. For example, a SQLite or PostgreSQL database will be left in place with an old checkpoint in flow_checkpoints. Then the user removes their flowctl_develop directory and starts anew. Flow then recovers the checkpoint from flow_checkpoints and attempts to read through its encoded offsets, which now have no bearing on forthcoming journal content.

If we encoded and verified the cluster ID, then we could fail fast with a much more informative error message.

psFried · 2021-08-17T22:57:34Z

I looked into the cluster ID generation in etcd, and was surprised at how it works. The cluster ID is generated just by hashing the sorted list of member IDs. Each member ID is generated by hashing the list of initial peer URLs, along with the --initial-cluster-token argument, if one was provided. So the problem is actually the opposite of what I was worried about! The cluster ID might not change when you start up a new etcd cluster, even if the data has been completely wiped out. The new cluster might use the same configuration, and will thus generate the same cluster ID.
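To make the consequence concrete, here is a minimal sketch of that derivation. It is illustrative only and not etcd's code: the helper names (`member_id`, `cluster_id`, `bootstrap`) are hypothetical, and the choice of SHA-1 truncated to 8 bytes is an assumption, so the values will not match what a real etcd reports. The point it shows is that the ID depends only on configuration, never on the stored data.

```python
import hashlib


def _id_from(data: bytes) -> int:
    # Assumption: take the first 8 bytes of a SHA-1 digest as an unsigned ID.
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def member_id(peer_urls, cluster_token=""):
    # Hypothetical helper: a member ID derived only from the initial peer
    # URLs plus the optional --initial-cluster-token value.
    material = "".join(sorted(peer_urls)) + cluster_token
    return _id_from(material.encode())


def cluster_id(member_ids):
    # Hypothetical helper: a cluster ID derived only from the sorted member IDs.
    material = b"".join(m.to_bytes(8, "big") for m in sorted(member_ids))
    return _id_from(material)


def bootstrap(initial_cluster, cluster_token=""):
    # Nothing here reads a data directory, so wiping it cannot change the result.
    members = [member_id(urls, cluster_token) for urls in initial_cluster.values()]
    return cluster_id(members)


config = {"default": ["http://localhost:2380"]}

first = bootstrap(config)                       # original local cluster
second = bootstrap(config)                      # data wiped, same config
with_token = bootstrap(config, "fresh-token")   # same peers, new token

print(first == second)      # same configuration, same ID
print(first == with_token)  # a different token yields a different ID
```

Under these assumptions, a checkpoint tagged with the cluster ID alone would still look valid after a local reset that reuses the same configuration.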

Given the unforeseen complexity there, I'm starting to feel like it's better to have Gazette explicitly generate and store a nonce as a key-value in etcd. It can be guaranteed to be unique, and can also be stable even after restoring etcd from a backup. Maybe just use a UUID and store it at <etcd.prefix>/meta/clusterId?
Also, should we move this issue into the gazette repo? If this becomes a part of a consumer checkpoint, then maybe some gazette component should be responsible for validating it and returning an error if you try to pass a checkpoint with a non-matching cluster id.
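A rough sketch of what that could look like, for discussion only. Everything here is hypothetical: `FakeKV` is an in-memory stand-in for etcd, and `cluster_nonce`, `Checkpoint`, `verify_checkpoint`, and `ClusterMismatch` are illustrative names, not Gazette APIs. Against a real etcd, the create-if-absent step would presumably be a transaction that only writes when the key does not exist yet.

```python
import uuid
from dataclasses import dataclass, field


class ClusterMismatch(Exception):
    """Raised when a checkpoint was written against a different cluster."""


class FakeKV:
    """In-memory stand-in for etcd (assumption, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        # Returns the existing value if the key is present, else stores `value`.
        return self._data.setdefault(key, value)


def cluster_nonce(kv, prefix):
    # Create the nonce on first use; afterwards always return the stored one.
    return kv.put_if_absent(f"{prefix}/meta/clusterId", str(uuid.uuid4()))


@dataclass
class Checkpoint:
    cluster_id: str
    offsets: dict = field(default_factory=dict)


def verify_checkpoint(cp, kv, prefix):
    # Fail fast with an informative error instead of reading stale offsets.
    current = cluster_nonce(kv, prefix)
    if cp.cluster_id != current:
        raise ClusterMismatch(
            f"checkpoint was written for cluster {cp.cluster_id}, "
            f"but the current cluster is {current}"
        )
    return cp


old_etcd = FakeKV()
cp = Checkpoint(cluster_nonce(old_etcd, "/gazette"), {"journal/a": 1024})
verify_checkpoint(cp, old_etcd, "/gazette")  # same cluster: accepted

new_etcd = FakeKV()  # simulates a wiped and recreated etcd
try:
    verify_checkpoint(cp, new_etcd, "/gazette")
except ClusterMismatch as err:
    print(err)
```

Because the nonce lives in the keyspace rather than in the member configuration, it changes when the data is wiped and survives a restore from backup, which is the behavior the checkpoint check needs.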

jgraettinger · 2021-08-18T02:03:30Z

Agreed, having an explicit key is making more sense.

As this appears to be Gazette-focused and isn't the most pressing concern, we could put it on ice until the Gazette feature branch is merged? Either way is fine, IMO.

psFried · 2021-08-18T13:41:03Z

Yeah, I'm leaning toward holding off on this one for a bit. I created gazette/core#292 so that the issue has visibility there. I'm leaning toward closing this issue in favor of gazette/core#292.

jgraettinger added the enhance label (New feature or enhancement with UX impact) Aug 16, 2021

jgraettinger assigned psFried Aug 17, 2021

psFried mentioned this issue Aug 18, 2021

Add a cluster id to consumer Checkpoint gazette/core#292

Open

psFried closed this as completed Aug 18, 2021
