This is a current annoyance when working with `flowctl develop` locally. For example, a SQLite or PostgreSQL database will be left in place with an old checkpoint in `flow_checkpoints`. The user then removes their `flowctl_develop` directory and starts anew. Flow recovers the checkpoint from `flow_checkpoints` and attempts to read through its encoded offsets, which no longer bear any relation to the forthcoming journal content.
If we encoded and verified the cluster ID, then we could fail fast with a much more informative error message.
I looked into the cluster ID generation in etcd, and was surprised at how it works. The cluster ID is generated just by hashing the sorted list of member IDs. Each member ID is generated by hashing the list of initial peer URLs, along with the --initial-cluster-token argument, if one was provided. So the problem is actually the opposite of what I was worried about! The cluster ID might not change when you start up a new etcd cluster, even if the data has been completely wiped out. The new cluster might use the same configuration, and will thus generate the same cluster ID.
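To make the consequence concrete, here is a rough model of that derivation. It is a simplified sketch and not etcd's actual code: the function names, the choice of SHA-1, and the 8-byte truncation are assumptions for illustration. The point it shows is only that an ID derived purely from configuration is identical for a wiped-and-recreated cluster.

```python
import hashlib


def member_id(peer_urls, cluster_token=""):
    """Hypothetical model: hash the sorted peer URLs plus the cluster token."""
    h = hashlib.sha1()
    for url in sorted(peer_urls):
        h.update(url.encode())
    h.update(cluster_token.encode())
    return int.from_bytes(h.digest()[:8], "big")


def cluster_id(member_ids):
    """Hypothetical model: hash the sorted list of member IDs."""
    h = hashlib.sha1()
    for mid in sorted(member_ids):
        h.update(mid.to_bytes(8, "big"))
    return int.from_bytes(h.digest()[:8], "big")


def derive_cluster_id(initial_cluster, cluster_token=""):
    """initial_cluster is a list of per-member peer URL lists."""
    return cluster_id(member_id(urls, cluster_token) for urls in initial_cluster)


# Same configuration before and after wiping the data directory.
config = [["http://localhost:2380"]]
before_wipe = derive_cluster_id(config)
after_wipe = derive_cluster_id(config)
print(before_wipe == after_wipe)
```

Nothing in the inputs depends on stored data, so wiping the data directory cannot change the result. Only a different peer URL list or token would.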
Given the unforeseen complexity there, I'm starting to feel like it's better to have gazette explicitly generate and store a nonce as a key-value in etcd. It can be guaranteed to be unique, and can also be stable even after restoring etcd from a backup. Maybe just use a uuid and store it at <etcd.prefix>/meta/clusterId?
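A minimal sketch of the nonce idea follows. `FakeKV` is an in-memory stand-in for etcd, and `put_if_absent` and `cluster_nonce` are hypothetical names, not part of any gazette or etcd API. Against a real etcd, the create-if-absent step would presumably be a transaction that only writes when the key does not yet exist.

```python
import uuid


class FakeKV:
    """In-memory stand-in for etcd (assumption for illustration only)."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        # Store value only if key is missing; always return the stored value.
        return self._data.setdefault(key, value)


def cluster_nonce(kv, prefix):
    """Return the cluster nonce under <prefix>/meta/clusterId, creating it once."""
    key = f"{prefix.rstrip('/')}/meta/clusterId"
    return kv.put_if_absent(key, str(uuid.uuid4()))


kv = FakeKV()
first = cluster_nonce(kv, "/gazette")
second = cluster_nonce(kv, "/gazette")   # same store: nonce is stable
fresh = cluster_nonce(FakeKV(), "/gazette")  # wiped store: new nonce
print(first == second, first != fresh)
```

Because the nonce lives in the store itself, it survives a restore from backup and changes whenever the data is wiped, which is the property the derived cluster ID lacks.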
Also, should we move this issue into the gazette repo? If this becomes a part of a consumer checkpoint, then maybe some gazette component should be responsible for validating it and returning an error if you try to pass a checkpoint with a non-matching cluster id.
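The validation could look roughly like the sketch below. `Checkpoint`, `ClusterMismatchError`, and `validate_checkpoint` are hypothetical and do not reflect gazette's actual checkpoint type. The sketch only illustrates failing fast with an informative error instead of reading through stale offsets.

```python
from dataclasses import dataclass, field


class ClusterMismatchError(Exception):
    """Raised when a checkpoint was written against a different cluster."""


@dataclass
class Checkpoint:
    # Hypothetical shape: the cluster ID recorded at commit time, plus offsets.
    cluster_id: str
    offsets: dict = field(default_factory=dict)


def validate_checkpoint(checkpoint, current_cluster_id):
    """Return the checkpoint if it belongs to this cluster, else fail fast."""
    if checkpoint.cluster_id != current_cluster_id:
        raise ClusterMismatchError(
            f"checkpoint was written for cluster {checkpoint.cluster_id!r}, "
            f"but the current cluster is {current_cluster_id!r}; "
            "its journal offsets are not meaningful here"
        )
    return checkpoint


stale = Checkpoint(cluster_id="old-cluster", offsets={"journal/a": 1024})
try:
    validate_checkpoint(stale, "new-cluster")
except ClusterMismatchError as err:
    print(f"refusing to resume: {err}")
```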
Agreed, an explicit key makes more sense.
Since this appears to be Gazette-focused and isn't the most pressing concern, we could put it on ice until the Gazette feature branch is merged? Either way is fine, IMO.
Yeah, I'm leaning toward holding off on this one for a bit. I created gazette/core#292 so that the issue has visibility there, and I'm inclined to close this issue in favor of it.