Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node ID/datacenter snapshot fix #4872

Merged
merged 2 commits into from
Oct 31, 2018
Merged

Node ID/datacenter snapshot fix #4872

merged 2 commits into from
Oct 31, 2018

Conversation

kyhavlov
Copy link
Contributor

This adds the missing ID and datacenter fields to the snapshot persist logic.

Copy link
Contributor

@pearkes pearkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. May want to hold this from master for one more set of eyes given 1.4.0.

@pearkes
Copy link
Contributor

pearkes commented Oct 30, 2018

But we should try to get this in for sure.

@pearkes pearkes added this to the 1.4.0 milestone Oct 30, 2018
Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed a conversation here but why is this the right thing to do? Was there a specific issue that this addresses?

It seems like those fields could have been elided from the snapshot on purpose so that you can restore a snapshot to a different DC without the state being really messed up (e.g. for DR). Storing Node IDs in state also means that if you terminate all your client VMs, restore a snapshot on servers and then restart clients they will all fail to register because they will have same name but a different ID. Same would apply to restoring a snapshot on a new datacenter which is being spun up as a replica for disaster recovery - currently that just works (presumably) because the restored server state doesn't have node IDs or datacenters so the new clients with same names in the new DC will just register and things will work.

Both of those seem like relatively dangerous/high impact changes to me to merge without and rationale etc.

Copy link
Member

@banks banks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see this potentially changing behaviour in the case where you restore a snapshot in a new cluster or DC, but in that case other Node info from the snapshot is probably wrong too (like IP address). It's also recoverable by just force leaving all the nodes and allowing them to rejoin.

I doubt given that, that it was intentional to not store this state since we also use these to bootstrap new replicas in the same cluster and this is corrupting their state which is clearly a bug.

@slackpad
Copy link
Contributor

I don't think these omissions were intentional :-) Both of these fields were added later and I think we just forgot to plumb them through. Given how anti-entropy works it would have just synced things back up, kind of masking this issue.

@kyhavlov kyhavlov merged commit 8337e3d into master Oct 31, 2018
@kyhavlov kyhavlov deleted the node-snapshot-fix branch October 31, 2018 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants