-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topology_watcher: Allow tablets to reuse old tablet addresses. #5244
Conversation
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
hc.AddTablet(new, name) | ||
}() | ||
hc.deleteConn(old) | ||
hc.AddTablet(new, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice yeah this should hopefully eliminate a huge source of threading issues
I'll patch this in and will test it in our environment. It was very easy to reproduce the issue there. |
I can confirm this solves the issue in our environment but I'll let someone else more familiar with the code do a deeper review. |
@demmer explained that the reason However, it seems that now we don't close connections synchronously anyway. We just cancel the Context and move on. So I believe it should be safe to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not be watertight. But it's a big improvement over what was there before.
// If it's the same tablet, something is wrong. | ||
if topoproto.TabletAliasEqual(th.latestTabletStats.Tablet.Alias, tablet.Alias) { | ||
hc.mu.Unlock() | ||
log.Warningf("refusing to add duplicate tablet %v for %v: %+v", name, tablet.Alias.Cell, tablet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should eventually become an acceptable state, which will allow us to "update if needed" kind of calls. But let the warning remain for now. It will allow us to find out if this really happens in the wild.
The other future improvement on this will be to add a timestamp that tracks when a tablet acquired the address to declare the true winner. Similar to how we're solving the mastership story.
Fixes #5229.
The unit test failed before I added the check in RemoveTablet that we're deleting the tablet we think we are.
I also removed the goroutines from
RemoveTablet
andReplaceTablet
because it's ludicrous that there was no guarantee whatsoever of what state you'll be in at the end ofloadTablets()
. Let me know if there was some reason we had tried to make those asynchronous.Signed-off-by: Anthony Yeh enisoc@planetscale.com