Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TabletManager: Check for context cancellation in loop within ChangeTabletType() #9842

Merged
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions go/vt/vttablet/tabletmanager/tm_state.go
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,9 @@ func (ts *tmState) ChangeTabletType(ctx context.Context, tabletType topodatapb.T
// We must read the data again and verify whether the previous write succeeded or not.
// The only way to guarantee safety is to keep retrying read until we succeed
for {
if ctx.Err() != nil {
return ctx.Err()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had added the infinite loop intentionally, because we enter that code only if a topo write has failed. Once a topo write fails, the only way forward is to keep trying to read the topo server, because the information may or may not have been written.
However, I do understand the use case where they want to retry doing the same operation again, so we need to exit sometime and not wait indefinitely.

I just think we should wrap the returned error and explicitly say that the topo information might be different from the tablet information and that the user must call ChangeTabletType again to fix it.
Other than that, this looks good to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think we should wrap the returned error and explicitly say that the topo information might be different from the tablet information and that the user must call ChangeTabletType again to fix it.

Suggestions for the error message? return fmt.Errorf("context canceled updating tablet_type for %s in the topo, please retry", ts.tm.tabletAlias) ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that will work!

}
ti, errInReading := ts.tm.TopoServer.GetTablet(ctx, ts.tm.tabletAlias)
if errInReading != nil {
<-time.After(100 * time.Millisecond)
Expand Down