
Backwards compatible replication status to state transition #10167

Merged
merged 14 commits into from
Apr 29, 2022


@mattlord (Contributor) commented Apr 28, 2022

Description

It turns out that the change in https://github.com/vitessio/vitess/pull/9853/files was NOT fully backwards compatible.

The ERS-related upgrade/downgrade tests later added in #10148 exposed this. The problem was that a v14 vtctl (the client: vtctlclient->vtctld, or vtctl) would send an RPC to a v13 vttablet (the server), and the vttablet would respond with the v13 ReplicationStatusResponse message, which does not contain the IoState or SqlState fields. The v14 vtctl would then process that response, consider the IO and SQL states to be Unknown, and be unable to proceed with some actions (such as reparenting). To support a v14 vtctl with v13 vttablets, when processing the RPC response we need to check for an IO and SQL state of Unknown and fall back to the boolean IO and SQL thread running values (the older tablets cannot tell us whether the IO thread was connecting).
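A minimal sketch of that client-side fallback, in Go. The names `ReplicationState`, `StatusResponse`, `stateFromBool`, and `resolveStates` are hypothetical simplifications for illustration, not the actual Vitess API. The only behavior carried over from the description is that a field a v13 tablet never sends decodes as its zero value, which is Unknown, and that the old booleans can only express Running or Stopped.

```go
package main

import "fmt"

// ReplicationState is a hypothetical stand-in for the v14 IO/SQL state enum.
// Its zero value is Unknown, which is what a v14 client ends up with when a
// v13 tablet omits the new fields from its response.
type ReplicationState int32

const (
	ReplicationStateUnknown ReplicationState = iota
	ReplicationStateStopped
	ReplicationStateConnecting
	ReplicationStateRunning
)

// StatusResponse is a hypothetical, simplified view of the replication status
// payload carrying both the old and the new fields.
type StatusResponse struct {
	IoThreadRunning  bool  // old field, sent by v13 tablets
	SqlThreadRunning bool  // old field, sent by v13 tablets
	IoState          int32 // new field, zero (Unknown) when sent by v13 tablets
	SqlState         int32 // new field, zero (Unknown) when sent by v13 tablets
}

// stateFromBool maps an old boolean "running" flag onto the state enum.
// A v13 tablet cannot report Connecting, so only Running or Stopped result.
func stateFromBool(running bool) ReplicationState {
	if running {
		return ReplicationStateRunning
	}
	return ReplicationStateStopped
}

// resolveStates returns the IO and SQL states, falling back to the old
// boolean fields whenever the corresponding new state field is Unknown.
func resolveStates(s StatusResponse) (io, sql ReplicationState) {
	io = ReplicationState(s.IoState)
	if io == ReplicationStateUnknown {
		io = stateFromBool(s.IoThreadRunning)
	}
	sql = ReplicationState(s.SqlState)
	if sql == ReplicationStateUnknown {
		sql = stateFromBool(s.SqlThreadRunning)
	}
	return io, sql
}

func main() {
	// Response from a v13 tablet: only the old boolean fields are set.
	io, sql := resolveStates(StatusResponse{IoThreadRunning: true, SqlThreadRunning: false})
	fmt.Println(io == ReplicationStateRunning, sql == ReplicationStateStopped) // prints "true true"

	// Response from a v14 tablet: the new state fields take precedence.
	v14 := StatusResponse{IoThreadRunning: false, SqlThreadRunning: true,
		IoState: int32(ReplicationStateConnecting), SqlState: int32(ReplicationStateRunning)}
	io, sql = resolveStates(v14)
	fmt.Println(io == ReplicationStateConnecting, sql == ReplicationStateRunning) // prints "true true"
}
```

Checking each state field separately, rather than both together, keeps the fallback correct even if only one of the two new fields is populated.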

To support correct RPC communication between v13 and v14, we must also leave the old fields in place so that the upgrade from v13 to v14 is clean and the v14 vtctl can still process the older io_thread_running and sql_thread_running ReplicationStatusResponse fields, while adding the new fields for v14+ vttablets and clients to use. In v15+ we can then remove the old fields and complete the transition while keeping upgrades smooth.
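Keeping both sets of fields also means a v14+ tablet has to fill in the old booleans when it builds its response. The sketch below, in Go, shows one way that could look. `ReplicationStatus`, `StatusProto`, and `toProto` are hypothetical simplifications, not the real Vitess `ReplicationStatusToProto`, and reporting Connecting to old clients as "not running" is an assumption.

```go
package main

import "fmt"

// ReplicationState is a hypothetical stand-in for the v14 IO/SQL state enum.
type ReplicationState int32

const (
	ReplicationStateUnknown ReplicationState = iota
	ReplicationStateStopped
	ReplicationStateConnecting
	ReplicationStateRunning
)

// ReplicationStatus is a hypothetical, simplified internal tablet view.
type ReplicationStatus struct {
	IOState  ReplicationState
	SQLState ReplicationState
}

// StatusProto mimics a wire message carrying both the old boolean fields
// (read by v13 clients) and the new state fields (read by v14+ clients).
type StatusProto struct {
	IoThreadRunning  bool
	SqlThreadRunning bool
	IoState          int32
	SqlState         int32
}

// toProto sets the new state fields and also derives the old booleans, so
// that clients which only understand the old fields still get usable data.
func toProto(s ReplicationStatus) StatusProto {
	return StatusProto{
		IoState:  int32(s.IOState),
		SqlState: int32(s.SQLState),
		// Assumption: only Running counts as "running" for old clients,
		// so Connecting is reported as not running.
		IoThreadRunning:  s.IOState == ReplicationStateRunning,
		SqlThreadRunning: s.SQLState == ReplicationStateRunning,
	}
}

func main() {
	p := toProto(ReplicationStatus{IOState: ReplicationStateConnecting, SQLState: ReplicationStateRunning})
	fmt.Printf("%+v\n", p) // prints "{IoThreadRunning:false SqlThreadRunning:true IoState:2 SqlState:3}"
}
```

Once no supported client reads the old booleans any more, the two derived assignments are the only lines that need to go.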

Related Issue(s)

Checklist

It was NOT backwards compatible.

Signed-off-by: Matt Lord <mattalord@gmail.com>
We are only appending the last io_thread_connecting field

Signed-off-by: Matt Lord <mattalord@gmail.com>
@mattlord marked this pull request as ready for review April 29, 2022 04:00
mattlord and others added 3 commits April 29, 2022 02:11
Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
… compatible

Signed-off-by: Manan Gupta <manan@planetscale.com>
… the upgrade

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100 (Member) left a comment


I think this all looks good now! It may be better to get a review from @deepthi too before merging.

Signed-off-by: Manan Gupta <manan@planetscale.com>
@mattlord (Contributor, Author)

LGTM! Thanks for all the help, @GuptaManan100! ❤️

@deepthi (Member) left a comment


lgtm.
Apologies for missing the compatibility issue on the earlier PR.

@@ -91,6 +91,21 @@ func ReplicationStatusToProto(s ReplicationStatus) *replicationdatapb.Status {
SqlState: int32(s.SQLState),
LastSqlError: s.LastSQLError,
}

// We need to be able to send gRPC response messages from v14 and newer tablets to
Member

👍

@deepthi merged commit fbf574c into vitessio:main Apr 29, 2022
@deepthi deleted the repl_state_transition branch April 29, 2022 17:59

3 participants