Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass Semi sync information from durability policy and use it to log discrepancies #9533

Merged
merged 32 commits into from
Jan 25, 2022

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Jan 19, 2022

Description

In release 13, we will introduce a new flag durability in vtctld and vtctl. That durability policy should be the one that reflects the semi-sync settings and we will eventually deprecate enable_semi_sync flag on the vttablets. But all of this cannot be accomplished in a single release because of upgrade downgrade considerations. So the plan is as follows -

  • Introduce the new flag durability
  • Pass the information all through the components and just document the discrepancies in configuration.
  • Keep using enable_semi_sync flag in v13, so the new flag is only used for logging but it is expected that vtctl, vtctld binaries of v13 will have the flag set.
  • In release 14, we would have the semi-sync information being passed everywhere, so we can deprecate enable_semi_sync and use the information coming from the durability policies.

This PR accomplishes this change of passing the information down and logging the differences for release 13.

Related Issue(s)

Checklist

Deployment Notes

…er calling a reparent operation

Signed-off-by: Manan Gupta <manan@planetscale.com>
…ameter

Signed-off-by: Manan Gupta <manan@planetscale.com>
…rences with enable_semi_sync flag

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…f enabled

Signed-off-by: Manan Gupta <manan@planetscale.com>
…parameter

Signed-off-by: Manan Gupta <manan@planetscale.com>
…arameter

Signed-off-by: Manan Gupta <manan@planetscale.com>
… a parameter

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…y-policy

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…y-policy

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…he changed type and not with the original

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100 GuptaManan100 marked this pull request as ready for review January 22, 2022 18:07
@GuptaManan100
Copy link
Member Author

Apart from the checks in this PR, I also made a change wherein we start using the durability policy information in fixSemiSync in #9552. That PR is 2 commits ahead of this PR -

  1. 1380e76 wherein we change fixSemiSyncReplication
  2. 39e6f09 which fixes some of the unit tests to set the correct durability policies and skips the backup tests.

On that PR, it can be observed that the only failing tests are from cluster 11, 16, 19, 20, 21, 26, docker_test_cluster_10 which all run Backup RPC and Upgrade Downgrade test for reparenting with old vtctl. Their are 2 failing tests, one TestSemiSyncSetupCorrectly and TestChangeTypeSemiSync which both test semi-sync settings. We expect these tests to fail since old vtclt does not have durability policy information and the new vttablets in that branch only fixSemiSync using the parameter passed down. The failure in tests concerned with Backup RPC is also expected, since we will no longer be fixing semiSync after a Backup operation succeeds. We will rely on vtorc to fix it. The main reason for this -

  1. After the durability policy changes take effect, a tablet might be sending semi sync acks to one primary but may be configured to not send semi sync acks to some other primary, and since Backup is a long operation during which the primary might have changed, we will have no idea whether we should be setting our semi-sync or not. So we will rely on vtorc to fix it when it becomes a compulsory component of Vitess. An alternate is to run StartReplication RPC after running Backup RPC, and that will figure out the semi sync settings.

@GuptaManan100 GuptaManan100 changed the title Semi sync durability policy Pass Semi sync information from durability policy and use it to log discrepancies Jan 22, 2022
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much quality review content for now, just a few naming suggestions.

go/vt/orchestrator/inst/tablet_dao.go Outdated Show resolved Hide resolved
go/vt/vttablet/tabletmanager/rpc_actions.go Outdated Show resolved Hide resolved
go/vt/vttablet/tabletmanager/rpc_actions.go Outdated Show resolved Hide resolved
go/cmd/vtworker/vtworker.go Outdated Show resolved Hide resolved
go/vt/worker/multi_split_diff.go Show resolved Hide resolved
Comment on lines +43138 to +43143

/** VEvent keyspace */
keyspace?: (string|null);

/** VEvent shard */
shard?: (string|null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doeg @ajm188 these fields were added to the proto definition some time ago, but the vtadmin files don't seem to have been regenerated at that time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can one of you post reminders in #maintainers and #developers so that people don't miss the step of generating vtadmin files?

Copy link
Contributor

@doeg doeg Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've posted another reminder to both channels. (This was also mentioned in #developers back at the beginning of December.) I expect this will (understandably) be an issue until we can enforce this in CI. The ticket for that is #9571.

Comment on lines +30869 to 30877
/**
* ApplySchemaRequest sql_mode.
* @member {string} sql_mode
* @memberof tabletmanagerdata.ApplySchemaRequest
* @instance
*/
ApplySchemaRequest.prototype.sql_mode = "";

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an artifact of a different PR as well.

@@ -35887,6 +35932,7 @@ $root.tabletmanagerdata = (function() {
* Properties of a StartReplicationRequest.
* @memberof tabletmanagerdata
* @interface IStartReplicationRequest
* @property {boolean|null} [semiSync] StartReplicationRequest semiSync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notfelineit do these changes affect the "destructive" tablet actions UI?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, not as it exists - should we add a semiSync/durability option in the UI?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have to. Do you have a screenshot handy of what this looks like on the UI today?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screen Shot 2022-01-27 at 9 45 23 AM

Yes! It follows vtctld2's ui in that there is a singular button

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this directly calls the TabletManager grpc API instead of going through vtctld API.
My suggestion is to add a new vtctld API. Then semi_sync won't need to be provided by the user, it can be computed by VtctldServer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check today and see which it is using - and switch to vtctld if necesary.

Copy link
Contributor

@notfelineit notfelineit Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepthi vtadmin uses this codepath: https://cs.github.com/vitessio/vitess/blob/f906693c6441f4b41d8f842ffb9dba85258b5158/go/vt/vtctl/grpcvtctldserver/server.go?q=StartReplication#L2439 it does call the tablet manager API, but I think this is the vtctld API mentioned? it calls IsReplicaSemiSync to provide semi-sync setting.

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.
Please address all feedback and then this can be merged.

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants