Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vtctld] migrate validator rpcs #8849

Merged
merged 8 commits into from
Sep 24, 2021
Merged

Conversation

ajm188
Copy link
Contributor

@ajm188 ajm188 commented Sep 20, 2021

Description

This PR migrates the Validate and Validate{Keyspace, Shard} rpcs to the new VtctldServer.

Examples
❯ vtctldclient --server ":15999" Validate
Validation complete; no issues found.
❯ vtctldclient --server ":15999" ValidateKeyspace commerce
Validation of commerce complete; no issues found.
❯ vtctldclient --server ":15999" ValidateShard commerce/0
Validation of commerce/0 complete; no issues found.
❯ vtctldclient --server ":15999" ValidateShard commerce/0 -p
Validation of commerce/0 complete; no issues found.
❯ ps aux | grep vttablet | grep -v grep | shuf -n 1
amason           47526   0.0  0.2  5055136  75312 s000  S     6:46AM   0:00.37 vttablet -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global -log_dir /Users/amason/work/vitess/vtdataroot/tmp -log_queries_to_file /Users/amason/work/vitess/vtdataroot/tmp/vttablet_0000000101_querylog.txt -tablet-path zone1-0000000101 -tablet_hostname  -init_keyspace commerce -init_shard 0 -init_tablet_type replica -health_check_interval 5s -enable_semi_sync -enable_replication_reporter -backup_storage_implementation file -file_backup_storage_root /Users/amason/work/vitess/vtdataroot/backups -restore_from_backup -port 15101 -grpc_port 16101 -service_map grpc-queryservice,grpc-tabletmanager,grpc-updatestream -pid_file /Users/amason/work/vitess/vtdataroot/vt_0000000101/vttablet.pid -vtctld_addr http://SFO-M-AMASON02:15000/
❯ kill -9 47526
❯ vtctldclient --server ":15999" ValidateShard commerce/0 -p
Validation results:
- Ping(zone1-0000000101) failed: rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing dial tcp [::1]:16101: connect: connection refused" tablet hostname: localhost
E0923 06:48:01.531351   48260 main.go:42] shard commerce/0 had validation issues; see above for details
❯ vtctldclient --server ":15999" Validate -p
Validation results:

For commerce:

For commerce/0:
- Ping(zone1-0000000101) failed: rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing dial tcp [::1]:16101: connect: connection refused" tablet hostname: localhost
E0923 06:48:46.939330   48287 main.go:42] some issues were found during validation; see above for details

Related Issue(s)

Closes #8847
Closes #8848

Checklist

  • Should this PR be backported? - No
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@ajm188
Copy link
Contributor Author

ajm188 commented Sep 20, 2021

unit tests and cli entrypoints to come tomorrowwwww

@ajm188 ajm188 force-pushed the am_migrate_validators branch 2 times, most recently from 94e7880 to 4269647 Compare September 23, 2021 10:51
@ajm188 ajm188 marked this pull request as ready for review September 23, 2021 11:03
Copy link
Contributor Author

@ajm188 ajm188 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just calling out deviations inline; this one was not a pure porting of behavior because I saw some things that could be improved

span.Annotate("ping_tablets", req.PingTablets)

resp := vtctldatapb.ValidateResponse{}
getKeyspacesCtx, getKeyspacesCancel := context.WithTimeout(ctx, *topo.RemoteOperationTimeout)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change to the implementation is I bounded all the remote calls (both to the topo and to tablet tmc calls) with timeouts. If I didn't do this, I could kill a tablet, and the Validate call would hang until action_timeout, which defaults to 24 hours.

wg.Add(1)
go func() {
defer wg.Done()
validateAllTablets := func(ctx context.Context, keyspaces []string) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another deviation from the original implementation: we used to make a call to GetKeyspaces in the outer function, and then the first thing this goroutine would do is ....... also make a call to GetKeyspaces, so, I just removed it and used the earlier result.

Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Signed-off-by: Andrew Mason <amason@slack-corp.com>
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@deepthi deepthi merged commit 1e12b5b into vitessio:main Sep 24, 2021
@ajm188 ajm188 deleted the am_migrate_validators branch September 24, 2021 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate Validate{Keyspace,Shard} rpcs to VtctldServer Migrate Validate rpc to VtctldServer
3 participants