
[vtadmin] Update vtctld dialer to validate connectivity #9915

Merged: 17 commits merged from sarabee-vtadmin-vtctld-connectivity into vitessio:main on Mar 22, 2022

Conversation

@doeg (Contributor) commented Mar 18, 2022

Description

This fixes #9422: prior to this fix, vtadmin-api will hang on to a cached gRPC connection to a vtctld even after the gRPC channel is shut down, and any subsequent vtadmin-api request that queries a vtctld (e.g., /api/schemas) will fail.

This change updates VTAdmin's vtctld proxy to be "self-healing" when its gRPC connection is lost. The Dial function, which is called prior to any vtctld request, now waits until its cached connection is READY. If this check fails (for example, the connection is in a SHUTDOWN state as mentioned above, or we exceed grpc-connectivity-timeout's worth of waiting), then the proxy runs discovery again for that cluster and attempts to establish (and cache) a new gRPC connection.
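Roughly, the flow looks like the sketch below. It is illustrative only (not the actual proxy.go code): the proxy struct, the discoverVtctld hook, and the field names are stand-ins, but the connectivity calls (GetState, WaitForStateChange) are grpc-go's real API.

package proxysketch

import (
    "context"
    "errors"
    "fmt"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/connectivity"
    "google.golang.org/grpc/credentials/insecure"
)

// proxy is a stand-in for VTAdmin's vtctld proxy; the real struct has more
// fields (discovery, dial options, and so on).
type proxy struct {
    conn                *grpc.ClientConn
    connectivityTimeout time.Duration                             // the new grpc-connectivity-timeout option
    discoverVtctld      func(ctx context.Context) (string, error) // hypothetical discovery hook
}

// Dial approximates the "check cached connection, wait for READY, otherwise
// rediscover and redial" flow described above.
func (p *proxy) Dial(ctx context.Context) error {
    if p.conn != nil {
        waitCtx, cancel := context.WithTimeout(ctx, p.connectivityTimeout)
        err := waitUntilReady(waitCtx, p.conn)
        cancel()
        if err == nil {
            return nil // cached connection is READY; reuse it
        }
        // The cached connection is stale (SHUTDOWN, or it never reached READY
        // in time): close it, tolerate (but log) any close error, and fall
        // through to rediscovery.
        if cerr := p.conn.Close(); cerr != nil {
            log.Printf("error closing possibly-stale connection: %v", cerr)
        }
        p.conn = nil
    }

    addr, err := p.discoverVtctld(ctx) // e.g. pick a vtctld from discovery.json
    if err != nil {
        return err
    }

    conn, err := grpc.DialContext(ctx, addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        return err
    }
    p.conn = conn // cache for subsequent requests
    return nil
}

// waitUntilReady blocks until conn reports READY, or returns an error if the
// connection is SHUTDOWN or ctx expires first.
func waitUntilReady(ctx context.Context, conn *grpc.ClientConn) error {
    for {
        s := conn.GetState()
        switch s {
        case connectivity.Ready:
            return nil
        case connectivity.Shutdown:
            return errors.New("connection is shut down")
        }
        if !conn.WaitForStateChange(ctx, s) {
            return fmt.Errorf("timed out waiting to leave state %s", s)
        }
    }
}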

I also added a bunch more logging in the Dial function... I've found the added verbosity to be useful, but let me know if I took it too far (lol).

FWIW we've had this branch running across Slack's Vitess deployments for the past few months and it's worked well. And thank you @ajm188 for writing most of this with me back in... uh.... last July. :')

Reproduction steps

Some of this is noted in #9422, but I'll note it here for posterity anyway. (It's an interesting example of doing mildly nontrivial stuff with VTAdmin + the local example. 🤷)

  1. Parameterize the vtctld-up.sh script and VTAdmin's discovery.json file to make it a lil easier to run a second vtctld

    diff --git a/examples/local/scripts/vtctld-up.sh b/examples/local/scripts/vtctld-up.sh
    index db6e544230..b81134c477 100755
    --- a/examples/local/scripts/vtctld-up.sh
    +++ b/examples/local/scripts/vtctld-up.sh
    @@ -18,8 +18,8 @@
    
    source ./env.sh
    
    -cell=${CELL:-'test'}
    -grpc_port=15999
    +grpc_port=${VTCTLD_GRPC_PORT:-'15999'}
    +web_port=${VTCTLD_WEB_PORT:-'15000'}
    
    echo "Starting vtctld..."
    # shellcheck disable=SC2086
    @@ -32,7 +32,7 @@ vtctld \
    --backup_storage_implementation file \
    --file_backup_storage_root $VTDATAROOT/backups \
    --log_dir $VTDATAROOT/tmp \
    - --port $vtctld_web_port \
    + --port $web_port \
    --durability_policy 'semi_sync' \
    --grpc_port $grpc_port \
    --pid_file $VTDATAROOT/tmp/vtctld.pid \
    diff --git a/examples/local/vtadmin/discovery.json b/examples/local/vtadmin/discovery.json
    index def7dd50f8..6b29f0077c 100644
    --- a/examples/local/vtadmin/discovery.json
    +++ b/examples/local/vtadmin/discovery.json
    @@ -5,6 +5,12 @@
                    "fqdn": "localhost:15000",
                    "hostname": "localhost:15999"
                }
    +        },
    +        {
    +            "host": {
    +                "fqdn": "localhost:16000",
    +                "hostname": "localhost:16999"
    +            }
            }
        ],
        "vtgates": [
  2. Start up a local cluster as usual, which will start up a single vtctld on http://localhost:15999: ./101_initial_cluster.sh

  3. Start a second vtctld on http://localhost:16999: VTCTLD_GRPC_PORT=16999 VTCTLD_WEB_PORT=16000 ./scripts/vtctld-up.sh

  4. Start up VTAdmin. (The usual way is ./scripts/vtadmin-up.sh, which will also start vtadmin-web.)

At this point, we can double check that VTAdmin can "discover" both vtctlds. (Scare quotes since "discovery", in this case, is simply reading from that discovery.json file.)

 $ curl "http://localhost:14200/api/vtctlds"

{"result":{"vtctlds":[{"hostname":"localhost:15999","cluster":{"id":"local","name":"local"},"FQDN":"localhost:15000"}]},"ok":true}

Now, since VTAdmin lazy-initializes its vtctld connections, we need to trigger a request that traverses the "discover -> dial -> cache" codepath:

# We don't really care about the output right now
curl "http://localhost:14200/api/schemas"

Examine VTAdmin's proxy.go logs to see which of the two local vtctlds it discovered + dialed; in this case, it's the vtctld on http://localhost:16999.

I0318 12:46:54.444487   43118 config.go:122] [rbac]: loaded authorizer with 1 rules
I0318 12:46:54.444526   43118 config.go:146] [rbac]: no authenticator implementation specified
I0318 12:46:54.449496   43118 server.go:240] server vtadmin listening on :14200
I0318 12:49:37.128481   43118 vtsql.go:175] Dialing localhost:15991 ...
2022-03-18 12:49:37     INFO proxy.go:136] Discovering vtctld to dial...

2022-03-18 12:49:37     INFO proxy.go:156] Discovered vtctld localhost:16999; attempting to establish gRPC connection...

2022-03-18 12:49:37     INFO proxy.go:162] Established gRPC connection to vtctld localhost:16999; waiting to transition to READY...

2022-03-18 12:49:37     INFO proxy.go:175] Established gRPC connection to vtctld localhost:16999

2022-03-18 12:49:37     INFO proxy.go:113] Using cached connection to vtctld localhost:16999

Now, we are going to kill this vtctld. 😈

kill $(ps aux | grep vtctld | grep 16999 | awk '{print $2}')

For the sake of illustration, let's take a brief diversion into 🐛 bug territory 🐛 and see what happens on the main branch after we kill the vtctld (without the WaitForReady fix, but keeping all the logging): the curl command will eventually time out, since the request never completes.

`curl "http://localhost:14200/api/schemas"`
Type 'dlv help' for list of commands.
I0318 12:56:29.962070   45022 config.go:122] [rbac]: loaded authorizer with 1 rules
I0318 12:56:29.962096   45022 config.go:146] [rbac]: no authenticator implementation specified
I0318 12:56:29.964926   45022 server.go:240] server vtadmin listening on :14200
I0318 12:56:42.636499   45022 vtsql.go:175] Dialing localhost:15991 ...
2022-03-18 12:56:42     INFO proxy.go:136] Discovering vtctld to dial...

2022-03-18 12:56:42     INFO proxy.go:156] Discovered vtctld localhost:16999; attempting to establish gRPC connection...

2022-03-18 12:56:42     INFO proxy.go:175] Established gRPC connection to vtctld localhost:16999

2022-03-18 12:56:42     INFO proxy.go:113] Using cached connection to vtctld localhost:16999

W0318 12:56:58.812701   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 12:56:59.815187   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 12:57:01.442810   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
I0318 12:57:02.719835   45022 vtsql.go:147] Have valid connection to localhost:15991, reusing it.
2022-03-18 12:57:02     INFO proxy.go:113] Using cached connection to vtctld localhost:16999

W0318 12:57:03.778020   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 12:57:08.292044   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 12:57:14.125623   45022 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...

And now for the fix! We can observe that VTAdmin is able to detect the SHUTDOWN connection and negotiate a new one:

2022-03-18 13:13:49     INFO proxy.go:127] Closing stale connection to vtctld localhost:16999

2022-03-18 13:13:49     INFO proxy.go:139] Discovering vtctld to dial...

2022-03-18 13:13:49     INFO proxy.go:159] Discovered vtctld localhost:16999; attempting to establish gRPC connection...

2022-03-18 13:13:49     INFO proxy.go:165] Established gRPC connection to vtctld localhost:16999; waiting to transition to READY...

2022-03-18 13:13:51     INFO proxy.go:174] Could not transition to READY state for gRPC connection to localhost:16999: failed to transition from state TRANSIENT_FAILURE

2022-03-18 13:13:55     INFO proxy.go:127] Closing stale connection to vtctld localhost:16999

I0318 13:13:55.704301   48623 log.go:255] error closing possibly-stale connection before re-dialing: %!w(*status.Error=&{0xc0001bc960})
2022-03-18 13:13:55     INFO proxy.go:139] Discovering vtctld to dial...

2022-03-18 13:13:55     INFO proxy.go:159] Discovered vtctld localhost:15999; attempting to establish gRPC connection...

2022-03-18 13:13:55     INFO proxy.go:165] Established gRPC connection to vtctld localhost:15999; waiting to transition to READY...

2022-03-18 13:13:55     INFO proxy.go:178] Established gRPC connection to vtctld localhost:15999

2022-03-18 13:13:55     INFO proxy.go:113] Using cached connection to vtctld localhost:15999

The above logs are especially interesting because they point out a shortcoming of this change (and an opportunity for a later enhancement). On VTAdmin's first attempt to discover a vtctld, it rediscovers the one we just killed on http://localhost:16999. This is because we're using static file discovery: the vtctld is never removed from discovery.json after we kill it, so VTAdmin has a 50% chance of rediscovering it. And, as mentioned below, Dial does not retry in this case (although one could imagine it doing so with an exponential backoff or similar; see the sketch after the output below), so the request fails:

curl "http://localhost:14200/api/schemas"                  
{"error":{"message":"failed to transition from state TRANSIENT_FAILURE","code":"unknown"},"ok":false}

The rest of the logs result from a subsequent curl "http://localhost:14200/api/schemas" which (you guessed it!) redials, rediscovers, and re-establishes a gRPC connection to the remaining, healthy vtctld on http://localhost:15999.

A note on rejected alternatives

Using the connectivity API to introspect our gRPC connections is a little cumbersome and possibly error-prone. (I have been known to write bugs, and... gestures at the next section on leaked connections.)

Ideally the grpc-go library would handle this for us, and theoretically it can; however, my understanding is that we'd use the Resolver interface as a service discovery integration point and then run something like a lookaside load balancer. This has its advantages (round-robin discovery, "officially supported")... but it would also be a Whole Thing to rewrite our service discovery layer.

Another approach that was shared with me is initializing healthchecks on the connection you get back from Dial: https://github.com/grpc/grpc/blob/master/doc/health-checking.md. I haven't investigated this one (to be candid, since this branch works) but I'll note it here for posterity!
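For reference, client-side health checking in grpc-go is enabled roughly as in the sketch below (a sketch of the general mechanism, not something this PR does; it needs the blank import _ "google.golang.org/grpc/health" to register the client-side checker, and the vtctld side would have to serve the standard grpc.health.v1.Health service):

// Sketch: dial with client-side health checking enabled. Health checking is
// skipped under the default pick_first balancer, hence the round_robin config;
// serviceName "" checks the server's overall health.
func dialWithHealthChecking(ctx context.Context, addr string) (*grpc.ClientConn, error) {
    serviceConfig := `{
        "loadBalancingConfig": [{"round_robin":{}}],
        "healthCheckConfig": {"serviceName": ""}
    }`
    return grpc.DialContext(ctx, addr,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithDefaultServiceConfig(serviceConfig),
    )
}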

A note on leaked connections

This PR updates proxy.go to functionally ignore errors from closing the gRPC connection. There are definitely some... undesirable interactions between this retry logic and gRPC's internal retry logic.

When the gRPC connection is lost, even if we call Close, gRPC's internal mechanisms will continue to retry for ~4 seconds. During this time, as far as I can tell, the connectivity API will show the connection flapping between CONNECTING and TRANSIENT_FAILURE.
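That flapping is easy to observe from code with the same connectivity API (a small debugging sketch, not part of this PR; same hypothetical package/imports as the earlier sketch):

// Sketch: log every connectivity state transition on a ClientConn until ctx
// is cancelled. Handy for watching the CONNECTING <-> TRANSIENT_FAILURE
// flapping after the remote end goes away.
func watchConnState(ctx context.Context, conn *grpc.ClientConn) {
    for {
        s := conn.GetState()
        log.Printf("vtctld conn state: %s", s)
        if !conn.WaitForStateChange(ctx, s) {
            return // ctx cancelled or deadline exceeded
        }
    }
}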

During this period, any VTAdmin request that traverses the Dial codepath will first fail to transition to READY, and then subsequent requests will keep failing until gRPC's internal retry times out, since Dial (prior to this branch) returns early on that "error closing possibly-stale connection" error:

~/workspace/vitess/examples/local 🍕 $  curl "http://localhost:14200/api/schemas"                  
{"error":{"message":"failed to transition from state TRANSIENT_FAILURE","code":"unknown"},"ok":false}%                                                      
~/workspace/vitess/examples/local 🍕 $  curl "http://localhost:14200/api/schemas"
{"error":{"message":"error closing possibly-stale connection before re-dialing: rpc error: code = Canceled desc = grpc: the client connection is closing","code":"unknown"},"ok":false}%                                                                                                                                
~/workspace/vitess/examples/local 🍕 $  curl "http://localhost:14200/api/schemas"
{"error":{"message":"error closing possibly-stale connection before re-dialing: rpc error: code = Canceled desc = grpc: the client connection is closing","code":"unknown"},"ok":false}%                                                                                                                                
~/workspace/vitess/examples/local 🍕 $  curl "http://localhost:14200/api/schemas"
{"error":{"message":"error closing possibly-stale connection before re-dialing: rpc error: code = Canceled desc = grpc: the client connection is closing","code":"unknown"},"ok":false}%                                                                                                                                
~/workspace/vitess/examples/local 🍕 $  curl "http://localhost:14200/api/schemas"
{"error":{"message":"error closing possibly-stale connection before re-dialing: rpc error: code = Canceled desc = grpc: the client connection is closing","code":"unknown"},"ok":false}%   

We can also see evidence of these retries in the logs as soon as the vtctld is killed:

Type 'dlv help' for list of commands.
I0318 13:33:43.183080   55452 config.go:122] [rbac]: loaded authorizer with 1 rules
I0318 13:33:43.183104   55452 config.go:146] [rbac]: no authenticator implementation specified
I0318 13:33:43.186368   55452 server.go:240] server vtadmin listening on :14200
I0318 13:34:51.526821   55452 vtsql.go:175] Dialing localhost:15991 ...
2022-03-18 13:34:51     INFO proxy.go:139] Discovering vtctld to dial...

2022-03-18 13:34:51     INFO proxy.go:159] Discovered vtctld localhost:16999; attempting to establish gRPC connection...

2022-03-18 13:34:51     INFO proxy.go:165] Established gRPC connection to vtctld localhost:16999; waiting to transition to READY...

2022-03-18 13:34:51     INFO proxy.go:178] Established gRPC connection to vtctld localhost:16999

2022-03-18 13:34:51     INFO proxy.go:113] Using cached connection to vtctld localhost:16999

W0318 13:34:56.481484   55452 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 13:34:57.484005   55452 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...
W0318 13:34:59.223662   55452 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {localhost:16999 localhost:16999 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp [::1]:16999: connect: connection refused". Reconnecting...

I did a bunch of digging into this a few months ago (which is part of the reason this branch has taken me forever 😭) and realize that we can likely disable gRPC's internal retries with some incantation of grpc.DialOptions. I remember in a past conversation with @ajm188 that configuring dial opts was more complicated than I anticipated, and... well, I'd propose addressing that in a separate PR. :')
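One plausible direction (an untested assumption, and it tunes rather than fully disables the reconnect behaviour) is shrinking grpc-go's reconnect backoff via grpc.WithConnectParams, e.g. (assumes the google.golang.org/grpc/backoff import on top of the earlier sketch's imports):

// Sketch: illustrative dial options that tighten grpc-go's reconnect backoff.
// The values are made up; they reduce how long a dead connection keeps
// retrying rather than disabling retries outright.
var quickReconnectOpts = []grpc.DialOption{
    grpc.WithConnectParams(grpc.ConnectParams{
        Backoff: backoff.Config{
            BaseDelay:  100 * time.Millisecond,
            Multiplier: 1.2,
            Jitter:     0.2,
            MaxDelay:   time.Second,
        },
        MinConnectTimeout: time.Second,
    }),
}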

My understanding of the "worst case" scenario, as noted, is that VTAdmin can possibly leak gRPC connections that are improperly closed. In most cases, I think (hope?) these connections would terminate themselves once their retry timeout is up. There is a chance, though, that the once-dead vtctld comes back while gRPC is internally retrying, even if VTAdmin's proxy has since established a connection to a different vtctld.

FWIW, we've been running this change in our environment for several months without any issues. And this change is an enhancement given that the current behaviour is to simply fail forever until the vtadmin process is restarted. :')

Related Issue(s)

Closes #9422

Checklist

  • Should this PR be backported? No
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

This PR introduces grpc-connectivity-timeout, a new per-cluster config option that sets the maximum wait time to establish a gRPC connection between VTAdmin and the vtctld it queries in that cluster. The default value is 2 seconds.

doeg added 11 commits March 18, 2022 12:03
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
…estRedial

Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
@doeg marked this pull request as ready for review March 18, 2022 17:41
// Even if the client connection does not shut down cleanly, we don't want to block
// Dial from discovering a new vtctld. This makes VTAdmin's dialer more resilient,
// but, as a caveat, it _can_ potentially leak improperly-closed gRPC connections.
log.Errorf("error closing possibly-stale connection before re-dialing: %w", err)
@doeg (Contributor, Author) commented:

I added more context about the leaked connections thing in the PR description.

@ajm188 (Contributor) left a comment:

looking good, a couple things to address (you'll also need to add labels to the PR)

go/vt/vtadmin/vtctldclient/proxy.go (review thread; outdated, resolved)
}
}

log.Infof("Discovering vtctld to dial...\n")
@ajm188 (Contributor):

Do we want to keep these, or were they just to help test/debug?

@doeg (Contributor, Author), quoting the PR description:

I also added a bunch more logging in the Dial function... I've found the added verbosity to be useful, but let me know if I took it too far (lol).

Up to you!

@doeg (Contributor, Author):

I removed most of the additional log statements.

Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
@doeg (Contributor, Author) commented Mar 21, 2022

It looks like there was an actual regression in a couple of the unit tests. Will fix.

Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
@doeg (Contributor, Author) commented Mar 21, 2022

Two things to fix the regressions:

  • fef9e98 and afe15d9 move the 2 second ConnectivityTimeout default to a variable so it can be used in unit tests
  • 5c8d160 adds WaitForReady to the fakevtctldclient implementation since it was segfaulting. (I missed this when adding it to localvtctldclient 🤔)

All tests are passing now.

@ajm188 no rush, but would you mind taking another quick look at 0ae0c1a...afe15d9 before this is merged? I'd like to confirm there aren't any weird gotchas with using a var like that.
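For illustration only (the shape below is a guess, not the actual fakevtctldclient code; same hypothetical package/imports as the earlier sketches), the fake's WaitForReady can simply be a success no-op:

// Hypothetical sketch: a test fake that always reports its (nonexistent)
// connection as ready. The real signature in the PR may differ.
type fakeVtctldProxy struct{}

func (f *fakeVtctldProxy) WaitForReady(ctx context.Context) error {
    return nil // fakes are always "ready"
}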


As an aside + mostly a note to self: I noticed a bunch of gRPC reconnect logs in the failed test output that I thought might be related to (or worsened by) this change, but they happen on the main branch too:

$ /usr/local/go/bin/go test -timeout 30s -run ^TestDial$ vitess.io/vitess/go/vt/vtadmin/vtctldclient -v -count=10
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
W0321 15:23:14.898187     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59824 127.0.0.1:59824 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59824: connect: connection refused". Reconnecting...
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
=== RUN   TestDial
--- PASS: TestDial (0.00s)
W0321 15:23:14.899421     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59837 127.0.0.1:59837 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59837: connect: connection refused". Reconnecting...
W0321 15:23:14.898731     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59828 127.0.0.1:59828 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59828: connect: connection refused". Reconnecting...
=== RUN   TestDial
W0321 15:23:14.898800     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59829 127.0.0.1:59829 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59829: connect: connection refused". Reconnecting...
W0321 15:23:14.898965     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59832 127.0.0.1:59832 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59832: connect: connection refused". Reconnecting...
W0321 15:23:14.899138     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59834 127.0.0.1:59834 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59834: connect: connection refused". Reconnecting...
W0321 15:23:14.899272     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59835 127.0.0.1:59835 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59835: connect: connection refused". Reconnecting...
--- PASS: TestDial (0.00s)
W0321 15:23:14.899590     412 component.go:41] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:59840 127.0.0.1:59840 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:59840: connect: connection refused". Reconnecting...
PASS
ok  	vitess.io/vitess/go/vt/vtadmin/vtctldclient	0.794s

I think this is another example of the thing I mentioned in the PR description, given gRPC's internal retry mechanism + our current dial options... I can take a stab at fixing that over the next few weeks in a separate branch by making that dial option configurable. (Open to other suggestions, of course.)

}

var defaultConnectivityTimeout = 2 * time.Second
@ajm188 (Contributor):

For your question around potential gotchas, this should be fine. You can also make this a const since I can't think of a reason we would ever modify the default at runtime.
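For reference, the const form would simply be the following (it compiles because 2 * time.Second is itself a constant expression):

// Durations built from time's typed constants are constant expressions,
// so this can be a const.
const defaultConnectivityTimeout = 2 * time.Second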

@doeg (Contributor, Author):

Sure, I can change that. What's another 4 hours of CI runs between friends?

return listener, server, err
}

func TestDial(t *testing.T) {
@ajm188 (Contributor):

what you're seeing running just this test in isolation is super interesting, do you mind filing an issue and i can dig into it? i'm not really sure what's going on, but it could be a "macs suck at local networking" issue, or something not completely correct in the code, but it's hard to say (and i don't think we should block this PR, which I still maintain is strictly an improvement over the current, uh, Situation)

@doeg (Contributor, Author):

Done: #9943

Thanks for taking a look. I'm super interested in what you find! I spent way too long looking at gRPC internals for this PR 😭

Signed-off-by: Sara Bee <855595+doeg@users.noreply.github.com>
@doeg merged commit 1b59109 into vitessio:main on Mar 22, 2022
@doeg deleted the sarabee-vtadmin-vtctld-connectivity branch on March 22, 2022 14:54
@ajm188 mentioned this pull request on Mar 30, 2022
Merging this pull request closes: [vtadmin-api] vtctld proxy dialer should check that gRPC connection is ready (#9422)