-
Notifications
You must be signed in to change notification settings - Fork 475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agent should re-resolve DNS names more aggressively #1192
Comments
I'd like 5 minutes as an upper bound, configurable would be best. The most important hunk here is ensuring that agents jitter themselves within that time to spread the load. https://github.com/grpc/grpc-go/blob/7c8e60372e19da88fb3fe8ac6a8de781eef7f547/naming/dns_resolver.go#L34 this is particularly annoying because it isn't configurable at all and as far as I know the only way to do what we want is to vendor that resolver object and register as a custom resolver (https://github.com/grpc/grpc-go/blob/master/resolver/resolver.go#L42) and then we'll have to modify this spire/cmd/spire-agent/cli/run/run.go Line 218 in e6efacc
Given that grpc target isn't exposed to the user via configuration its ok but it will make configuration subtler if we wanted to expose that in the future (e.g. if you wanted to support a custom DNS resolver. grpc supports this with a target like |
I did some brief research here when opening this issue, and it seemed like it would be possible to accomplish this by creating a Round Robin Balancer and explicitly passing in a resolver (which can be created by calling We'll have to put some thought into the jitter. DNS queries are relatively inexpensive hence my leaning towards minute or sub-minute. You really think that's too aggressive? One option is to choose a random number between 30 and 60 when instantiating the balancer :) |
I was being conservative, I'd also prefer something like a minute of sub-minute, that's why I said 5 minute upper bound :) |
Can I work on this? |
@bharaththiruveedula that would be great! I'll assign this to you - let me know if there's anything we can do to help |
excited for this one! |
I looked at this a bit more and posted a question upstream about it: grpc/grpc-go#3170. Response from one of the maintainers:
We'll need to think about the tradeoffs some (clients disconnecting regularly vs. polling dns regularly), but it seems like a reasonable direction to try. I do somewhat prefer using the lib's preferred APIs over vendoring a bunch of internal code.. |
I think this has been fixed in #1265 which will be released tomorrow. Going to close this issue for now |
The default gRPC re-resolution interval is 30 minutes. This is far too slow for most dynamic environments. Dial the resolution interval down to something much more aggressive, perhaps one minute or thirty seconds.
@ZymoticB any preferences on a reasonable value here?
Here are a couple spots that will probably be relevant to this change:
spire/pkg/agent/client/dial.go
Line 69 in 41eb0c7
https://github.com/grpc/grpc-go/blob/7c8e60372e19da88fb3fe8ac6a8de781eef7f547/naming/dns_resolver.go#L34
The text was updated successfully, but these errors were encountered: