kubelet doesn't share transport by default after #95427 #100849
Comments
/sig api-machinery
/triage accepted
How many connections are there between the kubelet and kube-apiserver on one node after PR #95427 in your case? The issue mentions a fivefold increase; does that mean there was one connection before the PR, and 5 or 6 connections after it?
Before the PR there was one connection; after the PR there are 3 connections.
Some increase was expected, since multiple REST clients constructed from a config with a custom dialer cannot safely share a transport, and client-go constructs a REST client for each API group/version accessed. Something like #97821 would be required to rework how client-go constructs clients to start sharing REST clients above the transport level between API groups/versions. I think this is working as intended until client-go client construction is reworked.
Do you know where in the code the 5 connections are created? Are they all safe to share the transport? Also, are the 5 connections all active all the time? If not, IdleConnTimeout should cause the idle connections to be closed.
The transport is shared when config.Transport is not nil.
If rotateCertificates is true, config.Transport is set in kubeletcertificate.UpdateTransport (kubernetes/cmd/kubelet/app/server.go, Line 890 in b0abe89),
but if rotateCertificates is false, config.Transport is not set.
/area kubelet @chenyw1990 @yliaog
@gjkim42 I will work on this issue
Hi @liggitt, if rotate-certificates=false, the kubelet customizes the dialer to provide closeAllConns, which closes connections that are dead but have not been closed. In #78016 there is a discussion about another solution.
After #96778, client-go has an HTTP/2 health check, so we no longer need this change and can solve the current problem by reverting #78016. I have submitted #103149 to revert #78016, and this change requires your final confirmation. Thank you.
Are we guaranteed that kubelets always speak to kube-apiserver over HTTP/2? If they use a load balancer that is not HTTP/2 capable, isn't the "close all connections" fix still required as a backstop?
I thought HTTP/1.1 wasn't affected because it will always try to open a new connection.
I'm not confident in that. The original issue dated back to before client-go defaulted to HTTP/2, and we definitely saw issues with stuck TCP connections then.
Yeah, I could reproduce it with HTTP/1.1; however, it can be solved using the transport's CloseIdleConnections method. @liggitt please take a look at #104844. The last commit needs more work, but I think the first 2 commits, which expose CloseIdleConnections(), are useful.
OK, #104844 seems to work, but it would be nice to have more feedback.
Hi aojea, I have tested #104844 on v1.21.1. After merging #104844, there is only 1 connection from the kubelet to kube-apiserver, so #104844 solves my issue.
What happened:
In #95427, client-go doesn't share the transport when c.Dial is not nil,
kubernetes/staging/src/k8s.io/client-go/transport/cache.go
Line 136 in b0abe89
but the kubelet customizes Dial by default,
kubernetes/cmd/kubelet/app/server.go
Line 929 in b0abe89
After the PR was integrated, the number of connections between kubelet and kube-apiserver in our cluster with 4000 nodes increased fivefold.
What you expected to happen:
The kubelet shares the transport by default, so each kubelet keeps only one connection to kube-apiserver.
@liggitt
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
start kubelet with rotate-certificates=false
Environment:
kubectl version: v1.19.4
cat /etc/os-release: EulerOS 2.9
uname -a: