-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-3037: client-go alternative services #3034
Conversation
aojea
commented
Nov 3, 2021
•
edited
Loading
edited
- One-line PR description: client-go capability to connect to multiple apiservers
- Issue link: client-go: alternative services #3037
- Other comments: Implementation [WIP] Alternative services kubernetes#106115
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
1103 12:30:59.066469 1558329 round_trippers.go:454] GET https://127.0.0.1:44267/api/v1/namespaces/default/pods?limit=500 200 OK in 1 milliseconds | ||
I1103 12:30:59.066484 1558329 round_trippers.go:460] Response Headers: | ||
I1103 12:30:59.066491 1558329 round_trippers.go:463] Cache-Control: no-cache, private | ||
I1103 12:30:59.066502 1558329 round_trippers.go:463] Alt-Svc: h2="10.0.0.2:6443", h2="10.0.0.3:6443", h2="10.0.0.4:6443 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear how this will interact with readiness. the LB is checking every second or so, but the external clients will not. Because of this, it can fail over to a not-ready server (hard fail and restart perhaps), and the clients will be unaware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the goal is not about nines of availability, is about being able to recover on a network or connectivity problem, it only failover when a network problem is detected, the trade off is, as you say, that readiness is not completely followed. On another note, only ready endpoints are published, since the apiservers remove their endpoint when they are not ready, that guarantees that only ready apiservers are published.
Per example, the most critical problem this solves is when the connection goes stale, the only way to get out of the loop is because the http2 readIdleTimeout (after 30s IIRC) will raise an error because the ping frame that will cause the client to fail over.
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/kep.yaml
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/kep.yaml
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
keps/sig-api-machinery/NNNN-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this seems plausible to me. Is there any notion of TTL for alt services?
keps/sig-api-machinery/3037-client-go-alternative-services/README.md
Outdated
Show resolved
Hide resolved
indeed, there is |
Do we need to consider TTL here?
…On Wed, Dec 22, 2021 at 12:37 AM Antonio Ojea ***@***.***> wrote:
Overall this seems plausible to me. Is there any notion of TTL for alt
services?
indeed, there is
https://datatracker.ietf.org/doc/html/rfc7838#section-2.2
https://datatracker.ietf.org/doc/html/rfc7838#section-3.1
—
Reply to this email directly, view it on GitHub
<#3034 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKWAVAU6EI4BWSRF3HLVYDUSGE4RANCNFSM5HIYLO6Q>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
I've included it in my first prototype but then I didn't see the benefit:
|
Is it worth saying that in the KEP? :)
…On Wed, Dec 22, 2021 at 9:04 AM Antonio Ojea ***@***.***> wrote:
Do we need to consider TTL here?
I've included it in my first prototype but then I didn't see the benefit:
- the code is much more complicated with it, right now is less than
500 LOC including comments, ... ***@***.***
#diff-2b78b84e7e41ca070ae0d042da6105c1f886d9d8347269ababe21812c95c4af5
<kubernetes/kubernetes@8d8ae06#diff-2b78b84e7e41ca070ae0d042da6105c1f886d9d8347269ababe21812c95c4af5>
- kubernetes control-plane is not a high dynamic environment
- apiserver already has another mechanisms to warn clients using http
codes
- if the alt-service has gone you will have a connection error, the
client will expire that entry from the cache and will try another alt-svc
—
Reply to this email directly, view it on GitHub
<#3034 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABKWAVDGWFTJTR7IPHVLGO3USIAKXANCNFSM5HIYLO6Q>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
If you can keep kubeconfigs up to date, there's not really a need for a dynamic cache of the same data; if you can't keep them up to date, then it will conflict with such a cache. IMO putting all addresses in the config file only makes sense if the addresses are static and never change. I don't think we can do this as designed, but if some design can address those concerns, maybe there'd be room for both static and dynamic options; I just would not expect both to be on at the same time in the same cluster. |
If I can add to that: and if they are not static, you can't put aany of them in the config file. It seems dubious to assume that one of them is static. The potential use-case I see here is elastic apiservers (growing and shrinking number or replicas) and no load-balancer. Is that really a situation that arises? |
yeah, I've tried to cover that case too, there are some projects that has this elastic control plane feature |
I think multiple IPs in client-go is a relatively obvious thing to do. I don't find alt-svc to be bad, just very very niche. I don't feel like my opinion (with sig-net hat on) should be on the same tier with @lavalamp though, since he is much more in tune with api-machinery. |
After reviewing the KEP, reading the opinions of other leads, and speaking with @aojea I think there are two least-common-denominator concerns with this KEP that are not feasible to address using this design.
While some individuals have additional concerns, those two are commonly shared. On balance, we (apimachinery leads) think that this particular design is not worth the additional complexity. To provide a more complete history for future attempts, we'd like to merge this KEP in a |
👍 fine with me, |
keps/sig-api-machinery/3037-client-go-alternative-services/kep.yaml
Outdated
Show resolved
Hide resolved
/label tide/merge-method-squash |
Thanks. /lgtm |
/hold the verify failure looks real. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, deads2k The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
5963a52
to
1afb0b4
Compare
/hold cancel |
/lgtm |