
KEP-3037: client-go alternative services #3034

Merged
merged 1 commit into from
Jan 27, 2022


aojea
Member

@aojea aojea commented Nov 3, 2021

  • One-line PR description: client-go capability to connect to multiple apiservers

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 3, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Nov 3, 2021
I1103 12:30:59.066469 1558329 round_trippers.go:454] GET https://127.0.0.1:44267/api/v1/namespaces/default/pods?limit=500 200 OK in 1 milliseconds
I1103 12:30:59.066484 1558329 round_trippers.go:460] Response Headers:
I1103 12:30:59.066491 1558329 round_trippers.go:463] Cache-Control: no-cache, private
I1103 12:30:59.066502 1558329 round_trippers.go:463] Alt-Svc: h2="10.0.0.2:6443", h2="10.0.0.3:6443", h2="10.0.0.4:6443"
Contributor


It's not clear how this will interact with readiness. The LB checks every second or so, but the external clients will not. Because of this, a client can fail over to a not-ready server (one that hard-failed and is perhaps restarting), and the clients will be unaware.

Member Author

@aojea aojea Nov 3, 2021


The goal is not about adding nines of availability; it is about being able to recover from a network or connectivity problem. The client only fails over when a network problem is detected. The trade-off is, as you say, that readiness is not strictly followed. On another note, only ready endpoints are published: the apiservers remove their endpoint when they are not ready, which guarantees that only ready apiservers are advertised.
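
The point that only ready apiservers are advertised can be sketched as a server-side helper that builds the Alt-Svc value from endpoint readiness. Endpoint and altSvcValue are hypothetical names used for illustration only; the Ready flag stands in for whatever mechanism actually removes a not-ready apiserver's endpoint, and the output mirrors the h2="host:port" form in the log excerpt at the top of this thread.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// Endpoint is a hypothetical, simplified view of one kube-apiserver endpoint.
type Endpoint struct {
	IP    string
	Port  int
	Ready bool
}

// altSvcValue builds an Alt-Svc header value (RFC 7838) that advertises only
// ready endpoints, in the h2="host:port" form. It returns "" when no endpoint
// is ready, in which case the caller should omit the header entirely.
func altSvcValue(eps []Endpoint) string {
	var alts []string
	for _, ep := range eps {
		if !ep.Ready {
			continue // not-ready apiservers are never advertised
		}
		// JoinHostPort adds the brackets that IPv6 literals need.
		authority := net.JoinHostPort(ep.IP, strconv.Itoa(ep.Port))
		alts = append(alts, fmt.Sprintf(`h2="%s"`, authority))
	}
	return strings.Join(alts, ", ")
}
```

Filtering on the server side keeps the client logic simple, but as the review comment notes, the advertised list is only as fresh as the last response the client received.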

For example, the most critical problem this solves is a stale connection: the only way out of that loop is the HTTP/2 ReadIdleTimeout (30s, IIRC), which sends a ping frame; when the ping fails, an error is raised, and that error is what causes the client to fail over.

@aojea aojea mentioned this pull request Nov 4, 2021
4 tasks
@aojea aojea changed the title [WIP] KEP-NNNN: client-go alternative services KEP-3037: client-go alternative services Nov 4, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2021
@aojea
Member Author

aojea commented Nov 4, 2021

/assign @thockin @deads2k @lavalamp

Member

@thockin thockin left a comment


Overall this seems plausible to me. Is there any notion of TTL for alt services?

@aojea
Member Author

aojea commented Dec 22, 2021

Overall this seems plausible to me. Is there any notion of TTL for alt services?

Indeed, there is:
https://datatracker.ietf.org/doc/html/rfc7838#section-2.2
https://datatracker.ietf.org/doc/html/rfc7838#section-3.1

@thockin
Member

thockin commented Dec 22, 2021 via email

@aojea
Member Author

aojea commented Dec 22, 2021

Do we need to consider TTL here?

I included it in my first prototype, but then I didn't see the benefit:

@thockin
Member

thockin commented Dec 22, 2021 via email

@thockin
Member

thockin commented Dec 23, 2021

I am LGTM on this, but I defer to @deads2k and/or @lavalamp for approval

@lavalamp
Member

2 ways to configure client-side HA, which are useful at different times of the cluster lifecycle:

If you can keep kubeconfigs up to date, there's not really a need for a dynamic cache of the same data; if you can't keep them up to date, then it will conflict with such a cache.

IMO putting all addresses in the config file only makes sense if the addresses are static and never change.

I don't think we can do this as designed, but if some design can address those concerns, maybe there'd be room for both static and dynamic options; I just would not expect both to be on at the same time in the same cluster.

@thockin
Member

thockin commented Jan 25, 2022

IMO putting all addresses in the config file only makes sense if the addresses are static and never change.

If I can add to that: and if they are not static, you can't put any of them in the config file. It seems dubious to assume that one of them is static.

The potential use-case I see here is elastic apiservers (growing and shrinking number of replicas) and no load-balancer. Is that really a situation that arises?

@aojea
Member Author

aojea commented Jan 25, 2022

The potential use-case I see here is elastic apiservers (growing and shrinking number of replicas) and no load-balancer. Is that really a situation that arises?

Yeah, I've tried to cover that case too; there are some projects that have this elastic control plane feature.

@thockin
Member

thockin commented Jan 26, 2022

I think multiple IPs in client-go is a relatively obvious thing to do.

I don't find alt-svc to be bad, just very, very niche. I don't feel like my opinion (with my sig-net hat on) should be on the same tier as @lavalamp's, though, since he is much more in tune with api-machinery.

@deads2k
Contributor

deads2k commented Jan 27, 2022

After reviewing the KEP, reading the opinions of other leads, and speaking with @aojea I think there are two least-common-denominator concerns with this KEP that are not feasible to address using this design.

  1. This design does not remove the need to have a load balancer, meta-server, or fixed IP of some kind that is always available to handle the first request. Since the first request still requires one of these, the overall operational complexity is increased, not constant or reduced.
  2. This particular design requires that the kube-apiserver either know which IPs a client can reach, or be allowed to return IPs at which, in a client's environment, something other than a kube-apiserver may be listening.

While some individuals have additional concerns, those two are commonly shared. On balance, we (apimachinery leads) think that this particular design is not worth the additional complexity. To provide a more complete history for future attempts, we'd like to merge this KEP in a rejected state with an explanation of why.

@aojea
Member Author

aojea commented Jan 27, 2022

👍 fine with me,

@deads2k
Contributor

deads2k commented Jan 27, 2022

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 27, 2022
@deads2k
Contributor

deads2k commented Jan 27, 2022

fine with me,

Thanks.

/lgtm
/approve

@deads2k
Contributor

deads2k commented Jan 27, 2022

/hold

the verify failure looks real.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jan 27, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, deads2k


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 27, 2022
@aojea
Member Author

aojea commented Jan 27, 2022

/hold cancel
it should be fixed now

@k8s-ci-robot k8s-ci-robot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jan 27, 2022
@deads2k
Contributor

deads2k commented Jan 27, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 27, 2022
@k8s-ci-robot k8s-ci-robot merged commit 27aebeb into kubernetes:master Jan 27, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Jan 27, 2022
rikatz pushed a commit to rikatz/enhancements that referenced this pull request Feb 1, 2022
@aojea aojea mentioned this pull request Jul 15, 2024
4 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

9 participants