kubeadm should not re-use bind-address for api-server while using experimental-control-plane #1348
Comments
hello and thank you for the report.
FYI, the config had a v1alpha3 version before v1beta1.
I don't think there is a way for kubeadm … (cc @fabriziopandini @detiber).
Something else to note here: the purpose of the field …
@neolit123 that's a VIP (virtual IP) that is moved around by keepalived, and behind that VIP is an nginx proxy acting as a load balancer between the 3 api-server IPs. I'm running these on bare metal, so I don't have a cloud-provided load balancer resource to use; I rolled my own, self-hosted on the master nodes. This is the main reason I'm specifying a bind-address in the first place: otherwise the api-server defaults to binding on 0.0.0.0 and then dies with an "address already in use" error (on the VIP). Maybe not the most elegant approach, but without a separate load balancer this is the best work-around I could come up with, so I needed to adjust which address kube-apiserver tries to bind to by specifying bind-address.
Basically my thought is that kubeadm should not carry the bind-address setting over to subsequent master nodes when using --experimental-control-plane (at least when one is specified that isn't 0.0.0.0); the assumption that all control-plane nodes are identical breaks down once a bind-address is specified. When I used a separate kubeadm-config.yaml per node (back in the v1alpha3 days), I was able to set an individual bind-address value in each master node's MasterConfiguration, but with --experimental-control-plane I've lost that ability. Is there a better recommended way to specify this, other than manually sed-editing the kube-apiserver manifest in place on the non-primary master nodes?
Like I've mentioned above, you should be using …; this is outlined in our HA guide.
You can always modify the api-server config and restart the pod.
Can you clarify what you mean by …?
I'm confused, because to me it seems that I am already using that. The docs you're pointing me to assume there is an external load balancer device, but that assumption breaks down in this set-up.
I have to modify it because it's being set incorrectly :-) Is there a better way to achieve this without manually modifying what kubeadm generates (in essence, fixing kubeadm's incorrect behavior on this edge case by hand)?
If I understand this correctly, this is not an HA setup, because if the first machine goes away the rest will not have a load balancer, correct?
LBs like keepalived allow you to run an LB service on each control-plane node, which isn't exactly an external LB. That is how …
It isn't ideal HA, but it is HA: if the first machine does go away, keepalived should pick the VIP back up, and the nginx on the new VIP's host will load-balance across the remaining API servers. It's kind of a distributed LB approach that still provides HA (I've tested it a while back). This isn't helping the original issue, though: kubeadm in this edge case is incorrectly passing bind-address from the ClusterConfiguration onto other hosts, breaking their api-servers.
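For anyone following along, here is a minimal sketch of the "distributed LB" setup described above, with hypothetical addresses (VIP 192.168.0.100, masters 192.168.0.11-13): keepalived moves the VIP between masters, and nginx's stream module on the VIP holder fans connections out to all api-servers.

```sh
# keepalived holds the VIP on one healthy master at a time (values invented).
cat <<'EOF' > /etc/keepalived/keepalived.conf
vrrp_instance K8S_API_VIP {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.0.100
    }
}
EOF

# nginx listens only on the VIP and balances across all three api-servers,
# each of which binds its own node IP on 6443 (hence the bind-address arg).
cat <<'EOF' > /etc/nginx/nginx.conf
events {}
stream {
    upstream apiservers {
        server 192.168.0.11:6443;
        server 192.168.0.12:6443;
        server 192.168.0.13:6443;
    }
    server {
        listen 192.168.0.100:6443;
        proxy_pass apiservers;
    }
}
EOF
```

Because nginx occupies the VIP on port 6443, an api-server on the same host can't bind 0.0.0.0:6443, which is exactly why the bind-address is being pinned to each node's own IP in this setup.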
it feels to me that in this case nginx adds a layer of complication that is not needed.
@fabriziopandini can comment if we want to omit copying the field.
My understanding is that the API server is active/active, so I thought for scalability's sake that nginx load-balancing across all (available) API servers made sense. Can you propose an alternative that doesn't require an external load balancer but still spreads requests across more than one api-server?
@aleks-mariusz As of today, the only settings that can be customized for each single API server instance are the local advertise address and bind port. With regards to your specific setup, if you want to keep the extra layer of nginx load balancing, a possible workaround is to use a different port for the load balancing (or for the API servers). One thing that is important to notice is that this extra layer of load balancing will apply only to the external traffic directed to the API server (which usually is not relevant in terms of throughput), while the internal traffic is balanced by other mechanisms.
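A sketch of the different-port workaround suggested here, under the assumption that the load balancer (not the api-servers) moves to the new port; the 8443 port and all addresses are invented for illustration:

```sh
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
# Clients reach the cluster through the VIP on 8443; nginx/keepalived own
# 192.168.0.100:8443 and proxy to the node IPs on 6443, so an api-server
# bound to 0.0.0.0:6443 no longer collides with the proxy on any host.
controlPlaneEndpoint: "192.168.0.100:8443"
# Note: no apiServer.extraArgs bind-address here, so nothing node-specific
# is copied to the other control-plane nodes on join.
EOF
kubeadm init --config kubeadm-config.yaml
```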
I'd overlooked the internal traffic and the other mechanisms used to service it. Regarding the external traffic: if we have to work around it anyway, would something like this accomplish it, versus instead advertising the API service on a different port (rather than the well-known 6443)? For discussion's sake, is customizing the bind-address like this even OK?

In any case, while we're discussing the API server, are any of you able to offer general best-practice ideas for typical bare-metal deployments of 3x API servers in an HA config, specifically when not cloud-based (we have no hardware LB available on-prem)? To recap: I've turned to using nginx to load-balance between the 3 API servers, with nginx listening only on a VIP floating between the 3 masters (managed by keepalived), and asking the API server to bind to a specific IP instead of 0.0.0.0.

This work-around worked fine with k8s <= 1.11, back when kubeadm had a MasterConfig v1alpha3 object, since when bootstrapping the cluster I created a separate kubeadm.yaml on each master host (thus letting me customize the bind-address on a per-API-server basis). But now, as of (I think) v1.12, the yaml apiVersion went to v1beta1 (with support for earlier versions seemingly withdrawn for bootstrapping newer versions of k8s), so there's now only one ClusterConfig for the whole cluster. And thus I've discovered that when a bind-address with a specific IP is given, it gets set the same on every API server, causing the 2nd and 3rd to not start (each was told to listen on an IP that wasn't available on its host). This is the reason this issue was opened in the first place, as kubeadm doesn't currently handle this edge case properly. I thought this was a bug in the way kubeadm behaves, but you've explained this might be how it's designed.

So I'm now re-evaluating whether my initial approach (nginx/keepalived) even follows best practices. Personally, I don't really like the idea of having to alter port numbers to work around the address-already-in-use issue, so I'm wondering what other ideas people might have. You've got me considering eliminating nginx entirely (but keeping keepalived to move the VIP as needed); however, the impact would be that all requests are directed at only one API server. From what I've read, API servers can act active/active, so the side effect of eliminating nginx (i.e. no load balancing in front of them) is that a single API server would take all the load, which doesn't seem ideal either. What are other people with on-prem clusters doing?
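For comparison, a sketch of the nginx-less variant being weighed at the end of the previous comment (addresses hypothetical): keepalived alone moves the VIP, every api-server binds 0.0.0.0:6443, and controlPlaneEndpoint points at the VIP.

```sh
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
controlPlaneEndpoint: "192.168.0.100:6443"   # keepalived VIP, no proxy in front
EOF
kubeadm init --config kubeadm-config.yaml
```

The trade-off is the one already noted above: without a proxy, external API traffic is not spread across the active/active api-servers; it all lands on whichever master currently holds the VIP.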
I think yes!
AFAIK, customising the bind-address is fine. During this journey, if there is demand from the community (and hopefully also some help), I'm personally open to reconsidering the current assumptions (e.g. the list of node-specific settings).
We, as a SIG, are working to make this even more simple and to graduate all the different pieces as fast as possible (hopefully in v1.14) in order to increase stability across releases as well, but we are not there yet, as described in the docs / in the feature graduation process.
I hope other users will answer here...
We have got the same problem and worked around it by setting apiServer.extraArgs.bind-address: <node_ip> so the first server comes up as expected. Then we sed-replace bind-address with the local node IP and restart the kubelet after the joins. Once the apiserver binds on the node IP, I'm free to set up a load-balancing haproxy bound to a floating IP; HA is achieved by moving the floating IP to a healthy node via keepalived. As for the need: most smaller cloud providers don't ship load balancers, so this is the only way to get any HA on these platforms besides dedicated load balancer nodes. I hope there will be extraArgs support in the join config in the future.
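A rough shell sketch of that workaround (the node IP and the sed pattern are assumptions about the manifest layout), run on a control-plane node after its kubeadm join completes:

```sh
#!/bin/sh
# This node's own address (invented for illustration); the first master's IP
# was carried over via the cluster-wide apiServer.extraArgs bind-address.
NODE_IP=192.168.0.12

# Rewrite the copied --bind-address flag in the static pod manifest.
sed -i "s/--bind-address=.*/--bind-address=${NODE_IP}/" \
    /etc/kubernetes/manifests/kube-apiserver.yaml

# Restart the kubelet so the updated static pod manifest is applied promptly.
systemctl restart kubelet
```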
Config related …
I use a custom port (443) for kube-apiserver; the first node starts with the correct port, but the rest of the masters come up with port 6443.
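If the intent is a non-default api-server port on the joined masters too, the per-node localAPIEndpoint in the v1beta1 JoinConfiguration is presumably where it has to go; a sketch under that assumption, with all values invented or elided:

```sh
cat <<'EOF' > join.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 192.168.0.12   # this master's own IP (invented)
    bindPort: 443                    # the non-default api-server port
discovery:
  bootstrapToken:
    apiServerEndpoint: "192.168.0.100:443"
    token: "abcdef.0123456789abcdef"   # placeholder
    unsafeSkipCAVerification: true     # illustration only; pin the CA hash instead
EOF
kubeadm join --config join.yaml
```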
...I've just hit the same...
From the discussion so far, the …
@dparalen if ControlPlaneEndpoint is not an external facility (DNS, virtual IP, load balancer), then how are you configuring the kubelets in the cluster to talk to the API server? Unless things have changed recently, all Kubernetes clients, including the kubelet, require a single endpoint to contact and do not accept a list of endpoints.
@detiber In the context of this bug, I'd like to use the floating VIP (the address that nginx, or in my case HAProxy, binds to) on any of the master nodes. That's why folks are working around the …
@dparalen If you could specify a separate bind address per control plane instance, is there anything that would prevent you from using the floating VIP as the ControlPlaneEndpoint? |
@detiber I hope not, I'm still in the process of working around this in my env. |
@detiber Actually, there's one more thing I'm facing in the context of this bug, the …
kubeadm treats control-plane nodes as replicas, by design. The work to enable such "kustomizations" has begun and will hopefully be available in 1.16; however, it comes with the caveat that once you enable such customizations you enter unsupported territory.
see notes about kustomization above.
controlPlaneEndpoint can be any FQDN or address that leads traffic to an API server, so it's not technically a requirement for it to be external or a VIP. Please continue discussions on #1379.
@rosti, @fabriziopandini …
Is this a BUG REPORT or FEATURE REQUEST?
BUG
I am trying to set up HA, and after the apiVersion of the yaml that gets passed to kubeadm went to v1beta1 (from v1alpha2), I've had to adjust and use the --experimental-control-plane flag. It is certainly experimental enough: the first node comes up fine, but joining the 2nd node with the new flag creates a kube-apiserver manifest with the same bind-address parameter as the first api-server, which results in seemingly unrelated errors in the log file.

Manually editing /etc/kubernetes/manifests/kube-apiserver.yaml with an in-place sed on the relevant bind-address line fixes it and allows the control plane to come up successfully.

So this seems like a bit of a bug in the way kubeadm sets up the kube-apiserver manifest on the non-primary nodes, in that it re-uses the bind-address from the first node.

P.S. I'm specifying a bind-address in my yaml because I don't want the api-server to bind on 0.0.0.0 (all interfaces): I'm running nginx on a VIP (set up by keepalived) on each node, so I want it to bind only to a specific IP.
Versions
kubeadm version (use kubeadm version): v1.13.2
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:33:30Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Environment:
Kernel (e.g. uname -a): 3.10.0-957.1.3.el7.x86_64 (latest CentOS 7.6 updates applied)
What happened?
The api-server on the 2nd and 3rd nodes goes into a crash-loop backoff after getting repeated errors (apparently certificate errors).
What you expected to happen?
The 2nd (and 3rd) node joins the cluster with a functioning API server.
How to reproduce it (as minimally and precisely as possible)?
Use the following kind of config on node 1 and run kubeadm init as per the official docs:
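A hypothetical sketch of the kind of config described in this report (VIP and node IP are stand-ins, not the actual values): a cluster-wide ClusterConfiguration whose bind-address is only valid on node 1.

```sh
cat <<'EOF' > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
controlPlaneEndpoint: "192.168.0.100:6443"   # keepalived VIP fronted by nginx
apiServer:
  extraArgs:
    bind-address: "192.168.0.11"             # node 1's own IP
EOF
kubeadm init --config kubeadm-config.yaml
```

Joining nodes 2 and 3 with kubeadm join --experimental-control-plane then renders their kube-apiserver manifests from this same ClusterConfiguration, so they inherit node 1's bind-address.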
Anything else we need to know?
I am using keepalived on each master node to keep a VIP on one of the masters, and also have nginx on each master node listening on the VIP; both of these run in docker containers.