api: loadbalancer: Internal does not create correct Route53 entries #4252
Comments
As a workaround, we updated the api.internal DNS record and then scaled dns-controller to 0 to prevent it from changing the record back. This resulted in an improvement, but it means k8s can no longer create DNS names when we launch services, so it's not acceptable long term. We also noted that while our workloads kept running this time, the nodes in the same AZ as the master still went NotReady, making them unschedulable. This is still not the desired behavior.
I've been battling with something similar. In my instance, I doubt it will solve your problem, as you seem to be having problems with the k8s nodes talking internally to now-dead masters.
Only supports the public endpoint, which maybe should be documented more clearly.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Now that this issue has been accepted upstream in kubernetes/kubernetes#63492, I feel it should be a valid option to turn on an ELB for the internal API, because we are now running into possible timing issues with DNS updates and propagation that having an ELB there would make much easier to manage. With DNS propagation taking so long, we get nodes going NotReady if we, say, terminate all 3 masters.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Looking at the code, the problem appears to be that the internal annotation is always added to the apiserver pod: https://github.com/kubernetes/kops/blob/master/nodeup/pkg/model/kube_apiserver.go#L444 This means the record will always be updated by dns-controller. Should this annotation not be conditional in the same way as the external one?
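For context, dns-controller keys off pod annotations. A minimal sketch of what the kube-apiserver static pod manifest looks like with that internal annotation applied (heavily pared down; the manifest kops actually generates differs in detail, and the image below is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    # dns-controller watches this annotation and keeps the internal
    # record pointed at the host IPs of the pods carrying it, which is
    # why api.internal keeps reverting to per-master A records.
    dns.alpha.kubernetes.io/internal: api.internal.ecom-2.dev.domain.com
spec:
  hostNetwork: true   # host networking, so the record resolves to node IPs
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/kube-apiserver:v1.8.4  # placeholder image/tag
```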
Just to chime in with a bit more detail on this. We've been affected by a memory leak in kube-apiserver that causes a master node to become unresponsive until the docker health check is able to restart docker (which took 15 minutes in one case due to high memory pressure). As far as I can tell, dns-controller does not modify the internal API record based on the availability of the apiserver pods or nodes, so the unresponsive pod remains one of the IPs in the set. This in turn means that, on occasion, the kubelet on some worker nodes tries to connect to the struggling master and fails, resulting in those nodes being marked NotReady. This seems naive on the part of kubernetes and I suspect some improvements need to be made there, but in the meantime we'd also like to use the ELB created by kops for kubelet, kube-proxy, etc. on the worker nodes. At least that way a health check could take care of taking unresponsive apiservers out of the pool.
/remove-lifecycle rotten
MR with proposed fix here: #5375
Ah - thank you all for the detective work to figure this out. I think we're close here.

To date kops has always used DNS with A records of the master nodes when talking to the master inside the cluster (e.g. kubelet -> master), even if there is an ELB present in front of the masters. The ELB is assumed to be primarily for traffic from outside the cluster to reach the master. We do support internal ELBs, but again this is intended for traffic from outside the cluster (for clients that are on the VPC, either directly or via a VPN/DirectConnect).

So I don't think the fix in #5375 is quite right, because users that have configured an ELB don't necessarily want their internal traffic going through it. We could expose another option to allow internal kubelet -> master traffic to go via an ELB. I'm not sure whether this should be a separate ELB or the same one (or whether it should be an NLB).

Another gotcha is that (until we have etcd-manager) we still use DNS for etcd cluster maintenance, so even if we solve this problem we might still have trouble there. Hopefully, though, this is actually specific to kubelet -> master communications (if it's all related to kubernetes/kubernetes#63492). There are also hopes that this gets fixed in 1.11, or if/when 63492 is backported.
Ok, thanks for the info - I see I've gotten the wrong end of the stick on the proposed fix: because api.internal is created with the placeholder IP, you're right this wouldn't work; the record would never be pointed to the ELB. I think we'd be totally fine with using the same ELB. I can't think of a reason why you'd want a separate one, unless for auditing purposes maybe, or you've split your VPCs in such a way that the k8s nodes couldn't see the ELB - is that possible? Our config relating to this, BTW, is:
We don't specify the DNS setting, so we'd kind of just assumed the ELB was used for everything. Do you think some people are expecting this not to be the case? If so then I guess we would have to have an explicit setting, perhaps something like the following...
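(The proposed snippet was not captured above; what follows is a hypothetical sketch of such an explicit opt-in in the cluster spec. The useForInternalApi field name is an assumption for illustration, not necessarily the exact proposal.)

```yaml
# Hypothetical kops cluster spec fragment: explicitly opt node -> master
# API traffic into the load balancer instead of per-master A records.
spec:
  api:
    loadBalancer:
      type: Internal          # ELB reachable only from inside the VPC
      useForInternalApi: true # assumed opt-in flag; name is illustrative
```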
Then on the basis of
I agree with @mellowplace. Our workaround uses the same ELB for internal traffic and we have no problems; we could use the mod he proposed. Using a separate ELB seems overcomplicated to me. We maintain our clusters with automated jobs that periodically delete the nodes one at a time to patch them. Ever since we changed to using the ELB for node-to-master communication, our node maintenance jobs work great.
You need an internal ELB for node -> master communication, but some people might have an external ELB for API communication because they don't have a VPN into their cluster, so that's one reason to keep them separate.
I've updated #5375 with the proposal from #4252 (comment). @lkysow So potentially the k8s nodes might not be able to see an external ELB? My reading of the code is that we'd be dependent on the CIDR of spec.KubernetesApiAccess. If we want to avoid creating another ELB then we might have to add a source security group to allow the k8s nodes guaranteed access (we could make this conditional on the new setting). @justinsb thoughts?
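For reference, kubernetesApiAccess is the cluster spec field that lists the CIDRs allowed to reach the API load balancer; a minimal sketch with illustrative values:

```yaml
# Sketch of the access-control fields in question (values are examples).
spec:
  kubernetesApiAccess:
  - 203.0.113.0/24   # e.g. an office/VPN range; if the node subnets are not
                     # covered here, nodes cannot reach an external API ELB,
                     # hence the idea of adding a source security group for them
  api:
    loadBalancer:
      type: Public
```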
@mellowplace sorry but I can't give you a good answer right now because I don't have the time to dig into this to understand the full context. My comment was just to say that in some cases it makes sense to have an internal ELB for cluster communication and an external ELB for external API communication. I'm not sure if that matters in this context though!
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Operating Environment
kops 1.8.0
aws
k8s 1.8.4
What we expect to happen
Our kops spec looks like this:
Per the documentation here, we expect kops to create and maintain a Route53 record for api.internal.ecom-2.dev.domain.com which points to an AWS load balancer created for the masters.
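For reference, the portion of a kops cluster spec that requests an internal load balancer for the API typically looks like the sketch below (illustrative only; the actual spec was not captured above):

```yaml
# Illustrative cluster spec fragment requesting an internal API load balancer.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: ecom-2.dev.domain.com
spec:
  api:
    loadBalancer:
      type: Internal   # kops creates an internal ELB in front of the masters
```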
What actually happens
Instead, we get a Route53 record that is a list of A records corresponding to the masters.
Why this matters
Our workloads experience failures because the nodes cannot talk to a master any more, probably because they are using the IP address directly. When a master node dies, workloads get failures doing lookups using kube-dns; investigation has shown that this is because kube-dns on the node is timing out trying to reach the masters via the old IP.
We can temporarily fix the issue by manually updating the Route53 record to point to the load balancer. We've verified that if we do this, and THEN kill a master, the workloads are not affected.
The problem is that when the new master comes back online, the cluster rewrites the Route53 name back to A records, so the workaround will not survive rolling updates.
I believe this is the root cause of #4247 and #3702, which I am going to close in favor of this issue.
I think this may be the actual root cause of #2634. The solution mentioned there is exactly as I describe, but I suspect it would not survive a rolling update.
What we need
We need to know what configuration we have wrong that causes us to get an internal name pointing to the masters directly, and how to fix it.
Alternatively, a workaround that simply stops kops from updating the internal DNS record would be super helpful.