
ipFamilyPolicy:PreferDualStack for kube-dns and metrics-server #8394

Closed
wants to merge 1 commit

Conversation

manuelbuil
Contributor

Proposed Changes

Use ipFamilyPolicy: PreferDualStack as the default for the kube-dns and metrics-server services. That way, those services will have one IP when using ipv4-only or ipv6-only, but will pick up two addresses (ipv4 and ipv6) when k3s is deployed in dual-stack mode.
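For reference, a minimal sketch of the kind of change this makes to the packaged Service manifests (only the relevant fields are shown; %{CLUSTER_DNS}% is the existing template token substituted at deploy time):

  apiVersion: v1
  kind: Service
  metadata:
    name: kube-dns
    namespace: kube-system
  spec:
    selector:
      k8s-app: kube-dns
    clusterIP: %{CLUSTER_DNS}%
    # added by this PR: one IP on single-stack clusters, two on dual-stack clusters
    ipFamilyPolicy: PreferDualStack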

Types of Changes

Bug fix

Verification

Deploy k3s in dual-stack mode and verify that the service gets two IP addresses (ipv4 and ipv6)
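For example, on a dual-stack cluster the kube-dns Service should end up with something like the following (the IPv6 value depends on the chosen service CIDR; an example range is shown):

  clusterIPs:
  - 10.43.0.10
  - 2001:cafe:42:1::a
  ipFamilyPolicy: PreferDualStack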

Testing

Linked Issues

#8393

User-Facing Change

Fix kube-dns and metrics services in dual-stack mode so they pick dual-stack addresses

Further Comments

@brandond
Member

Should we do this for traefik as well? https://github.com/k3s-io/k3s-charts/tree/main-source/packages/traefik
@manuelbuil
Contributor Author

manuelbuil commented Sep 20, 2023

Should we do this for traefik as well? https://github.com/k3s-io/k3s-charts/tree/main-source/packages/traefik

traefik is already doing it correctly:

azureuser@mbuil-vm0:~$ k get svc traefik -n kube-system -o yaml | grep ipFamilyPolicy
  ipFamilyPolicy: PreferDualStack

https://github.com/k3s-io/k3s/blob/master/manifests/traefik.yaml#L40-L41
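For context, the linked lines inject that value through the HelmChart's valuesContent; roughly along these lines (paraphrased, not the verbatim manifest):

  apiVersion: helm.cattle.io/v1
  kind: HelmChart
  metadata:
    name: traefik
    namespace: kube-system
  spec:
    valuesContent: |-
      service:
        ipFamilyPolicy: "PreferDualStack"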

@brandond
Member

Ah great, I forgot we were injecting that here instead of in the chart patches

@codecov

codecov bot commented Sep 20, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (6330a5b) 47.43% compared to head (669f655) 51.47%.
Report is 23 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8394      +/-   ##
==========================================
+ Coverage   47.43%   51.47%   +4.04%     
==========================================
  Files         143      143              
  Lines       14741    14826      +85     
==========================================
+ Hits         6992     7632     +640     
+ Misses       6643     5970     -673     
- Partials     1106     1224     +118     
Flag      | Coverage Δ
e2etests  | 48.71% <81.81%> (?)
inttests  | 44.68% <87.50%> (-0.08%) ⬇️
unittests | 19.73% <ø> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown.

Files                              | Coverage Δ
pkg/deploy/zz_generated_bindata.go | 51.49% <ø> (ø)
pkg/server/server.go               | 59.07% <100.00%> (+2.32%) ⬆️
pkg/cli/server/server.go           | 59.95% <77.77%> (+1.62%) ⬆️

... and 44 files with indirect coverage changes


@aofei
Contributor

aofei commented Sep 21, 2023

So will kube-dns get a static IPv6 address (like 2001:cafe:42:1::53)? Currently k3s server --cluster-dns accepts only one address (default: 10.43.0.10).

@brandond
Member

Currently k3s server --cluster-dns accepts only one address (default: 10.43.0.10).

I don't believe that's true. It accepts a list, just like the cluster CIDR and service CIDR flags.

@manuelbuil
Contributor Author

Currently k3s server --cluster-dns accepts only one address (default: 10.43.0.10).

I don't believe that's true. It accepts a list, just like the cluster CIDR and service CIDR flags.

Correct. This PR only changes the behavior in dual-stack mode. In that mode, the service will have two IPs (ipv4 and ipv6)

@aofei
Contributor

aofei commented Sep 21, 2023

It accepts a list, just like the cluster CIDR and service CIDR flags.

Yes you are right. The flag usage led me to think it could only receive one IPv4 address.

--cluster-dns value  (networking) IPv4 Cluster IP for coredns service. Should be in your service-cidr range (default: 10.43.0.10)

However it seems impossible to reserve an IPv6 address via something like --cluster-dns 10.43.0.10,2001:cafe:42:1::53. I don't see clusterIPs here:

k3s/manifests/coredns.yaml

Lines 204 to 217 in 6330a5b

spec:
  selector:
    k8s-app: kube-dns
  clusterIP: %{CLUSTER_DNS}%
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP

@manuelbuil
Contributor Author

It accepts a list, just like the cluster CIDR and service CIDR flags.

Yes you are right. The flag usage led me to think it could only receive one IPv4 address.

--cluster-dns value  (networking) IPv4 Cluster IP for coredns service. Should be in your service-cidr range (default: 10.43.0.10)

However it seems impossible to reserve an IPv6 address via something like --cluster-dns 10.43.0.10,2001:cafe:42:1::53. I don't see clusterIPs here:

k3s/manifests/coredns.yaml

Lines 204 to 217 in 6330a5b

spec:
  selector:
    k8s-app: kube-dns
  clusterIP: %{CLUSTER_DNS}%
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP

The service manifest has two fields: ClusterIP (string) and ClusterIPs ([]string).
Even if ClusterIP is set, ClusterIPs can include extra IPs; of course, ClusterIP needs to be part of ClusterIPs.
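For illustration, a spec that satisfies those rules would look like this (addresses are example values):

  spec:
    clusterIP: 10.43.0.10          # must equal clusterIPs[0]
    clusterIPs:
    - 10.43.0.10
    - 2001:cafe:42:1::a
    ipFamilies:
    - IPv4
    - IPv6
    ipFamilyPolicy: PreferDualStack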

@brandond
Member

brandond commented Sep 21, 2023

Yeah, we probably also need to modify the coredns manifest, and how we template in the cluster DNS address, to work properly with dualstack. Just setting the ipfamilypolicy won't work, because we're hardcoding the clusterIP to a single address family.
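A sketch of the direction this implies for the coredns manifest, assuming a second, purely hypothetical substitution variable for the IPv6 address (%{CLUSTER_DNS_IPV6}% is illustrative only, not an existing k3s template token):

  spec:
    ipFamilyPolicy: PreferDualStack
    clusterIP: %{CLUSTER_DNS}%
    clusterIPs:
    - %{CLUSTER_DNS}%
    - %{CLUSTER_DNS_IPV6}%   # hypothetical second token for the IPv6 address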

Member

@brandond brandond left a comment

I think we need to address the issue with clusterIP: %{CLUSTER_DNS}% before this will work, right?

@aofei
Contributor

aofei commented Sep 22, 2023

There should also be a default IPv6 Cluster IP for kube-dns, right?

According to the ServiceSpec, the clusterIPs field cannot be changed once set. This means that for an existing cluster, if only ipFamilyPolicy: PreferDualStack is introduced without providing a default IPv6 Cluster IP, and people forget to modify the --cluster-dns, the kube-dns will get a random IPv6 address that can never be changed.

Feels like there should be a default for it.

@brandond
Member

base plus 10 is probably also a good default for IPv6.

@manuelbuil
Contributor Author

Yeah, we probably also need to modify the coredns manifest, and how we template in the cluster DNS address, to work properly with dualstack. Just setting the ipfamilypolicy won't work, because we're hardcoding the clusterIP to a single address family.

There should also be a default IPv6 Cluster IP for kube-dns, right?

According to the ServiceSpec, the clusterIPs field cannot be changed once set. This means that for an existing cluster, if only ipFamilyPolicy: PreferDualStack is introduced without providing a default IPv6 Cluster IP, and people forget to modify the --cluster-dns, the kube-dns will get a random IPv6 address that can never be changed.

Feels like there should be a default for it.

Thanks for the comments, let me try to reply to you both :).

It is important to note that, at this point in k3s, when deploying in dual-stack mode, IPv4 is always prioritized over IPv6 (this will change soon, but in another PR). Even if you provide a dual-stack value to --cluster-dns, the IPv4 address will be used.

Before this PR, this is what we had with kube-dns service:

IPv4-only

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack

Dual-Stack

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack

IPv6-only

  clusterIP: fd00:43::a
  clusterIPs:
  - fd00:43::a
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv6
  ipFamilyPolicy: SingleStack

Both 10.43.0.10 and fd00:43::a are the default addresses for kube-dns. If another service CIDR range is defined, we take the network address plus 10; that is exactly how 10.43.0.10 and fd00:43::a are derived from the default 10.43.0.0/16 and fd00:43::/112 ranges (for example, a service CIDR of 10.44.0.0/16 would yield 10.44.0.10).

When this PR is merged, this is what we will have:

IPv4-only

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: PreferDualStack

Dual-Stack

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  - $RANDOM-IP-FROM-IPV6-SERVICE-CIDR
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: PreferDualStack

IPv6-only

  clusterIP: fd00:43::a
  clusterIPs:
  - fd00:43::a
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv6
  ipFamilyPolicy: PreferDualStack

I can confirm that all six examples work: I can look up hostnames on the provided clusterIPs.

Up to here were facts :). Now, regarding "There should also be a default IPv6 Cluster IP for kube-dns": my opinion is that this is not needed for the non-prioritized IP family in dual-stack, which, as explained, is IPv6 at this point. Remember that we need to pin the kube-dns IP because some components, for example the kubelet, need to know the DNS IP before the service starts. That is why we have a "default" kube-dns IP for IPv4-only, IPv6-only and dual-stack (IPv4). So, from a technical point of view, pinning both the IPv4 and IPv6 addresses of the kube-dns service is not needed.

Now, from a user's perspective, I'd need more information. Perhaps it is useful to know that in k3s the DNS service is always 10.43.0.10. However, when using dual-stack, the service-cidr must always be provided, i.e. defaults like 10.43.0.0/16 or fd00:43::/112 don't apply anymore. So, even for the user, I don't see the benefit of pinning both IPs, but this is just my opinion. I am open to discussion and other opinions :)

@brandond
Member

brandond commented Sep 22, 2023

Remember that we need to pin the kube-dns IP because there are components that require to know the IP of the DNS before the service starts, for example kubelet. That is why we have a "default" kube-dns IP for IPv4-only, IPv6-only and dual-stack (IPv4). So, from a technical point of view, pinning both IPv4 and IPv6 IPs in the kube-dns service is not needed.

We need to know it ahead of time, and pass it to the kubelet, if we want the kubelet to configure dual-stack resolvers in resolv.conf. If we don't, it will only use the primary address for cluster dns. It doesn't actually look at the Kubernetes service to figure out what its addresses are.

From the kubelet docs:

cluster-dns strings
Comma-separated list of DNS server IP address. This value is used for containers DNS server in case of Pods with "dnsPolicy: ClusterFirst".

Maybe this is fine and we don't care about providing clients with dual-stack resolvers on dual-stack clusters, and are OK with forcing them to use just the primary ip family.
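For reference, if we did want dual-stack resolvers handed out, the kubelet would need both addresses in its configuration; a minimal sketch using the standard KubeletConfiguration clusterDNS field (addresses are example values):

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  clusterDNS:
  - 10.43.0.10          # primary (IPv4) cluster DNS address
  - 2001:cafe:42:1::a   # secondary (IPv6) address, example value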

@aofei
Contributor

aofei commented Sep 22, 2023

Maybe this is fine and we don't care about providing clients with dual-stack resolvers on dual-stack clusters, and are OK with forcing them to use just the primary ip family.

Considering how /etc/resolv.conf works, I agree that providing dual-stack resolvers to clients doesn't make any sense. So, using only the IPv4 address for components like kubelet when configuring a dual-stack --cluster-dns isn't a problem.


And thanks for the detailed explanation @manuelbuil!

My point of view is that special services like kube-dns should have special IP addresses whenever possible (better if they are also easy to remember). For example, the default kubernetes service always gets base+1 (e.g. 10.43.0.1), and some people assign base+2 to their ingress controller. So base+10 seems to be a good default for kube-dns, regardless of whether it is single-stack or dual-stack.

It's strange for kube-dns to get a random IP address, since it will most likely only be accessed via the bare IP address. Say someone wants to look up hostnames on the IPv6 Cluster IP: basev6+10 won't impose any additional memorization burden. In my use case, I want to use Tailscale's Split DNS feature to forward the Cluster Domain, but I don't want to use the IPv4 address of kube-dns due to overlap.

Regardless of whether there will be a dual-stack default for the kube-dns in a dual-stack cluster, I think k3s should support customizing the kube-dns' clusterIPs field via a dual-stack --cluster-dns.
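For example, the kind of configuration being asked for, expressed as a k3s config file (equivalent to passing the flags on the command line; the values are illustrative, not defaults):

  # /etc/rancher/k3s/config.yaml
  cluster-cidr: 10.42.0.0/16,2001:cafe:42::/56
  service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
  cluster-dns: 10.43.0.10,2001:cafe:42:1::a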

@brandond
Member

Yeah, even if we only set the kubelet's --cluster-dns address to the primary address, I think we should still default to allocating base+10 to kube-dns for both address families.

@manuelbuil
Contributor Author

manuelbuil commented Sep 25, 2023

Added the code pinning the coredns IP address for both IP families in dual-stack:

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  - 2001:cafe:42:1::a
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: PreferDualStack

Both dig @2001:cafe:42:1::a kubernetes.default.svc.cluster.local and dig @10.43.0.10 kubernetes.default.svc.cluster.local work.

Single stack keeps working:

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: PreferDualStack

pkg/cli/server/server.go (outdated review thread, resolved)
pkg/server/server.go (outdated review thread, resolved)
manifests/coredns.yaml (outdated review thread, resolved)
return errors.Wrap(err, "problems calculating extraDNS")
}
serverConfig.ControlConfig.ClusterDNSs = append(serverConfig.ControlConfig.ClusterDNSs, extraDNS)
}
} else {
Contributor

What happens when I explicitly set --service-cidr 10.43.0.0/16,2001:cafe:42:1::/112 --cluster-dns 10.43.0.10?

Providing clusterIPs: [10.43.0.10] for the first time is OK; the service will be assigned a random IPv6 address. However, could there be an error when updating the kube-dns service in the future?

From the ServiceSpec:

  • clusterIPs ([]string)
    ...
    This field may not be changed through updates unless the type field is also being changed to ExternalName (which requires this field to be empty) or the type field is being changed from ExternalName (in which case this field may optionally be specified, as describe above).
    ...

Member

@brandond brandond Sep 25, 2023

This should be covered by upgrade testing, but it is good to think about.

If the apply controller doesn't handle it properly, we may have to add some additional logic to delete and re-create the service so that we can change the addresses. I don't think this should be a problem, as I believe it should be allowed to change clusterIPs when switching from single-stack to dual-stack. There may be issues with existing dual-stack IPv6-primary clusters though, where we were previously using the IPv4 address as the primary and the new code will use the IPv6 address instead.

Contributor Author

I think that updating important config parameters in a running cluster is something that we currently don't support in k3s, for example: cluster-dns, service-cidr, cluster-cidr or flannel-backend. If there is enough interest, we could try to design that feature and prototype it. In any case, I'd say this is out of scope for this PR.

Contributor

Ah, sorry, I think you two got me wrong.

Let's say I started a cluster with k3s server --service-cidr 10.43.0.0/16,2001:cafe:42:1::/112 --cluster-dns 10.43.0.10. Based on the current PR, the kube-dns service in the coredns.yaml will be rendered as:

clusterIP: 10.43.0.10
clusterIPs: [10.43.0.10]
ipFamilyPolicy: PreferDualStack

Which will result a Service resource similar to this:

apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  - 2001:cafe:42:1::62c1 # A random IPv6 address
  ipFamilyPolicy: PreferDualStack

So far so good.

Then we manually kubectl apply -f coredns.yaml again without any changes (this will happen every time k3s restarts, right?). I think we'll get this error:

The Service "kube-dns" is invalid: spec.ipFamilyPolicy: Invalid value: "PreferDualStack": must be 'SingleStack' to release the secondary cluster IP

This is because the clusterIPs field cannot be changed once set.

I know that we didn't make any changes to the coredns.yaml itself, but we were actually trying to release the secondary cluster IP (that random IPv6 address) of the kube-dns service without knowing it, because the rendered field is always clusterIPs: [10.43.0.10].
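In other words, for re-applies to be idempotent the rendered manifest would need to carry both addresses up front, something like (the IPv6 address being whatever was pinned or assigned):

  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  - 2001:cafe:42:1::a
  ipFamilyPolicy: PreferDualStack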

Contributor Author

Thanks @aofei for the explanation; indeed I was not understanding what you meant. You have a good point. To be safe, we should probably use SingleStack for the kube-dns service if the cluster-dns flag is set with a single-stack value. It should be easy to code.

Moreover, if the user later realizes a mistake was made and wants dual-stack kube-dns, they could change the value of cluster-dns to add the extra IP and restart k3s. Even if it means changing an important config on a running cluster, I think we could support this case because the main DNS IP would not change; we would just be adding an extra IP.
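A sketch of the two rendered variants under that approach (assumed behaviour for illustration, not final code):

  # single-stack --cluster-dns
  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  ipFamilyPolicy: SingleStack

  # dual-stack --cluster-dns
  clusterIP: 10.43.0.10
  clusterIPs:
  - 10.43.0.10
  - 2001:cafe:42:1::a
  ipFamilyPolicy: PreferDualStack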

}

// Set ClusterDNS to the first IPv4 address, for legacy clients
// unless only IPv6 range given
clusterDNS, _, _, err := util.GetFirstIP(serverConfig.ControlConfig.ClusterDNSs)
Member

Is this correct? I think Kubernetes might insist that the primary clusterIP for the service respect the ip family order for the cluster.

Contributor Author

As of today, all our code handles this situation by prioritizing IPv4 regardless of the order. I am preparing another PR to change this. In any case, it is not clear that Kubernetes mandates anything in this respect; in fact, when they describe dual-stack, IPv4 is always first in all their examples. We can discuss further in my other PR.

Contributor Author

BTW, I am creating a different PR because checking the order to decide which IP family gets prioritized requires changes in several parts of the code.

Contributor

Will this break if I set --service-cidr 2001:cafe:42:1::/112,10.43.0.0/16 --cluster-dns 2001:cafe:42:1::a,10.43.0.10?

If the kube-dns service in the coredns.yaml will be rendered as:

clusterIP: 10.43.0.10
clusterIPs: [2001:cafe:42:1::a, 10.43.0.10]
ipFamilyPolicy: PreferDualStack

Then I think things will go wrong, as the ServiceSpec says:

  • clusterIPs ([]string)
    ...
    If this field is not specified, it will be initialized from the clusterIP field. If this field is specified, clients must ensure that clusterIPs[0] and clusterIP have the same value.
    ...

clusterIP and clusterIPs[0] must always be identical.

Contributor Author

As of today, all our code expects dual-stack to be defined as stated in the Kubernetes documentation (https://kubernetes.io/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack), i.e. it expects the order to be <IPv4 CIDR>,<IPv6 CIDR>. K3s is currently not ready for configs like --service-cidr 2001:cafe:42:1::/112,10.43.0.0/16. You are right, your config will not work, but there are other things that will not work either without some workarounds, e.g. kube-api or the netpol controller. I am working on another PR to provide support for <IPv6 CIDR>,<IPv4 CIDR> type of configs. I'll add you to that PR as a reviewer and we can address this problem over there.

Contributor

@aofei aofei Sep 26, 2023

It actually has nothing to do with the order of --service-cidr. --service-cidr 10.43.0.0/16,2001:cafe:42:1::/112 --cluster-dns 2001:cafe:42:1::a,10.43.0.10 will lead to the same result.

Kind of feels like there should be a check here to make sure that --cluster-dns always matches --service-cidr (within the ranges, having the same families and family order, no matter which family takes precedence). I think k3s should throw a fatal error to prevent the above case from happening.

Contributor

Correction: cmds.ServerConfig.ClusterDNS should always match cmds.ServerConfig.ServiceCIDR when len(cmds.ServerConfig.ClusterDNS) == len(cmds.ServerConfig.ServiceCIDR).

The case of --service-cidr 10.43.0.0/16,2001:cafe:42:1::/112 --cluster-dns 10.43.0.10 should still be allowed.
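To make the proposed check concrete (example values; the check itself is only a suggestion at this point):

  # allowed: single cluster-dns entry, the secondary IPv6 clusterIP is auto-assigned
  service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
  cluster-dns: 10.43.0.10
  ---
  # rejected: cluster-dns family order does not match service-cidr
  service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
  cluster-dns: 2001:cafe:42:1::a,10.43.0.10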

Member

I agree with all of the above. If there are places where we are forcing or expecting IPv4 as the primary AF on dual-stack clusters, we should work towards eliminating those.

On the other hand, if someone changes the cluster-dns setting after the cluster has already been started, I'm not sure how far we need to go out of our way to handle changes not allowed by Kubernetes itself.

Contributor Author

In my opinion:

--service-cidr 10.43.0.0/16,2001:cafe:42:1::/112 --cluster-dns 2001:cafe:42:1::a,10.43.0.10

is currently not acceptable because cluster-dns uses the unsupported order <IPv6>,<IPv4>.

Let me finish up the other PR and we can discuss <IPv6>,<IPv4> over there. I think I can have it ready in a few hours.

Contributor Author

@manuelbuil manuelbuil Sep 27, 2023

#8460 <== When using this PR, the problem will not exist anymore (I think). @aofei could you help me review that PR too? I can't add you to the reviewers list.

Signed-off-by: Manuel Buil <mbuil@suse.com>
@manuelbuil
Contributor Author

merged as part of #8460
