Add coredns proposal #1100
# Add CoreDNS for DNS-based Service Discovery

Status: Pending

Version: Alpha

Implementation Owner: @johnbelamaric

## Motivation

CoreDNS is another CNCF project and is the successor to SkyDNS, on which kube-dns is based. It is a flexible, extensible
authoritative DNS server that directly integrates with the Kubernetes API. It can serve as cluster DNS,
complying with the [dns spec](https://github.com/kubernetes/dns/blob/master/docs/specification.md).

CoreDNS has fewer moving parts than kube-dns, since it is a single executable and a single process. It is written in Go, so
it is memory-safe (kube-dns includes dnsmasq, which is not). It supports a number of use cases that kube-dns does not
(see below). As a general-purpose authoritative DNS server, it has a great deal of functionality that kube-dns could not reasonably
be expected to add. See, for example, the [intro](https://docs.google.com/presentation/d/1v6Coq1JRlqZ8rQ6bv0Tg0usSictmnN9U80g8WKxiOjQ/edit#slide=id.g249092e088_0_181), [coredns.io](https://coredns.io), or the [CNCF webinar](https://youtu.be/dz9S7R8r5gw).

## Proposal

The proposed solution is to enable the selection of CoreDNS as an alternative to kube-dns during cluster deployment, with the
intent to make it the default in the future.

## User Experience

### Use Cases

* Standard DNS-based service discovery
* Federation records
* Stub domain support
* Adding custom DNS entries
* Making an alias for an external name [#39792](https://github.com/kubernetes/kubernetes/issues/39792)
* Dynamically adding services to another domain, without running another server [#55](https://github.com/kubernetes/dns/issues/55)
* Adding an arbitrary entry inside the cluster domain (for example, TXT entries [#38](https://github.com/kubernetes/dns/issues/38))
* Verified pod DNS entries (ensure the pod exists in the specified namespace)
* Experimental server-side search path to address latency issues [#33554](https://github.com/kubernetes/kubernetes/issues/33554)
* Limiting PTR replies to the cluster CIDR [#125](https://github.com/kubernetes/dns/issues/125)
* Serving DNS for selected namespaces [#132](https://github.com/kubernetes/dns/issues/132)
* Serving DNS based on a label selector
* Support for wildcard queries (e.g., `*.namespace.svc.cluster.local` returns all services in `namespace`)

By default, the user experience would be unchanged. For more advanced uses, existing users would need to modify the
ConfigMap that contains the CoreDNS configuration file.
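
For illustration, the configuration file would typically be mounted into the CoreDNS pod from a ConfigMap like the following sketch. The ConfigMap name and namespace here are assumptions for illustration, not names mandated by this proposal:

```
# Illustrative ConfigMap holding the Corefile; the name "coredns" and the
# kube-system namespace are assumptions, not part of this proposal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    . {
        errors
        log
        cache 30
        health
        prometheus
        kubernetes 10.0.0.0/8 cluster.local
        proxy . /etc/resolv.conf
    }
```

Editing the `Corefile` key and triggering a reload (see Deployment and Operations below) would then change the server's behavior.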

### Configuring CoreDNS

The CoreDNS configuration file is called a `Corefile` and is syntactically the same as a
[Caddyfile](https://caddyserver.com/docs/caddyfile). The file consists of multiple stanzas called _server blocks_.
Each of these represents a set of zones for which that server block should respond, along with the list
of plugins to apply to a given request. More details can be found in the
[Corefile Explained](https://coredns.io/2017/07/23/corefile-explained/) and
[How Queries Are Processed](https://coredns.io/2017/06/08/how-queries-are-processed-in-coredns/) blog
entries.

### Configuration for Standard Kubernetes DNS

The intent is to make configuration as simple as possible. The following Corefile will behave according
to the spec, except that it will not respond to Pod queries. It assumes the cluster domain is `cluster.local`
and that the cluster CIDRs are all within 10.0.0.0/8.

```
. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

The `.` means that queries for the root zone (`.`) and below should be handled by this server block. Each
of the lines within `{ }` represents an individual plugin:

* `errors` enables [error logging](https://coredns.io/plugins/errors)
* `log` enables [query logging](https://coredns.io/plugins/log/)
* `cache 30` enables [caching](https://coredns.io/plugins/cache/) of positive and negative responses for 30 seconds
* `health` opens an HTTP port to allow [health checks](https://coredns.io/plugins/health) from Kubernetes
* `prometheus` enables Prometheus [metrics](https://coredns.io/plugins/metrics)
* `kubernetes 10.0.0.0/8 cluster.local` connects to the Kubernetes API and [serves records](https://coredns.io/plugins/kubernetes/) for the `cluster.local` domain and reverse DNS for 10.0.0.0/8 per the [spec](https://github.com/kubernetes/dns/blob/master/docs/specification.md)
* `proxy . /etc/resolv.conf` [forwards](https://coredns.io/plugins/proxy) any queries not handled by other plugins (the `.` means the root domain) to the nameservers configured in `/etc/resolv.conf`

### Configuring Stub Domains

To configure stub domains, you add additional server blocks for those domains:

```
example.com {
    proxy example.com 8.8.8.8:53
}

. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    proxy . /etc/resolv.conf
}
```

### Configuring Federation

Federation is implemented as a separate plugin. You simply list the federation names and
their corresponding domains:

```
. {
    errors
    log
    cache 30
    health
    prometheus
    kubernetes 10.0.0.0/8 cluster.local
    federation cluster.local {
        east east.example.com
        west west.example.com
    }
    proxy . /etc/resolv.conf
}
```

### Reverse DNS

Reverse DNS is supported for Services and Endpoints, but not for Pods.

You have to configure the reverse zone to make it work. That means knowing the service CIDR and configuring it
ahead of time (until [#25533](https://github.com/kubernetes/kubernetes/issues/25533) is implemented).

Since reverse DNS zones are on classful boundaries, if you have a classless service CIDR
(say, a /12), then you have to widen it to the containing classful network. That leaves a subset of that network
open to the spoofing described in [#125](https://github.com/kubernetes/dns/issues/125); this is to be fixed
in [#1074](https://github.com/coredns/coredns/issues/1074).
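
To make the classful widening concrete, the following standard-library sketch (an illustrative helper, not CoreDNS code) computes the `in-addr.arpa` zone of the classful network that contains a given service CIDR:

```python
import ipaddress


def classful_reverse_zone(cidr: str) -> str:
    """Return the in-addr.arpa zone of the classful network containing
    the given CIDR. Illustrative helper only, not CoreDNS code."""
    net = ipaddress.ip_network(cidr, strict=False)
    first_octet = int(str(net.network_address).split(".")[0])
    # Classful boundaries: class A (/8), class B (/16), class C (/24).
    if first_octet < 128:
        prefix = 8
    elif first_octet < 192:
        prefix = 16
    else:
        prefix = 24
    classful = ipaddress.ip_network(
        f"{net.network_address}/{prefix}", strict=False)
    # Reverse the network octets to form the zone labels.
    labels = str(classful.network_address).split(".")[: prefix // 8]
    return ".".join(reversed(labels)) + ".in-addr.arpa"


# A /12 service CIDR widens to the whole class A network 10.0.0.0/8,
# so the reverse zone covers far more addresses than the service CIDR:
print(classful_reverse_zone("10.96.0.0/12"))  # -> 10.in-addr.arpa
```

The gap between the /12 and the /8 in this example is exactly the address space left exposed to the PTR spoofing described above.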

PTR spoofing via manually created endpoints
([#124](https://github.com/kubernetes/dns/issues/124)) would
still be an issue even with [#1074](https://github.com/coredns/coredns/issues/1074) solved (as it is in kube-dns). This could be resolved in the case
where `pods verified` is enabled, but that is not done at this time.

### Deployment and Operations

Typically, when deployed for cluster DNS, CoreDNS is managed by a Deployment. The
CoreDNS pod contains only a single container, as opposed to kube-dns, which requires three
containers. This simplifies troubleshooting.

The Kubernetes integration is stateless, so multiple pods may be run. Each pod will have its
own connection to the API server. If you (like OpenShift) run a DNS pod on each node, you should not enable
`pods verified`, as that could put a high load on the API server. Instead, if you wish to support
that functionality, you can run another central deployment and configure the per-node
instances to proxy `pod.cluster.local` to the central deployment.

All logging is to standard out and may be disabled if
desired. In very high queries-per-second environments, it is advisable to disable query logging to
avoid I/O for every query.

CoreDNS can be configured to provide an HTTP health check endpoint, so that it can be monitored
by a standard Kubernetes HTTP health check. Readiness checks are not currently supported but
are in the works (see [#588](https://github.com/coredns/coredns/issues/588)). For Kubernetes, a
CoreDNS instance will be considered ready when it has finished syncing with the API.
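
As an illustration, the `health` plugin's endpoint could back a standard liveness probe on the CoreDNS container. This is a sketch only; the port and timing values below are assumptions for illustration (the `health` plugin serves `/health` on port 8080 by default):

```
# Illustrative livenessProbe fragment for the CoreDNS container spec;
# port 8080 and path /health are the health plugin defaults, and the
# timing values are placeholder assumptions.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5
```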

CoreDNS performance metrics can be published for Prometheus.

When a change is made to the Corefile, you can send each CoreDNS instance a SIGUSR1 signal, which will
trigger a graceful reload of the Corefile.

### Performance and Resource Load

The performance test was done in GCE with the following components:

* CoreDNS system with machine type n1-standard-1 (1 CPU, 2.3 GHz Intel Xeon E5 v3 (Haswell))
* Client system with machine type n1-standard-1 (1 CPU, 2.3 GHz Intel Xeon E5 v3 (Haswell))
* Kubemark cluster with 5000 nodes

CoreDNS and the client were running out-of-cluster (due to it being a Kubemark cluster).

The following is a summary of the performance of CoreDNS. The CoreDNS cache was disabled.

Services (with 1% change per minute\*) | Max QPS\*\* | Latency (median) | CoreDNS memory (at max QPS) | CoreDNS CPU (at max QPS)
------------ | ------------- | -------------- | --------------------- | -----------------
1,000 | 18,000 | 0.1 ms | 38 MB | 95%
5,000 | 16,000 | 0.1 ms | 73 MB | 93%
10,000 | 10,000 | 0.1 ms | 115 MB | 78%

\* We simulated service change load by creating and destroying 1% of services per minute.

\*\* Max QPS with < 1% packet loss

## Implementation

Each distribution project (kubeadm, minikube, kubespray, and others) will implement CoreDNS as an optional
add-on as appropriate for that project.

### Client/Server Backwards/Forwards Compatibility

No changes to other components are needed.

The method for configuring the DNS server will change. Thus, in cases where users have customized
the DNS configuration, they will need to modify their configuration if they move to CoreDNS.
For example, if users have configured stub domains, they would need to modify that configuration.

When serving SRV requests for headless services, some responses differ from those of kube-dns, though they are still within
the specification (see [#975](https://github.com/coredns/coredns/issues/975)). In summary, these are:

* kube-dns uses endpoint names that contain an opaque identifier. CoreDNS instead uses the pod IP with dashes.
* kube-dns returns a bogus SRV record with port = 0 when no SRV prefix is present in the query.
  CoreDNS returns all SRV records for the service (see also [#140](https://github.com/kubernetes/dns/issues/140)).
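
To illustrate the first difference, a headless-service endpoint name under CoreDNS is derived from the pod IP with dots replaced by dashes. The sketch below shows only the naming convention, not CoreDNS's actual record-generation code; the service and namespace names are made up:

```python
def coredns_endpoint_fqdn(pod_ip: str, service: str, namespace: str,
                          zone: str = "cluster.local") -> str:
    """Sketch of the CoreDNS-style endpoint name for a headless service:
    the pod IP with dots replaced by dashes, under the service's domain.
    Illustrative only; not CoreDNS's actual implementation."""
    dashed = pod_ip.replace(".", "-")
    return f"{dashed}.{service}.{namespace}.svc.{zone}"


# A pod at 10.0.0.1 backing headless service "db" in namespace "default":
print(coredns_endpoint_fqdn("10.0.0.1", "db", "default"))
# -> 10-0-0-1.db.default.svc.cluster.local
```

kube-dns would instead use an opaque identifier in place of `10-0-0-1`, so clients that parse endpoint names may observe the difference.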

Additionally, federation may return records in a slightly different manner (see [#1034](https://github.com/coredns/coredns/issues/1034)),
though this may be changed prior to completing this proposal.

In the plan for the Alpha, there will be no automated conversion of the kube-dns configuration. However, as
part of the Beta, code will be provided that will produce a proper Corefile based upon the existing kube-dns
configuration.
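
As a sketch of what such a conversion could look like for the stub-domain portion of the kube-dns ConfigMap, the function below turns a `stubDomains` JSON value (a map of domain to upstream IPs) into CoreDNS proxy server blocks. The function name and exact output formatting are assumptions; the real Beta tooling may differ:

```python
import json


def stub_domains_to_server_blocks(stub_domains_json: str) -> str:
    """Convert a kube-dns `stubDomains` ConfigMap value (JSON map of
    domain -> list of upstream IPs) into CoreDNS proxy server blocks.
    Sketch only; the conversion tool promised for Beta may differ."""
    stubs = json.loads(stub_domains_json)
    blocks = []
    for domain, upstreams in sorted(stubs.items()):
        # Each stub domain becomes its own server block that proxies
        # queries for that domain to the configured upstreams on port 53.
        targets = " ".join(f"{ip}:53" for ip in upstreams)
        blocks.append(f"{domain} {{\n    proxy {domain} {targets}\n}}")
    return "\n\n".join(blocks)


print(stub_domains_to_server_blocks('{"example.com": ["8.8.8.8"]}'))
```

For the input above, the output is a server block equivalent to the stub-domain example shown earlier in this proposal.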

## Alternatives Considered

Maintain the existing kube-dns, add functionality to meet the currently unmet use cases above, and fix the underlying issues.
Ensuring the use of memory-safe code would require replacing dnsmasq with another (memory-safe) caching DNS server,
or implementing caching within kube-dns.