Fix DNS latency of 5s when using iptables forward #62764
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: xiaoxubeii. Assign the PR to them by writing … The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
/ok-to-test
/assign @danwinship
/unassign danwinship
/assign mrhohn
/retest
@MrHohn for review and approval : )
Would be great if we could have a cluster-wide knob for tweaking this defaultDNSOptions so users can apply this workaround through it. I believe this might help mitigate the packet-dropping bug for pod DNS resolution, but whether there would be any other side effects is unclear to me, and that's why I'm hesitant. Another pod-wide option would be something like below (similar to what you posted on weaveworks/weave#3287 (comment)):
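A minimal sketch of such a pod-wide `dnsConfig` (the pod name and image are illustrative; the `dnsConfig.options` entry is the point):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-workaround-example  # illustrative name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
  dnsConfig:
    options:
      # Tell glibc to retry the second of the parallel A/AAAA queries
      # over a fresh socket instead of waiting out the 5s timeout.
      - name: single-request-reopen
```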
The given options will be merged with the pre-set ones (e.g. …)
I would just like to add here that the …
Here is the workaround we are about to use: weaveworks/weave#3287 (comment)
@MrHohn OK, that's a fair compromise, I will close the PR : )
/close
Since we run various Helm charts, not every pod we run is under our control, so we cannot add a custom dnsConfig specifying single-request-reopen to each of them. A custom kubelet flag to enable this would help, but I think it should be enabled by kubelet by default. From my understanding, single-request-reopen sounds pretty safe, since it only adds a fallback to the existing behavior: retrying over a new socket. If the underlying library doesn't support it, as on Alpine, it will just be ignored. Could this PR be reopened anyway?
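For context, `single-request-reopen` is a glibc resolver option read from `/etc/resolv.conf`. A hedged sketch of what a pod's resolver configuration might look like with the option applied (the nameserver and search entries are illustrative cluster defaults, not values from this PR):

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5 single-request-reopen
```

musl-based images such as Alpine parse this file but silently ignore resolver options they do not implement, which is why the option degrades to a no-op there rather than an error.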
I just posted a little write-up about our journey troubleshooting the issue, and how we worked around it in production: https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/. @steven-sheehy Our workaround does not involve setting dnsConfig, nor does it require any change from the users.
@Quentin-M I tried your workaround and it still occurs. Left a comment on your blog. We may be suffering from the SNAT race condition as well. Regardless, we at least need a cluster-level option to tweak this.
This workaround works for both SNAT and DNAT. Just a few thoughts:
You may have to adjust the latency by a few ms depending on your network conditions.
You would also need to make sure you are applying it to the right network interfaces, depending on your CNI/network configuration (see the sketch below).
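For readers who want to experiment, here is a hedged sketch of this style of workaround: stagger outbound UDP DNS packets with a small randomized delay so the parallel A and AAAA queries stop hitting conntrack at the same instant. This is a generic illustration, not the exact rules from the blog post; the interface name cbr0 and the delay/jitter values are assumptions to adapt to your CNI.

```sh
# Sketch only: "cbr0" and the timing values are placeholders.
# 1. Root prio qdisc so traffic can be classified into bands.
tc qdisc add dev cbr0 root handle 1: prio
# 2. Attach netem to band 1:1: 2ms base delay with 1ms random jitter,
#    which spreads out packets that would otherwise leave back-to-back.
tc qdisc add dev cbr0 parent 1:1 handle 10: netem delay 2ms 1ms
# 3. Steer UDP packets destined for port 53 into the delayed band.
tc filter add dev cbr0 protocol ip parent 1: prio 1 u32 \
  match ip protocol 17 0xff \
  match ip dport 53 0xffff \
  flowid 1:1
```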
What this PR does / why we need it:
Fixes the 5s DNS latency that occurs when iptables forwarding is used.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #62628
Special notes for your reviewer:
Release note: