
Use of upstreams increases DNS Issues. #6812

Closed
vaibhavkhurana2018 opened this issue Feb 8, 2021 · 9 comments · Fixed by #7002

Labels: core/balancer, pending author feedback, stale

Comments

@vaibhavkhurana2018

Summary

2021/02/08 14:12:25 [error] 25#0: *49573 [lua] balancer.lua:921: execute(): DNS resolution failed: dns server error: 3 name error. Tried: ["(short)api-service:(na) - cache-miss","api-service.edge.svc.cluster.local:33 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.svc.cluster.local:33 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.cluster.local:33 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.ap-south-1.compute.internal:33 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service:33 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.edge.svc.cluster.local:1 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.svc.cluster.local:1 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.cluster.local:1 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.ap-south-1.compute.internal:1 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service:1 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.edge.svc.cluster.local:5 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.svc.cluster.local:5 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.cluster.local:5 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service.ap-south-1.compute.internal:5 - cache-hit/stale/scheduled/dns server error: 3 name error","api-service:5 - cache-hit/stale/scheduled/dns server error: 3 name error"], client: 52.66.95.207, server: kong, request: "GET / HTTP/1.1", host: "<host>", referrer: "https://<host>"


Steps To Reproduce

  1. Create an upstream with targets.
  2. Create a service and add the upstream as its host (see the Admin API sketch below).
  3. Check the Kong error log.
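
For reference, a minimal reproduction sketch using the Admin API. It assumes the Admin API listens on localhost:8001 (the default); the entity names, IP and port are illustrative:

    # 1. Create an upstream and add a target to it
    curl -i -X POST http://localhost:8001/upstreams --data name=api-service
    curl -i -X POST http://localhost:8001/upstreams/api-service/targets --data target=10.0.0.10:8080

    # 2. Create a service whose host is the upstream name, plus a route to reach it
    curl -i -X POST http://localhost:8001/services --data name=api --data host=api-service --data port=8080
    curl -i -X POST http://localhost:8001/services/api/routes --data 'paths[]=/api'

    # 3. Send traffic through the proxy and watch the error log
    curl -i http://localhost:8000/api
    tail -f /usr/local/kong/logs/error.log   # path depends on your deployment; containers usually log to stderr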

Additional Details & Logs

  • Kong version ($ kong version): 2.0.3

Possible Solution:

  • Provide a way to restrict resolution of upstream names to Kong itself, or let a service declare that its host is an upstream, since an upstream is a virtual entity that exists only within Kong.
@vaibhavkhurana2018 (Author)

@Tieske Tagging you based on your responses on other DNS-related issues. Thanks!

@Tieske (Member) commented Feb 8, 2021

I think this was already resolved. It has nothing to do with the actual DNS resolution; it is a synchronisation issue: DNS resolution (including upstream names) is attempted before the upstream entity becomes available. That makes the upstream lookup fail, which falls through to the actual DNS client, which then starts querying the name servers.

@bungle @kikito might have a better idea of when exactly this was fixed.

@mlatimer-figure

We are seeing this issue as well. However we are on the latest version of Kong (2.3.1) and the Kong Ingress Controller (1.1.1). Issue opened here: #6807

@vaibhavkhurana2018 (Author)

@kikito Can you tell us whether this was fixed in some version? We are facing these issues in our environment and can't find a workaround for this use case. Thanks!

@hugoShaka commented Feb 12, 2021

We moved to upstreams this week and encountered several DNS performance issues, including this one. Thanks for raising this issue :)

How to reproduce

  • create Kong proxies (here we'll be on Kubernetes)
  • create a route/service/upstream/target: here both the service and the upstream are called fleet-system.fleet-kube-monitor-alertmanager-s2s-alerts.rule-0, and the target is an IP address
  • capture DNS traffic
  • restart the Kong instances (kubectl delete on all pods; a command sketch follows below)
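
A sketch of these capture-and-restart steps on Kubernetes. The namespace and label are assumptions; adjust them to your deployment:

    # list the Kong pods (assuming namespace "kong" and label app=kong)
    kubectl -n kong get pods -l app=kong

    # capture DNS traffic from inside one Kong pod (assumes tcpdump is available in the image)
    kubectl -n kong exec -it <kong-pod-name> -- tcpdump -ni any port 53

    # restart all Kong pods to trigger the startup resolution spike
    kubectl -n kong delete pod -l app=kong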

What we observed

  • the restart comes with a spike of resolutions from Kong (the spike height depends on how many services/upstreams you have)
    (screenshot)
  • the resolutions are full of Kong trying to resolve its upstream names
    (screenshot)
  • the impact is amplified by Kubernetes' default resolv.conf settings: because of ndots:5, each resolution is amplified by appending every search domain to the name being resolved, as in the screenshot (an example follows after this list)
  • once fully started, Kong stops trying to resolve its upstream names
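
To illustrate the amplification, here is a sketch of what the in-pod resolv.conf and the resulting query fan-out look like. The search domains below match the error log quoted in the issue summary; your cluster's values will differ:

    # /etc/resolv.conf inside the pod (illustrative)
    search edge.svc.cluster.local svc.cluster.local cluster.local ap-south-1.compute.internal
    options ndots:5

    # With ndots:5, a short name like "api-service" has fewer than 5 dots, so every search
    # domain is appended before the bare name is tried:
    #   api-service.edge.svc.cluster.local
    #   api-service.svc.cluster.local
    #   api-service.cluster.local
    #   api-service.ap-south-1.compute.internal
    #   api-service
    # Kong's DNS client also repeats the lookup per record type (SRV=33, A=1, CNAME=5 in the
    # log above), which is why a single unresolvable name produces 15 failed queries.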

Details

Kong version 2.1.2 running on GKE, not using the Kong Ingress Controller.

Other considerations and workarounds

  • We noticed a significant increase in DNS resolutions when using upstreams compared to putting bare hostnames directly in the services; this was not linked to this bug and seems to be the default behaviour. We did not investigate further and put IPs directly in the targets to get rid of the resolution step.
  • The impact of the startup resolution spike on Kubernetes can be mitigated by setting ndots to 0 on all Kong pods (a sketch follows below)
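
One way to set that, assuming the Kong pods are managed by a Deployment named "kong" in a namespace also named "kong" (both names are illustrative), is through the pod template's dnsConfig:

    kubectl -n kong patch deployment kong --type merge -p \
      '{"spec":{"template":{"spec":{"dnsConfig":{"options":[{"name":"ndots","value":"0"}]}}}}}'

With ndots at 0 the resolver tries names as given before any search-domain expansion, so it may then be safer to use fully qualified names (e.g. name.namespace.svc.cluster.local) in Kong services and targets rather than relying on the search list.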

@locao (Contributor) commented Mar 5, 2021

Hi @hugoShaka! Thanks for your report. This is a different problem from the one originally reported: it is related to DNS warm-up and should be fixed once #6891 is merged. We will still see DNS resolutions when starting Kong, but only for hosts that are not upstream names, which is the problem you are facing, right?

You may want to create a new issue for that if you want to follow the progress of that PR.

@locao (Contributor) commented Mar 5, 2021

Hello @vaibhavkhurana2018 @mlatimer-figure! Thanks for pointing that out. Today we released Kong 2.3.3, which includes #6833. In that PR we made some changes that address upstream name usage, which used to cause the reported problem at times when the balancer was under high load. Could you please test this version?

locao added the pending author feedback label on Mar 10, 2021
ghost pushed a commit that referenced this issue on Apr 5, 2021 (#5831): "Seems like we lost that by mistake during the workspace refactor work. Fix #6812"
locao pushed a commit that referenced this issue on Apr 5, 2021 (#5831): "Seems like we lost that by mistake during the workspace refactor work. Fix #6812"
stale bot commented Apr 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@mgupta0141

@vaibhavkhurana2018 @mlatimer-figure
[error] 48#0: *3147821 [lua] balancer.lua:921: execute(): DNS resolution failed: dns lookup pool exceeded retries (1): timeout.
[error] 48#0: *3148632 [lua] balancer.lua:921: execute(): DNS resolution failed: dns server error: 3 name error.
Is the issue fixed after upgrading Kong to 2.3.3?

We are on Kong version 2.0.3.
