
dnsdist: "Out" queries become negative over time #4344

Closed
fudoreaper opened this issue Aug 22, 2016 · 7 comments

Comments

@fudoreaper

Version: dnsdist 1.0.0
Source: repo.powerdns.com/ubuntu
OS: Ubuntu 14.04 64-bit, as a KVM guest on Ubuntu 14.04 hypervisor

Conditions: dnsdist is running as a recursive load balancer. It has 10 servers specified as resolving 'engines': 5 logical boxes with an IPv4 and an IPv6 address each. Four of the boxes are operated by me, running BIND, PowerDNS Recursor, Unbound and Knot, in order group 1. In order group 2 I have HE.net's anycast resolver. The policy is 'leastOutstanding'. The query rate is about 50-100 qps, with peaks of 500-1000 qps.

Observed behaviour: After about 2 days and 15 million queries, the 'Out' value for the IPv4 PowerDNS node begins to show negative values. I am seeing this in the web dashboard. Currently, after 5 days of uptime and 35 million queries, it shows -5 'out'. This is after I stopped the PowerDNS Recursor so that the 'out' value stays stable.

Expected behaviour: The "out" query value should NEVER be below 0.

Steps to repeat: This does seem repeatable; I've seen it twice now. I am unsure what conditions trigger it, other than more than a day of uptime and more than 10 million queries.

Effect: with the 'leastOutstanding' policy, a disproportionate number of queries are directed to the IPv4 node with the -5 'out' baseline. It has a 5-query head start, or 'handicap'. Because the leastOutstanding policy uses 'out' queries as its first sorting key, it unreasonably favours this one server.

I am unaware of any logs which shed light on this problem, but would be happy to provide something if the developers ask me.

@rgacogne
Member

Would you mind posting the results of dumpStats() and showServers() if you have access to the console?

@fudoreaper
Author

Sure, I've got full access:

```
> dumpStats()
acl-drops                    772812    latency0-1                  25265824
block-filter                      0    latency1-10                   487367
cache-hits                 24961077    latency10-50                 4617890
cache-misses               11259115    latency100-1000              2741187
cpu-sys-msec                2633832    latency50-100                2226348
cpu-user-msec               2091980    no-policy                          0
downstream-send-errors            0    noncompliant-queries               2
downstream-timeouts           83500    noncompliant-responses            23
dyn-block-nmg-size                0    queries                     36220205
dyn-blocked                       0    rdqueries                   36213170
empty-queries                     5    real-memory-usage           73359360
fd-usage                        153    responses                   11175617
latency-avg100                27176.5  rule-drop                          0
latency-avg1000               27366.3  rule-nxdomain                      0
latency-avg10000              29350.0  self-answered                      0
latency-avg1000000            31961.9  servfail-responses             85534
latency-slow                 149538    trunc-failures                     0
                                       uptime                        427314

> showServers()
#   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
0   XXXXXX RENS1 - BIND 208.81.7.153:53                  up     1.0     100   1  1     555391    5960   0.0 178.8           0
1   XXXXXX RENS1 - BIND [2605:e200:53:101::]:53          up     0.0     100   1  1     553626    6338   0.0 196.4           0
2   XXXXXX RENS2 - Powe 208.81.7.155:53                  up    31.7     100   1  1    5182355   22472   0.0  89.2 18446744073709551612
3   XXXXXX RENS2 - Powe [2605:e200:53:102::]:53          up     2.0     100   1  1    1188925    5584   0.0  89.3           0
4   XXXXXX RENS3 - Unbo 208.81.7.157:53                  up     0.0     100   1  1     306676    3733   0.0 223.1           0
5   XXXXXX RENS3 - Unbo [2605:e200:53:103::]:53          up     0.0     100   1  1     298736    3850   0.0 206.0           0
6   XXXXXX RENS4 - Knot 208.81.7.159:53                  up     1.0     100   1  1    1327503   16724   0.0 138.1           0
7   XXXXXX RENS4 - Knot [2605:e200:53:104::]:53          up     1.0     100   1  1    1352867   15877   0.0 146.4           0
8   HE.net v4            74.82.42.42:53                   up     0.0     500   2  1     246759    1420   0.0 163.9           0
9   HE.net v6            [2001:470:20::2]:53              up     0.0     500   2  1     249439    1553   0.0 133.0           0
All                                                             32.0                  11262277   83511
```

@ahupowerdns
Contributor

ahupowerdns commented Aug 22, 2016

I wonder if 'empty-queries=5' is a hint... (update: unlikely)

@ahupowerdns
Contributor

Of note, the TCP path also touches the 'outstanding' field.

@Habbie
Member

Habbie commented Mar 7, 2017

Is this still happening on 1.1.0 or master?

@fudoreaper
Author

fudoreaper commented Mar 7, 2017 via email

@rgacogne
Member

rgacogne commented Aug 7, 2019

Closing this issue, I will re-open if the problem can be reproduced on a recent version.

@rgacogne rgacogne closed this as completed Aug 7, 2019

4 participants