Occasional SERVFAIL for ocsp.int-x3.letsencrypt.org. (hosted by akamai) #19

Open
tmolitor-stud-tu opened this issue Apr 23, 2019 · 7 comments

Comments

@tmolitor-stud-tu

I'm running unbound 1.8.1 (debian version 1.8.1-1+b1) on my debian server.
When I first noticed the issue I was running unbound 1.6.0-3+deb9u2, but upgrading to 1.8.1 did not help.
I'm running unbound in single-threaded mode to rule out possible threading issues.

Occasionally unbound returns SERVFAIL for queries for ocsp.int-x3.letsencrypt.org. which is hosted by akamai.
I don't have any statistics, but I managed to capture such a response by running dig every minute (a SERVFAIL appeared after about 3-4 days):

; <<>> DiG 9.10.3-P4-Debian <<>> +multiline ocsp.int-x3.letsencrypt.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 27104
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;ocsp.int-x3.letsencrypt.org. IN        A

;; Query time: 343 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Apr 22 18:15:33 CEST 2019
;; MSG SIZE  rcvd: 56
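
The once-a-minute polling described above can be scripted so that only failures are recorded. This is a minimal sketch, assuming `dig` is installed and the resolver listens on 127.0.0.1; the function names `parse_status` and `poll` are illustrative and not part of any existing tool.

```python
import re
import subprocess
import time

# Matches the rcode in dig's header line, e.g. "status: SERVFAIL".
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")

def parse_status(dig_output):
    """Return the rcode name from dig's header line, or None if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None

def poll(name, server="127.0.0.1", interval=60):
    """Query `name` every `interval` seconds and print any non-NOERROR reply."""
    while True:
        result = subprocess.run(
            ["dig", "@" + server, name, "A"],
            capture_output=True, text=True,
        )
        status = parse_status(result.stdout)
        if status != "NOERROR":
            # Keep the timestamp so the reply can be matched against the unbound log.
            print(time.strftime("%Y-%m-%d %H:%M:%S"), status)
            print(result.stdout)
        time.sleep(interval)
```

Calling `poll("ocsp.int-x3.letsencrypt.org.")` would run until interrupted; the printed timestamp is what makes it possible to find the matching log lines afterwards.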

I also configured unbound with verbosity 2.
If needed, I can provide the full log for the corresponding timeframe, but these log lines look suspicious to me:

Apr 22 18:15:33 master unbound[3307]: [3307:0] info: response for ocsp.int-x3.letsencrypt.org. A IN
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: reply from <akamai.net.> 184.26.160.193#53
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: query response was nodata ANSWER

followed later by:

Apr 22 18:15:33 master unbound[3307]: [3307:0] info: response for ocsp.int-x3.letsencrypt.org. A IN
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: reply from <akamai.net.> 2600:1480:1::c1#53
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: Capsforid: reply is equal. go to next fallback
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: response for ocsp.int-x3.letsencrypt.org. A IN
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: reply from <akamai.net.> 23.74.25.192#53
Apr 22 18:15:33 master unbound[3307]: [3307:0] info: Capsforid fallback: getting different replies, failed
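
For a long log, the Capsforid messages can be pulled out with a small filter. This is only a sketch, assuming syslog-style lines like the ones quoted above; the helper name `capsforid_messages` is hypothetical.

```python
import re

# Captures the message text of any "info: Capsforid..." log line.
CAPS_RE = re.compile(r"info: (Capsforid.*)$")

def capsforid_messages(log_lines):
    """Yield (timestamp, message) for each Capsforid log line."""
    for line in log_lines:
        match = CAPS_RE.search(line)
        if match:
            # The syslog timestamp is the first 15 characters, e.g. "Apr 22 18:15:33".
            yield line[:15], match.group(1)
```

It could be fed an open file object, for example `capsforid_messages(open("unbound.log"))`, to list every Capsforid event with its timestamp.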
@tmolitor-stud-tu tmolitor-stud-tu changed the title occassional SERVFAIL for ocsp.int-x3.letsencrypt.org. (hosted by akamai) Occasional SERVFAIL for ocsp.int-x3.letsencrypt.org. (hosted by akamai) Apr 23, 2019
@ralphdolmans
Contributor

You have use-caps-for-id enabled, which seems to fail for this domain. I'm not able to reproduce it here but your log should say something like "wrong 0x20-ID in reply qname", and then logging the server and packet with wrong caps.

After 0x20 fails, Unbound has some fallbacks to work around this, which also fail, as you can see in your last log lines. Unbound 1.9.1 has some improvements in the 0x20 fallback, so it might work there.

It is possible to disable 0x20 for an individual domain using caps-whitelist:

       use-caps-for-id: <yes or no>
              Use 0x20-encoded random bits in the query to foil spoof attempts.  This perturbs the lowercase and uppercase of query
              names sent to authority servers and checks if the reply still has the correct casing.  Disabled by default.  This
              feature is an experimental implementation of draft dns-0x20.

       caps-whitelist: <domain>
              Whitelist the domain so that it does not receive caps-for-id perturbed queries.  For domains that do not support 0x20
              and also fail with fallback because they keep sending different answers, like some load balancers.  Can be given
              multiple times, for different domains.
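
The idea behind 0x20 encoding, as the man page excerpt describes it, can be sketched in a few lines. This is an illustration of the concept only, not Unbound's implementation; `encode_0x20` and `reply_matches` are hypothetical names, and a real resolver would use a cryptographically strong random source.

```python
import random

def encode_0x20(name, rng=random):
    """Randomly perturb the case of each letter in a query name.

    Digits, dots and hyphens are unaffected by upper()/lower().
    """
    return "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower()
        for ch in name
    )

def reply_matches(sent_name, reply_name):
    """Accept a reply only if it echoes the exact casing that was sent."""
    return sent_name == reply_name
```

A server that lowercases or otherwise rewrites the query name in its reply makes `reply_matches` return False, which is the situation the fallback logic exists to handle.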

@tmolitor-stud-tu
Author

> your log should say something like "wrong 0x20-ID in reply qname", and then logging the server and packet with wrong caps.

It doesn't say anything like this. What verbosity level is needed to see that? Currently my verbosity is set to 2.

@tmolitor-stud-tu
Author

Some more thoughts: could it be this bug? Was this ever fixed?
https://nlnetlabs.nl/pipermail/unbound-users/2018-July/010793.html

@ralphdolmans
Contributor

> > your log should say something like "wrong 0x20-ID in reply qname", and then logging the server and packet with wrong caps.
>
> It doesn't say anything like this. What verbosity level is needed to see that? Currently my verbosity is set to 2.

Other reasons to start the fallback include not getting a response at all (logged with "Capsforid: timeouts") and failure to scrub the response (logged with "Capsforid: scrub failed"). Both are logged at verbosity level 2.

As mentioned before, 1.9.1 has improvements in the 0x20 fallback handling so it might work there. Since this domain is hosted on a CDN it is also not unlikely that different answers are returned on purpose, in which case you could add this domain to the caps-whitelist.
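
To illustrate why a CDN can defeat the fallback: the log messages earlier in this thread suggest that the fallback compares replies from several servers and gives up when they differ. The following is a simplified model of that idea, not Unbound's actual code; `fallback_accepts` and the sample addresses are hypothetical.

```python
def fallback_accepts(replies):
    """Return True only if every server returned the same set of records.

    `replies` is a list with one entry per queried server, each entry
    being an iterable of record strings (order within a reply is ignored).
    """
    normalized = [frozenset(reply) for reply in replies]
    return len(normalized) > 0 and all(r == normalized[0] for r in normalized)
```

Under this model, `fallback_accepts([["192.0.2.1"], ["198.51.100.7"]])` is False: two servers that deliberately hand out different addresses look the same as a spoofing attempt, so the query fails.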

> Some more thoughts: could it be this bug? Was this ever fixed?
> https://nlnetlabs.nl/pipermail/unbound-users/2018-July/010793.html

That should have been fixed in 3f2d186.

@tmolitor-stud-tu
Author

A timeout is possible, but the timeout message could just as well belong to another query.
I will try with 1.9.1 and report back if the problem still persists.

Nonetheless I attached my full log for you to inspect if the timeout really belongs to the ocsp query or the other one nearby:
unbound.log

@tmolitor-stud-tu
Author

I'm using unbound 1.9.0 and

use-caps-for-id: yes
caps-whitelist: ocsp.int-x3.letsencrypt.org.

and still get the same error (SERVFAIL) for ocsp.int-x3.letsencrypt.org.

No new log though (because I disabled verbose logging when I configured whitelisting).
Do you need/want any new logs?

@tmolitor-stud-tu
Author

Should I whitelist the CNAME targets (ocsp.int-x3.letsencrypt.org.edgesuite.net. and a771.dscq.akamai.net.) instead?
