
single specific fqdn "stuck" in hostdb #8417

@bdgranger


We are seeing an issue with hostdb in our fork (ATS 8.1.2 plus some cherry-picks from the 9.x branch).

Very rarely, in a production environment, we see a hostname get "stuck" in hostdb so that it never resolves again. We see a near-continuous stream of "delaying force 0 answer for [timeout 0]" messages, and then after 30 seconds hostdb times out and the cache returns a 502 error to the client. At no point during those 30 seconds do we see a request go down to the DNS level for actual resolution.

All the nameservers are functioning properly, and every other hostname is being resolved correctly. A `dig` of the same FQDN from the command line on that server also resolves correctly, so it does not appear to be a DNS issue.

We have seen this roughly four times over the past couple of months, on only one or two servers out of a deployment of several hundred caches, so it does not happen often. It is never the same hostname that gets "stuck".

An ATS restart is required to clear the condition.

Another user (Nir Finkel) reported the same symptoms in the Slack channel, with the 9.0.2 release of ATS.
