Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control listener aborts with weird message #1662

Closed
afics opened this issue Aug 12, 2014 · 6 comments
Closed

Control listener aborts with weird message #1662

afics opened this issue Aug 12, 2014 · 6 comments

Comments

@afics
Copy link

afics commented Aug 12, 2014

OS: Fedora 20
pdns version: 3.3.1, installed via yum from the official repos
Backends: BIND
DNSSEC: Yes

Aug 12 12:28:02 host pdns[2206]: Reload was requested
Aug 12 12:28:02 host pdns[2206]: Notification request for domain 'zone2' received from operator
Aug 12 12:28:02 host pdns[2206]: Zone 'zone2' (/zones/zone2) needs reloading
Aug 12 12:28:02 host pdns[2206]: Zone 'zone2' (/zones/zone2) reloaded
Aug 12 12:28:02 host pdns[2206]: Zone 'zone1' (/zones/zone1) needs reloading
Aug 12 12:28:02 host pdns[2206]: Zone 'zone1' (/zones/zone1) reloaded
Aug 12 12:28:02 host pdns[2206]: Fatal error in control listener: Zone for 'zone1' in '/zones/zone1' being reloaded
Aug 12 12:28:02 host pdns[2206]: Control listener aborted, please file a bug!

My actual zone names are in the form domain.tld
This happens after I run the following script to deploy my locally (re)signed zones.

scp zone1.signed root@host:
ssh root@host mv /root/zone1.signed /srv/dns/named/zones/zone1
ssh root@host pdns_control reload
ssh root@host pdns_control notify zone1
@Habbie Habbie added this to the auth-3.4.0 milestone Aug 12, 2014
@cmouse
Copy link
Contributor

cmouse commented Aug 12, 2014

This is fully repeatable with just one zone. I used the following setup:

  • Make the environment read-only for pdns_server process
  • Provision one pre-signed bind zonefile and configure it
  • I used following settings:
allow-axfr-ips=127.0.0.0/8,::1
disable-axfr=no
disable-tcp=no
master=yes
setgid=nogroup
setuid=nobody
slave=yes
launch=bind
bind-config=/home/cmouse/pdns/etc/named.conf
bind-dnssec-db=/home/cmouse/pdns/etc/ro/bind.db
daemon=no 
loglevel=65535 
query-logging=yes 
log-dns-queries=yes 
  • Start powerdns process
  • Run pdns_control reload && pdns_control notify zonename repeatedly (3-10 times)
  • Wait for about 10 minutes
  • You can no longer get any responses for lookup or notify and with gdb you should see several threads stuck on pthreads_rwlocks_wrlocks() waiting for s_state_lock

@cmouse
Copy link
Contributor

cmouse commented Aug 12, 2014

PowerDNS log when stuck

Aug 12 22:19:14 Reading random entropy from '/dev/urandom'
Aug 12 22:19:14 This is a standalone pdns
Aug 12 22:19:14 Listening on controlsocket in './pdns.controlsocket'
Aug 12 22:19:14 UDP server bound to 0.0.0.0:53
Aug 12 22:19:14 TCP server bound to 0.0.0.0:53
Aug 12 22:19:14 PowerDNS Authoritative Server git-rec-3.6.0-296-g19cf1f6 (cmouse@desk01) (C) 2001-2014 PowerDNS.COM BV
Aug 12 22:19:14 Using 64-bits mode. Built on 20140812200939 by cmouse@desk01, gcc 4.8.2.
Aug 12 22:19:14 PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it according to the terms of the GPL version 2.
Aug 12 22:19:14 Set effective group id to 65534
Aug 12 22:19:14 Set effective user id to 65534
Aug 12 22:19:14 Creating backend connection for TCP
Aug 12 22:19:14 [bindbackend] Parsing 1 domain(s), will report when done
Aug 12 22:19:14 Master/slave communicator launching
Aug 12 22:19:14 [bindbackend] parsing 'z-ix.net' from file '/home/cmouse/pdns/etc/z-ix.net.txt'
Aug 12 22:19:14 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
Aug 12 22:19:14 [bindbackend] Done parsing domains, 0 rejected, 1 new, 0 removed
Aug 12 22:19:14 No new unfresh slave domains, 0 queued for AXFR already
Aug 12 22:19:14 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:14 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:14 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:14 End of answers
Aug 12 22:19:14 No master domains need notifications
Aug 12 22:19:14 About to create 3 backend threads for UDP
Aug 12 22:19:14 Done launching threads, ready to distribute questions
Aug 12 22:19:21 Reload was requested
Aug 12 22:19:21 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:19:21 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Lookup for 'NS' of 'z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 Returning: 'NS' of 'z-ix.net', content: '*.net', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.q-ix.net'
Aug 12 22:19:21 Found no authoritative zone for ns0.q-ix.net
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.sotecware.net'
Aug 12 22:19:21 Found no authoritative zone for ns0.sotecware.net
Aug 12 22:19:21 Lookup for 'ANY' of 'ns0.z-ix.net'
Aug 12 22:19:21 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:21 Returning: 'A' of 'ns0.z-ix.net', content: '*.54', prio: 0
Aug 12 22:19:21 Returning: 'AAAA' of 'ns0.z-ix.net', content: '*:cafe:0:1', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'A 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. g4Oi...', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'AAAA 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. hm1A...', prio: 0
Aug 12 22:19:21 Returning: 'RRSIG' of 'ns0.z-ix.net', content: 'NSEC 10 3 10800 20140911131133 20140812131133 57041 z-ix.net. J/q4....', prio: 0
Aug 12 22:19:21 End of answers
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to *.54:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to *.248:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:b01::1]:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:cafe:0:1]:53
Aug 12 22:19:21 Queued notification of domain 'z-ix.net' to [*:1305::2]:53
Aug 12 22:19:22 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:22 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:22 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:23 Received unsuccessful notification report for 'z-ix.net' from *.54:53, error: Not Implemented
Aug 12 22:19:23 Removed from notification list: 'z-ix.net' to *.54:53Not Implemented
Aug 12 22:19:23 Received unsuccessful notification report for 'z-ix.net' from *.248:53, error: Query Refused
Aug 12 22:19:23 Removed from notification list: 'z-ix.net' to *.248:53Query Refused
Aug 12 22:19:25 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [2*:b01::1]:53: Network is unreachable
Aug 12 22:19:25 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:25 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:28 Reload was requested
Aug 12 22:19:28 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:19:28 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:19:28 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:19:28 Returning: 'SOA' of 'z-ix.net', content: 'ns0.z-ix.net. dnsmaster.z-ix.net. 2014081256 10800 3600 604800 10800', prio: 0
Aug 12 22:19:28 End of answers
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to *.54:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to *.248:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:b01::1]:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:cafe:0:1]:53
Aug 12 22:19:28 Queued notification of domain 'z-ix.net' to [*:1305::2]:53
Aug 12 22:19:29 Error trying to resolve '[*:e101:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:e101:b01::1]:53: Network is unreachable
Aug 12 22:19:29 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:29 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to *:1305::2]:53: Network is unreachable
Aug 12 22:19:30 Received unsuccessful notification report for 'z-ix.net' from *.54:53, error: Not Implemented
Aug 12 22:19:30 Removed from notification list: 'z-ix.net' to *.54:53Not Implemented
Aug 12 22:19:30 Received unsuccessful notification report for 'z-ix.net' from *.248:53, error: Query Refused
Aug 12 22:19:30 Removed from notification list: 'z-ix.net' to *.248:53Query Refused
Aug 12 22:19:30 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:30 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:30 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve ' [*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:32 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:37 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:39 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:b01::1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:b01::1]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:cafe:0:1]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:cafe:0:1]:53: Network is unreachable
Aug 12 22:19:46 Error trying to resolve '[*:1305::2]:53' for notifying 'z-ix.net' to server: Unable to send notify to [*:1305::2]:53: Network is unreachable
Aug 12 22:19:56 Notification for z-ix.net to [*:b01::1]:53 failed after retries
Aug 12 22:19:56 Notification for z-ix.net to [*:cafe:0:1]:53 failed after retries
Aug 12 22:19:56 Notification for z-ix.net to [*:1305::2]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to [*:b01::1]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to [*:cafe:0:1]:53 failed after retries
Aug 12 22:20:03 Notification for z-ix.net to *:1305::2]:53 failed after retries
Aug 12 22:20:14 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:20:14 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:20:14 Zone 'z-ix.net' (/home/cmouse/pdns/etc/z-ix.net.txt) needs reloading
Aug 12 22:20:14 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
Aug 12 22:27:56 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:01 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:06 Remote 127.0.0.1 wants 'z-ix.net|SOA', do = 0, bufsize = 1680: packetcache MISS
Aug 12 22:28:16 Reload was requested
Aug 12 22:28:22 Notification request for domain 'z-ix.net' received from operator
Aug 12 22:28:22 Lookup for 'SOA' of 'z-ix.net'
Aug 12 22:28:22 Found a zone 'z-ix.net' (with id 1) that might contain data 
Aug 12 22:28:22 Zone 'z-ix.net' (/home/cmouse/pdns/etc/z-ix.net.txt) needs reloading
Aug 12 22:28:22 Query: select content from domainmetadata where domain='z-ix.net' and kind='NSEC3PARAM'
 ^CAug 12 22:32:28 PDNS instance is going down, taking children with us
Killed

@cmouse
Copy link
Contributor

cmouse commented Aug 12, 2014

It appears that the culrpit could be this, as this is the first thread to get stuck.

#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
#1  0x0000000000694eba in WriteLock::WriteLock (this=0x7ffff3b8b1d0, lock=0xb84c40 <Bind2Backend::s_state_lock>) at ../../pdns/lock.hh:63
#2  0x0000000000687e5d in Bind2Backend::safePutBBDomainInfo (bbd=...) at bindbackend2.cc:166
#3  0x000000000068fbba in Bind2Backend::queueReloadAndStore (this=0x7fffe4000a50, id=1) at bindbackend2.cc:856
#4  0x000000000069195f in Bind2Backend::lookup (this=0x7fffe4000a50, qtype=..., qname=..., pkt_p=0x0, zoneId=-1) at bindbackend2.cc:1063
#5  0x000000000070d7f8 in DNSBackend::getSOA (this=0x7fffe4000a50, domain=..., sd=..., p=0x0) at dnsbackend.cc:246
#6  0x00000000006891d2 in Bind2Backend::getUpdatedMasters (this=0x7fffe4000a50, changedDomains=0x7ffff3b8bcb0) at bindbackend2.cc:303
#7  0x000000000074c3d0 in UeberBackend::getUpdatedMasters (this=0x7ffff3b8bd90, domains=0x7ffff3b8bcb0) at ueberbackend.cc:298
#8  0x0000000000787c93 in CommunicatorClass::masterUpdateCheck (this=0xb85800 <Communicator>, P=0x7ffff3b8bd80) at mastercommunicator.cc:129
#9  0x000000000078c5aa in CommunicatorClass::mainloop (this=0xb85800 <Communicator>) at communicator.cc:102
#10 0x000000000078cc5d in CommunicatorClass::launchhelper (p=0xb85800 <Communicator>) at communicator.hh:173
#11 0x00007ffff7436182 in start_thread (arg=0x7ffff3b8c700) at pthread_create.c:312
#12 0x00007ffff5ea038d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@cmouse
Copy link
Contributor

cmouse commented Aug 19, 2014

The final reason was a double-locking in getUpdatedMasters, where the same lock is first locked read-only and then deeper in the stack separate attempt is made for write lock.

ahupowerdns added a commit that referenced this issue Aug 19, 2014
should fix #1662, reloading while checking for domains that need to be n...
@Habbie
Copy link
Member

Habbie commented Aug 19, 2014

@afics @cmouse can you test?

@Habbie Habbie reopened this Aug 19, 2014
@cmouse
Copy link
Contributor

cmouse commented Aug 19, 2014

I cannot repeat this issue anymore. So it would appear fixed.

@Habbie Habbie closed this as completed Aug 29, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants