-
-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unbound core dump on SIGSEGV #1099
Comments
What seems to be wrong with the time_t variable access is that it is not locked. The structure is locked just before when it is accessed. But this new access in 1.20.0, is not guarded by locks. I guess that the RRset can be updated by another thread in the mean time and then the pointer of the data becomes invalid while the thread is using it. The fix commit adds locking for the access. The fix commit makes it through the unit test set, but I have not been able to replicate the same situation as that is reported. Hopefully the fix actually fixes the issue that is reported. It is certainly an improvement to lock this item before accessing it, much like it is locked some lines earlier. |
BIG thank you, Wouter! |
is updated and fetched after it is stored, and also check for a changed RRset.
Another commit to fix it, and this one instead of only locking it, also tests for deleted RRset. That could crash, because the pointer dereference would fail. It also adds checks on the id value, also for the earlier lock code, so that it does not pick up a modified RRset, if it was very quickly deleted and reused for another piece of data. It also passes the unit tests, and I think it is an improvement. |
Great improvement! Many Thanks and waiting for new release! :) |
Hello, colleagues!
There is some strange issue has been obsorved under loaded Unbound's work: the program was crashed suddenly.
Unbound version: 1.20.0
System version: 4.18.0-240.el8.x86_64 (Centos 8)
Compiled with gcc: gcc version 8.5.0 20210514 (Red Hat 8.5.0-4) (GCC)
Compiled options and modules:
Version 1.20.0
Configure line: --disable-dnscrypt --enable-dnstap --enable-ecdsa --disable-event-api --enable-gost --with-libevent --disable-subnet --disable-tfo-client --disable-tfo-server --with-pthreads --prefix=/usr/local --localstatedir=/var --mandir=/usr/local/man --infodir=/usr/local/share/info/
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1g FIPS 21 Apr 2020
Linked modules: dns64 respip validator iterator
Below there are some details:
Core was generated by `/usr/local/sbin/unbound -c /srvs/unbound/etc/unbound/unbound.conf'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000004a5cb0 in store_rrsets (env=, env=, qrep=, qstarttime=1719417033, region=0x7f7a7a0c6180, pside=, leeway=, now=1719417033, rep=0x7f7ad81fb3b0)
at services/cache/dns.c:119
119 if(ttl < min_ttl) min_ttl = ttl;
[Current thread is 1 (Thread 0x7f7f9f7fe700 (LWP 518126))]
(gdb) list
114 rep->rrsets[i] = rep->ref[i].key;
115 }
116 /* if ref was updated make sure the message ttl is updated to
117 * the minimum of the current rrsets. /
118 ttl = ((struct packed_rrset_data)rep->rrsets[i]->entry.data)->ttl;
119 if(ttl < min_ttl) min_ttl = ttl;
120 }
121 if(min_ttl < rep->ttl) {
122 rep->ttl = min_ttl;
123 rep->prefetch_ttl = PREFETCH_TTL_CALC(rep->ttl);
This is simple if(ttl < min_ttl) min_ttl = ttl;
However min_ttl was not changed although min_ttl > ttl:
(gdb) print ttl
$1 = 1719417633
(gdb) print min_ttl
$2 = 1719503433
(gdb) bt
0 0x00000000004a5cb0 in store_rrsets (env=, env=, qrep=, qstarttime=1719417033, region=0x7f7a7a0c6180, pside=, leeway=, now=1719417033, rep=0x7f7ad81fb3b0)
at services/cache/dns.c:119
1 dns_cache_store_msg (qstarttime=1719417033, region=0x7f7a7a0c6180, flags=272, qrep=0x7f7a7a0c8ed0, pside=, leeway=, rep=0x7f7ad81fb3b0, hash=1152309568, qinfo=0x7f7f9f7fd440, env=0x23efc20)
at services/cache/dns.c:164
2 dns_cache_store (env=0x23efc20, msgqinf=, msgrep=0x7f7a7a0c8ed0, is_referral=, leeway=, pside=, region=0x7f7a7a0c6180, flags=272, qstarttime=1719417033)
at services/cache/dns.c:1092
3 0x000000000048a7f7 in iter_dns_store (env=, msgqinf=, msgrep=, is_referral=, leeway=, pside=, region=0x7f7a7a0c6180, flags=272,
qstarttime=1719417033) at iterator/iter_utils.c:697
4 0x0000000000494d4b in processQueryResponse (ie=, ie=, id=2, iq=0x7f7a7a0c6468, qstate=0x0) at iterator/iterator.c:3222
5 iter_handle (qstate=qstate@entry=0x7f7a7a0c6210, iq=iq@entry=0x7f7a7a0c6468, ie=ie@entry=0x23af880, id=id@entry=2) at iterator/iterator.c:4145
6 0x00000000004968f3 in process_response (qstate=, iq=, ie=, id=, outbound=, event=) at iterator/iterator.c:4377
7 0x0000000000483386 in mesh_run (mesh=0x7f7b912fd840, mstate=0x7f7a7a0c61c0, ev=, e=) at services/mesh.c:1932
8 0x00000000004ae613 in mesh_report_reply (what=0, reply=0x7f7f9f7fdbb0, e=0x7f7a7a0c8e80, mesh=) at services/mesh.c:856
9 worker_handle_service_reply (c=0x7f7b90f63790, arg=0x7f7a7a0c8e80, error=0, reply_info=0x7f7f9f7fdbb0) at daemon/worker.c:268
10 0x000000000042647f in serviced_callbacks (sq=, error=0, c=0x7f7b90f63790, rep=) at services/outside_network.c:3051
11 0x0000000000426bc0 in serviced_udp_callback (c=0x7f7b90f63790, arg=0x7f7b24f7b410, error=, rep=0x7f7f9f7fdbb0) at services/outside_network.c:3392
12 0x0000000000430fe7 in outnet_udp_cb (c=c@entry=0x7f7b90f63790, arg=0x7f7be5d06bc0, error=error@entry=0, reply_info=reply_info@entry=0x7f7f9f7fdbb0) at services/outside_network.c:1537
13 0x00000000004316a7 in comm_point_udp_callback (fd=2967, event=, arg=) at util/netevent.c:1145
14 0x00007f7fb0131d3a in event_persist_closure (ev=, base=0x7f7f68000c20) at event.c:1623
15 event_process_active_single_queue (base=base@entry=0x7f7f68000c20, activeq=0x7f7f68001070, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1682
16 0x00007f7fb01323e7 in event_process_active (base=0x7f7f68000c20) at event.c:1783
17 event_base_loop (base=0x7f7f68000c20, flags=) at event.c:2006
18 0x00000000004348cc in ub_event_base_dispatch (base=) at util/ub_event.c:280
19 comm_base_dispatch (b=) at util/netevent.c:282
20 0x00000000004b7b01 in worker_work (worker=0x23ee700) at daemon/worker.c:2325
21 thread_start (arg=0x23ee700) at daemon/daemon.c:636
22 0x00007f7fafef814a in start_thread () from /lib64/libpthread.so.0
23 0x00007f7faf538dc3 in clone () from /lib64/libc.so.6
So, i have investigted any variables and structures in the core file. And i have not found any strange values. By the way, the lines (dns.c):
/* if ref was updated make sure the message ttl is updated to
* the minimum of the current rrsets. /
ttl = ((struct packed_rrset_data)rep->rrsets[i]->entry.data)->ttl;
if(ttl < min_ttl) min_ttl = ttl;
}
if(min_ttl < rep->ttl) {
rep->ttl = min_ttl;
rep->prefetch_ttl = PREFETCH_TTL_CALC(rep->ttl);
rep->serve_expired_ttl = rep->ttl + SERVE_EXPIRED_TTL;
}
do present in Unbound 1.20.0 only. In early program versions there are no such lines. But i have seen the switch() block with three case-sections and i have intersted in case 1:
case 1: /* ref updated, item inserted /
rep->rrsets[i] = rep->ref[i].key;
Is there need for lock_rw_rdlock() call?
Just like its done in case 2:
case 2: / ref updated, cache is superior /
if(region) {
struct ub_packed_rrset_key ck;
lock_rw_rdlock(&rep->ref[i].key->entry.lock);
[...]
Please excuse me for maybe stupid question. But i have no clue why the progam was crashed on simple operation over time_t variables (that are not even pointers).
Thank you a lot in advance!
The text was updated successfully, but these errors were encountered: