null pointer in services/outside_network.c:160 reuse_cmp / rbtree_find_less_equal in Unbound 1.13.0 release #411
Comments
Could you provide the configuration in use? |
Thanks! At least this excludes a path I was looking at: forwarders and tls configuration. |
I'm afraid a reproducer is not going to be likely given the nature of the input data to these instances; sorry. Still, if something comes up, I'll let you know. |
Hi @jcjones, I wasn't able to reproduce this. We have an attempted fix that is included in the current release candidate for Unbound 1.13.1. There is a branch for it; if you would like to try it out, that would be great. |
No, we're not using |
Just an update that we've had 24 hours of 1.13.1rc1 so far with only a stuck process (100% CPU, unfortunately didn't get a core on the restart), no segfaults yet. |
We've seen a segfault on 1.13.1rc1; I have a core dump I'll analyze tonight. |
Log output would also be nice if available. We added a couple extra print error cases ("internal error: ...") when trying to fix this. |
Will do, I saw many of those.
|
* nlnet/master:
- Fix for Python 3.9, no longer use deprecated functions of PyEval_CallObject (now PyObject_Call), PyEval_InitThreads (now none), PyParser_SimpleParseFile (now Py_CompileString). Changelog note for 1.13.1 release and main branch is 1.13.2 in development.
- release 1.13.1rc2 tag on branch-1.13.1 with added changes of 2 feb.
- Fix indentation of root anchor for use by windows install script. Fixup to add to LIBS. And autoconf.
- Fix windows dependency on libssp.dll because of default stack protector in mingw.
- Fix dynlibmod link on rhel8 for -ldl inclusion.
- branch-1.13.1 is created, with release-1.13.1rc1 tag.
- Hide our time traveling abilities.
- Attempt to fix NULL keys in the reuse_tcp tree; relates to NLnetLabs#411.
Stack trace:
I swear I saw internal errors somewhere recently, but perhaps it wasn't unbound. None of my unbound instances have logged "internal error" in accessible logs, nor the specific messages from a8485d5. Here's the log around the segfault, though it's pretty bare:
|
I'm still seeing these sporadically with the 1.13.1 release, and they look the same. E.g.:
No instances of "internal error" in the logs at all. No output from Most interesting to me is that I have two of these same segfaults occurring within 10 minutes of each other, whereas before they always seemed to need substantial time to reproduce:
|
This is still under investigation. As reported on #439, there is now extra logging that could help pinpoint the issue. |
I see one in the last three days:
That address is for |
Quick question: how many outgoing interfaces are available for that instance? |
Only one, with two IPv6 addresses (one route-able, one internal) and one IPv4 address. |
Got another:
I'm guessing more than this won't be useful unless I see something other than |
Are there any updates on this problem? We have been experiencing a similar problem since 1.13.0 (and still see it on 1.13.1):
|
@Mityai: No update on this, still looking though. |
* nlnet/master: (61 commits)
- Fix that testcode dohclient has OpenSSL initialisation calls.
- Further fix for NLnetLabs#468: detect SSL_CTX_set_alpn_protos for build with OpenSSL 1.0.1.
- Fix NLnetLabs#468: OpenSSL 1.0.1 can no longer build Unbound. Changelog note for NLnetLabs#466
- Merge NLnetLabs#466 from FGasper: Support OpenSSLs that lack SSL_get0_alpn_selected.
- Remove unused functions worker_handle_reply and libworker_handle_reply.
- Fix documentation comment for files previously residing in checkconf/.
- Fix that nxdomain synthesis does not happen above the stub or forward definition.
- Fix (increase) verbosity level for iterator error log in processQueryTargets().
- Fix permission denied sendto log, squelch the log messages unless high verbosity is set.
- rebuild configure to set EXTRALINK to libunbound.la for NLnetLabs#460.
- Fix for NLnetLabs#411: Depth protect for crash on deleted element timeout.
- Fix to stop IPv6 PMTU discovery. Changelog note for NLnetLabs#460.
- Merge NLnetLabs#460 from orbea: build: Link with the libtool archive.
- Clean makedist.sh.
- Fix stack-protector change to not override other CFLAGS options.
- Disable the use of stack-protector for cross compiled 32-bit windows builds; relates to NLnetLabs#444.
- Fix NLnetLabs#429: Also fix end of transfer for http download of auth zones.
- Fix that cachedb does not produce empty object files when disabled.
...
Hello, just some info: pfSense and OPNsense both have open bugs and forum threads on this. If you need people to help reproduce and debug, that might be a good place to find them. In pfSense at least, some people can get it to happen quite frequently. Links:
|
Thanks @internationils! It's always good to have more information. |
Hi @jcjones, @Mityai, @internationils, |
I can't test it, but the pfSense people have grabbed it already... |
Hi @jcjones, @Mityai, @internationils: I don't see any movement on the forum threads you posted above. Do you maybe have other information on the matter? |
* nlnet/master:
- zonemd-check: yesno option, default no, enables the processing of ZONEMD records for that zone.
- Merge NLnetLabs#496 from banburybill: Use build system endianness if available, otherwise try to work it out.
- For NLnetLabs#492: Fix font highlighting for the man page on emacs.
- Fix NLnetLabs#492: module-config respip missing in unbound.conf.5.in man page. Merges NLnetLabs#494 from he32. Remove comment line (?) from man page. Transplant parts of the contributed RPZ documentation.
- Move the NSEC3 max iterations count in line with the 150 value used by BIND, Knot and PowerDNS. This sets the default value for it in the configuration to 150 for all key sizes.
- Test code has -q option for quiet output.
- Fix for NLnetLabs#411, NLnetLabs#439, NLnetLabs#469: Reset the DNS message ID when moving queries between TCP streams.
- Refactor for uniform way to produce random DNS message IDs. Fix date in changelog.
- Fix NLnetLabs#489: Compile using MSYS2 MinGW 64-bit.
- Fix that auth-zone zonefiles use last TTL if no TTL is specified. Changelog note for NLnetLabs#487
- Merge PR NLnetLabs#487: ifdef RLIMIT_AS in recently added check.
Auto-closed by commit message, reopening. |
Hi @jcjones, @Mityai, @internationils, |
I would love to add this patch to the pfSense development branches so more people can test it, but it does not apply cleanly on 1.13.1. I'm going to work on fixing the conflicts and see if it works. |
Hi @rbgarga, by the way, you would also need the following commits (in order, before the PR diff) that solve parts of the issue before the PR was created: |
I already have these 3 commits applied and removed the test changes for #513, ending up with a patch that only touches services/outside_network.[ch]. 2 hunks fail to apply, creating this reject file: https://idaho.arrakis.com.br/files/outside_network.c.rej |
I sorted out the first hunk, but the second seems to depend on some other change:
|
You need the first part for sure; the else including I believe there are no Hope this is clear. |
Awesome! Thanks! |
* nlnet/master:
- Changelog entry for NLnetLabs#513: Stream reuse, attempt to fix NLnetLabs#411, NLnetLabs#439, NLnetLabs#469.
- Fix readzone unknown type print for memory resize.
- Fix unittcpreuse.c: properly initialise outnet.
- Remove redundant log_assert and fix error messages.
- stream reuse, do not explicitly wait for a free pending_tcp if a reuse could be used. Changelog note for NLnetLabs#512
- Merge NLnetLabs#512: unbound.service.in: upgrade hardening to latest standards.
- Add unittest for tcp_reuse functions.
- stream reuse, move log_assert to the correct location.
- stream reuse, clean links on structs that are unlinked from a list.
- Fix for NLnetLabs#411, NLnetLabs#439, NLnetLabs#469: stream reuse, fix loop in the free pending_tcp list.
- Fix for NLnetLabs#411, NLnetLabs#439, NLnetLabs#469: stream reuse, fix outnet deletion for all non-free pending_tcp.
- Fix for NLnetLabs#411, NLnetLabs#439, NLnetLabs#469: stream reuse, fix LRU list when reuse is already in the tree.
- Fix for NLnetLabs#411, NLnetLabs#439, NLnetLabs#469: stream reuse, fix linking when touching the tcp_reuse LRU list.
- More log_assert for stream reuse operations.
- Fix that ldns_zone_new_frm_fp_l counts the line number for an empty line after a comment.
@rbgarga, 1.13.2 is now released and includes the aforementioned patches. I believe it solves the occasional crash while reloading that I've been reading about on the pfSense forum. @jcjones, @Mityai, we believe 1.13.2 solves this issue. I will leave the issue open; feel free to close or update it based on your experience. |
Closing as inactive; the observed issues seem resolved. |
Using my amd64 linux on centos7 build from #393 (comment) with these commits added to 1.13.0:
I am still getting rare crashes. I've caught one here, in reuse_cmp having a null pointer for key2, coming from node->key: unbound/util/rbtree.c, lines 525 to 528 in ca49781.
The backtrace is:
At the null pointer, *node is normal except for the null:
The core file is 6.36 GB; I can certainly share it and the centos7 rpm files out-of-band if you'd like to investigate directly, or I am happy to dig around in the core file in response to your questions. Thanks again!
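
For readers following along, the failure mode described above (a comparison callback receiving a NULL key2 that comes from node->key during a less-or-equal tree search) can be sketched in a few lines of C. This is a simplified illustration, not Unbound's code: struct node, safe_cmp and find_less_equal are hypothetical names, a plain binary search tree stands in for the red-black tree, and the NULL check shows one possible defensive measure rather than the fix that was eventually merged.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified tree node; NOT Unbound's rbnode_type. */
struct node {
    const int *key;            /* NULL here models the corrupted entry */
    struct node *left, *right;
};

/* Compare two keys; report a NULL key through *ok instead of
 * dereferencing it (a plain comparison would segfault here). */
static int safe_cmp(const int *key1, const int *key2, int *ok)
{
    if (key1 == NULL || key2 == NULL) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return (*key1 > *key2) - (*key1 < *key2);
}

/* Find the node with the largest key <= *key.
 * Returns 1 on an exact match, 0 if only a smaller (or no) node was
 * found, and -1 if a node with a NULL key was hit during the search. */
static int find_less_equal(struct node *root, const int *key,
                           struct node **result)
{
    struct node *n = root;
    *result = NULL;
    while (n != NULL) {
        int ok;
        int r = safe_cmp(key, n->key, &ok);   /* key2 is n->key */
        if (!ok) {
            fprintf(stderr, "sketch: NULL key found in tree\n");
            return -1;
        }
        if (r == 0) {
            *result = n;
            return 1;
        }
        if (r < 0) {
            n = n->left;
        } else {
            *result = n;        /* best candidate so far */
            n = n->right;
        }
    }
    return 0;
}
```

A check like this only turns the crash into a detectable error; it does not explain how a node with a NULL key ended up in the tree, which is the question the core file would need to answer.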