stubby started crashing randomly #295

Closed
froschmett opened this issue Aug 5, 2021 · 13 comments

@froschmett

froschmett commented Aug 5, 2021

Hi,
I am running stubby on Ubuntu:
dpkg -l | grep stubby
stubby 1.4.0-1 amd64 modern asynchronous DNS API (stub resolver)

lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

uname -a
4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Since today stubby has started to crash randomly and I can't imagine why. It ran without any trouble for the last 8 months or so.
Because it crashes quite a lot, I added some lines to the systemd config so that it restarts on failure.

The failure it throws when it crashes is:

stubby[2557]: stubby: ./gldns/gbuffer.h:461: gldns_buffer_write_at: Assertion `gldns_buffer_available_at(buffer, at, count)' failed.
Aug 5 01:55:46 u41 systemd[1]: stubby.service: Main process exited, code=dumped, status=6/ABRT
Aug 5 01:55:46 u41 systemd[1]: stubby.service: Failed with result 'core-dump'.

Please let me know if you need further information.
I would be really happy if someone can shed some light on this, help or fix this :)

Many thanks and Best Regards,
froschmett

@saradickinson
Contributor

@froschmett Thanks for the report - does seem strange!
Are you able to provide a core-dump file and your stubby.yml config to help with debugging? (Can be done offline)

@wtoorop does this look familiar at all to you?

@froschmett
Author

froschmett commented Aug 5, 2021

@saradickinson
Hi,

I had some trouble getting the core dump to work, but I hope it did. Here are the requested files.
I had to rename them to upload them:
stubby.txt -> stubby.yml
core....txt -> core....lz4

Thanks and Best Regards,
froschmett
stubby.txt
core.stubby.64707.8906aa4b60b3496382f789b7408c834a.938.1628171207000000.txt

@froschmett
Author

@saradickinson
Hi,

I installed my setup from scratch on a completely fresh install of Ubuntu Server 20.04.2 LTS.

To my surprise the same error still persists. I really have no clue what happened or what is going on.
Just wanted to let you know.

Best,
froschmett

@pitpompej

Hi,
I have the exact same error on my stubby instance running on a Raspberry Pi, also for about 4-5 days now.
stubby 1.5.1-1 armhf
Best regards,
pit

@froschmett
Author

Hi,

It seems that this issue is Debian/Ubuntu related.

I installed my setup from scratch on the latest Fedora Server and it seems to run stably. Fedora ships 0.3.0 in its repository, which has been stable so far. I also compiled 0.4.0 and it seems to be stable too. I will keep monitoring this and report back.

Best,
froschmett

@sn1987

sn1987 commented Aug 13, 2021

Hi,

I have had the same issue for 3-4 days. I'm running stubby 0.4.0 on CentOS 8.4.

Version: CentOS Linux release 8.4.2105

Kernel: 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Stack trace:

Process 139661 (stubby) of user 64707 dumped core.

Stack trace of thread 139661:
#0  0x00007fdffffd337f raise (libc.so.6)
#1  0x00007fdffffbddb5 abort (libc.so.6)
#2  0x00007fdffffbdc89 __assert_fail_base.cold.0 (libc.so.6)
#3  0x00007fdffffcba76 __assert_fail (libc.so.6)
#4  0x0000000000421c05 gldns_buffer_write_at (stubby)
#5  0x0000000000421cc7 gldns_buffer_write (stubby)
#6  0x000000000042569f _getdns_verify_rrsig (stubby)
#7  0x0000000000426229 dnskey_signed_rrset (stubby)
#8  0x0000000000426435 a_key_signed_rrset (stubby)
#9  0x00000000004287a0 chain_head_validate_with_ta (stubby)
#10 0x0000000000428a94 chain_head_validate (stubby)
#11 0x0000000000428cb7 chain_set_netreq_dnssec_status (stubby)
#12 0x0000000000429a9f check_chain_complete (stubby)
#13 0x000000000042465a val_chain_node_cb (stubby)
#14 0x000000000042ba05 _getdns_check_dns_req_complete (stubby)
#15 0x000000000043dcfe upstream_read_cb (stubby)
#16 0x000000000044573c poll_read_cb (stubby)
#17 0x0000000000445ecf poll_eventloop_run_once (stubby)
#18 0x00000000004461e9 poll_eventloop_run (stubby)
#19 0x00000000004172fa getdns_context_run (stubby)
#20 0x0000000000405bc6 main (stubby)
#21 0x00007fdffffbf493 __libc_start_main (libc.so.6)
#22 0x000000000040519e _start (stubby)

Best regards,

sn1987

@saradickinson
Contributor

Thanks everyone for gathering all the info - this looks to me like a problem with DNSSEC validation in getdns. Since it started happening out of the blue, I'm wondering if a lookup suddenly started returning an RRSIG that getdns chokes on.

One option is to try disabling local DNSSEC validation (which is OK if you are using a validating resolver) and see if the crashes stop.

If any of you are able to figure out what lookup triggers this, that would be very helpful. There is no logging in stubby that can help with that, but if you are able to grab a tcpdump/wireshark capture on your local interface, it will show the names being looked up so we can try to reproduce this.

@wtoorop Any thoughts?

@pitpompej

pitpompej commented Dec 14, 2021

Hi, it has been a while, but I might have found the lookup that triggers the crash:
dig -t NAPTR @127.0.0.1 -p 10053 rbm.mavenir.com
leads to a reproducible crash on my stubby, with the following log:

Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[10:28:44.510246] STUBBY: Read config from file /etc/stubby/stubby.yml
[10:28:44.690510] STUBBY: DNSSEC Validation is ON
[10:28:44.694196] STUBBY: Transport list is:
[10:28:44.697517] STUBBY:   - TLS
[10:28:44.698833] STUBBY: Privacy Usage Profile is Strict (Authentication required)
[10:28:44.699143] STUBBY: (NOTE a Strict Profile only applies when TLS is the ONLY transport!!)
[10:28:44.699427] STUBBY: Starting DAEMON....
stubby: ./gldns/gbuffer.h:461: gldns_buffer_write_at: Assertion `gldns_buffer_available_at(buffer, at, count)' failed.

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0xb6de7230 in __GI_abort () at abort.c:79
#2  0xb6df4ba8 in __assert_fail_base (fmt=0xb6efb6a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0xb6f74870 "gldns_buffer_available_at(buffer, at, count)",
    assertion@entry=0xb6fee040 "", file=0xb6f7458c "./gldns/gbuffer.h", file@entry=0xb6f75034 "gldns_buffer_write_at", line=461, line@entry=3069163176,
    function=function@entry=0xb6f75034 "gldns_buffer_write_at") at assert.c:92
#3  0xb6df4c5c in __GI___assert_fail (assertion=0xb6fee040 "", file=0xb6f75034 "gldns_buffer_write_at", line=3069163176, function=0xb6f75034 "gldns_buffer_write_at")
    at assert.c:101
Backtrace stopped: Cannot access memory at address 0x7370696a

Calling
dig -t A @127.0.0.1 -p 10053 rbm.mavenir.com
results in a valid answer, but the NAPTR request results in a crash.
Maybe that helps.
Regards

@wtoorop
Contributor

wtoorop commented Dec 15, 2021

Thank you @pitpompej !! I will take a look shortly. Can you ping me if I haven't replied in 7 days? Thanks!

@pitpompej

Hi,
as you requested, @wtoorop, here is a ping on this topic. A little late, I know, shame on me ;-)
Season's greetings

@wtoorop
Contributor

wtoorop commented Jan 10, 2022

FYI I can reproduce!! Thank you @pitpompej for providing a reliable way to invoke this bug.
The calculation of the space needed for the validation buffer (lines 1492 to 1522 of dnssec.c) doesn't match the space actually needed for RRs that may have compressed dnames (i.e. lines 1545 to 1572 of dnssec.c).
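
To illustrate the kind of mismatch described here, a minimal standalone sketch follows. It is not the getdns code: the helper names `name_len_on_wire` and `name_len_expanded` are hypothetical, and the only thing assumed is the standard DNS name compression scheme (a two-byte pointer whose top two bits are set). It shows that a domain name can occupy fewer bytes in the message than it does once pointers are followed, so a buffer sized from the first number can be too small for the second.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes a name occupies in the message itself.
 * A compression pointer counts as 2 bytes and terminates the name.
 * Returns 0 if the name runs past the end of the message. */
size_t name_len_on_wire(const uint8_t *msg, size_t msg_len, size_t off)
{
    size_t start = off;

    while (off < msg_len) {
        uint8_t b = msg[off];
        if (b == 0)
            return off + 1 - start;          /* root label ends the name */
        if ((b & 0xC0) == 0xC0)
            return off + 2 - start;          /* pointer ends the name */
        off += (size_t)b + 1;                /* skip an ordinary label */
    }
    return 0;
}

/* Hypothetical helper: bytes the same name needs once every pointer
 * has been followed. Returns 0 on malformed input or too many hops. */
size_t name_len_expanded(const uint8_t *msg, size_t msg_len, size_t off)
{
    size_t len = 0;
    int hops = 0;

    while (off < msg_len) {
        uint8_t b = msg[off];
        if (b == 0)
            return len + 1;                  /* count the root label */
        if ((b & 0xC0) == 0xC0) {
            if (off + 1 >= msg_len || ++hops > 16)
                return 0;                    /* truncated or looping */
            off = ((size_t)(b & 0x3F) << 8) | msg[off + 1];
            continue;                        /* keep reading at the target */
        }
        len += (size_t)b + 1;
        off += (size_t)b + 1;
    }
    return 0;
}
```

For a message that holds `example.com` at offset 0 and `www` followed by a pointer to offset 0 at offset 13, the first helper reports 6 bytes while the second reports 17. A buffer sized from the smaller figure would be 11 bytes short when the expanded name is written.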

wtoorop added a commit to getdnsapi/getdns that referenced this issue Jan 10, 2022
rdata not correctly written for validation for certain RR types
@wtoorop
Contributor

wtoorop commented Jan 12, 2022

Let me quickly drop some notes here for the release which will follow shortly (in 2 or 3 weeks).
Commit getdnsapi/getdns@45683d3 fixes the issue, but asserts should not have exited Stubby, and certainly not the getdns library, in the first place. Both need to be compiled with NDEBUG defined when building for production (i.e. not debugging). With that, the issue above would have resulted in a failure to DNSSEC validate certain RR types (like NAPTR) instead of exiting the program.
Still TODO:

  • Determine which RR types were affected by this issue for communication purposes
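
As background on the NDEBUG point: in standard C, `assert` from `<assert.h>` expands to nothing when NDEBUG is defined before the header is included, so a production build cannot abort through it. The sketch below is a generic illustration, not gldns code; the names `sketch_buf` and `sketch_buf_write` are hypothetical. It shows a bounded write that reports a failure to its caller, which is the kind of outcome (a failed validation rather than an abort) described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer; invariant: pos <= cap. */
struct sketch_buf {
    unsigned char *data;
    size_t cap;   /* total capacity in bytes */
    size_t pos;   /* bytes written so far */
};

/* Appends n bytes from src. Returns 0 on success, or -1 without
 * touching the buffer when the data does not fit. */
int sketch_buf_write(struct sketch_buf *b, const void *src, size_t n)
{
    if (n > b->cap - b->pos)
        return -1;                 /* caller decides how to fail */
    memcpy(b->data + b->pos, src, n);
    b->pos += n;
    return 0;
}
```

A caller would check the return value and treat -1 as "could not validate this RRset", leaving the process running.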

@Foritus

Foritus commented Mar 1, 2022

Hello :) Has this made it into a release yet? I'm using the version that ships with Raspberry Pi OS (0.2.5, which is admittedly already pretty ancient) and see this crash once or twice a day. At the moment I've mitigated it with a high-quality systemd unit file hack:

In /etc/systemd/system/multi-user.target.wants/stubby.service under [Service] add these lines:

Restart=on-failure
RestartSec=5s

wtoorop added a commit to getdnsapi/getdns that referenced this issue Aug 11, 2022
alexeys85 pushed a commit to alexeys85/packages that referenced this issue Jan 11, 2023
Changelog from upstream (https://github.com/getdnsapi/getdns/releases/tag/v1.7.3):

* 2022-12-22: Version 1.7.3
  * PR getdnsapi/getdns#532: Increase CMake required version 3.5 -> 3.20, because we
    need cmake_path for Absolute paths in pkg-config (See Issue getdnsapi/getdns#517)
    Thanks Gabriel Ganne
  * Updated to Stubby 0.4.3 quickfix release

* 2022-08-19: Version 1.7.2
  * Stubby updated to 0.4.2 quickfix release

* 2022-08-19: Version 1.7.1
  * Always send the `dot` ALPN when using DoT
  * Strengthen version determination for Libidn2 during cmake processing
    (thanks jpbion).
  * Fix for issue in UDP stream selection in case of timeouts.
    Thanks Shikha Sharma
  * Fix using asterisk in ipstr for any address. Thanks uzlonewolf.
  * Issue getdnsapi/stubby#295: rdata not correctly written for validation for
    certain RR type. Also, set default built type to RelWithDebInfo and
    expose CFLAGS via GETDNS_BUILD_CFLAGS define and via
    getdns_context_get_api_information()
  * Issue getdnsapi/getdns#524: Bug fixes from submodules' upstream?
    Thanks Johnnyslee
  * Issue getdnsapi/getdns#517: Allow Absolute path CMAKE_INSTALL_{INCLUDE,LIB}DIR in
    pkg-config files. Thanks Alex Shpilkin
  * Issue getdnsapi/getdns#512: Update README.md to show correct PGP key location.
    Thanks Katze Prior.

Signed-off-by: Aquila Cooper <aquila@cpr.is>
1582130940 pushed a commit to 1582130940/OpenWrt-Lean-Packages that referenced this issue Jan 12, 2023
SibrenVasse pushed a commit to SibrenVasse/packages that referenced this issue Feb 26, 2023