Unable to load debugging information from the nginx compiled binary #2153

Closed
juliohm1978 opened this issue Feb 27, 2018 · 15 comments · Fixed by #2155
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@juliohm1978
Contributor

juliohm1978 commented Feb 27, 2018

NGINX Ingress controller version: 0.13

Kubernetes version (use kubectl version): 1.8.x

What happened: I need to instrument the nginx binary with a third party proprietary monitoring tool (AppMon), which involves loading an external agent that needs access to nginx's debugging information generated at compile time.

What you expected to happen: Given the current build.sh script, I'd expect the nginx binary to contain its debugging symbols. The script does include the -g flag in CC_OPT (line 285).

However, a quick test shows that gdb cannot find any debug symbols in the compiled binary.
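This can also be confirmed without gdb by inspecting the ELF section headers: a `.debug_info` section is only present when the binary was compiled with `-g` and not stripped. A minimal sketch using a throwaway C file (file names are illustrative; substitute /usr/sbin/nginx to check the real binary):

```shell
# Stand-in program; replace /tmp/hello_* with /usr/sbin/nginx for the real test.
cat > /tmp/hello.c <<'EOF'
int main(void) { return 0; }
EOF

cc -g -o /tmp/hello_dbg   /tmp/hello.c   # built with debug info
cc -s -o /tmp/hello_strip /tmp/hello.c   # no -g, symbols stripped at link time

# readelf lists the ELF sections; DWARF data lives in the .debug_* sections.
for bin in /tmp/hello_dbg /tmp/hello_strip; do
  if readelf -S "$bin" | grep -q '\.debug_info'; then
    echo "$bin: debug info present"
  else
    echo "$bin: no debug info"
  fi
done
```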

How to reproduce it (as minimally and precisely as possible):

This can be checked with a current build of the image, e.g. quay.io/kubernetes-ingress-controller/nginx:0.34. Run the following:

$> docker run --rm --name nginx -it quay.io/kubernetes-ingress-controller/nginx:0.34 bash

root@26c49acb2520:/# apt-get update && apt-get install -y gdb

<snipped output>

root@26c49acb2520:/# gdb /usr/sbin/nginx
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/nginx...(no debugging symbols found)...done.
(gdb) 

The message at the bottom is the focus: no debugging symbols found.

My team is willing to maintain a custom build of the nginx-controller image to incorporate the third party agent for monitoring, but we need the compiled binary to include debugging symbols.

Can anyone help?

@aledbf
Member

aledbf commented Feb 27, 2018

@juliohm1978 thank you for the report. The next release will contain the debug symbols.

@aledbf aledbf added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 27, 2018
@juliohm1978
Contributor Author

Thanks @aledbf! That was fast.

Any chance I can get a tip for adding the necessary flags myself in a custom build? I'm already set up to build the image locally.

@aledbf
Member

aledbf commented Feb 27, 2018

@juliohm1978 the missing part is --with-debug in the configure flags

https://www.nginx.com/blog/new-debugging-features-probe-nginx-internals/
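Worth noting: `--with-debug` enables nginx's verbose debug logging (`error_log ... debug`), while the DWARF symbols gdb looks for come from the compiler flags. A hedged sketch of the relevant knobs for a custom build (flag values are illustrative, not the exact build.sh contents):

```shell
# From an unpacked nginx source tree:
#   --with-debug        -> compiles in debug-level logging support
#   -g (in CC_OPT)      -> asks the compiler to emit DWARF debug symbols
#   -Og                 -> optimizes without defeating the debugger; a common
#                          companion to -g, unlike more aggressive levels
./configure --with-debug --with-cc-opt='-g -Og'
make
```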

@juliohm1978
Contributor Author

--with-debug is already included in the master branch I'm building here. The result is shown in the description above.

@aledbf
Member

aledbf commented Feb 27, 2018

All of the debug configuration is already present.
Checking why the debug symbols are missing.

@aledbf
Member

aledbf commented Feb 28, 2018

@juliohm1978 please check #2155. The issue was related to the compiler optimization flags.
Before this change the nginx binary was 2.3MB; now it is 15MB with the debug symbols.

root@88b760d25a68:/# gdb /usr/sbin/nginx 
<snipped gdb banner>
Reading symbols from /usr/sbin/nginx...done.
(gdb) quit
root@88b760d25a68:/# ls -lat /usr/sbin/nginx 
-rwxr-xr-x 1 root root 14923048 Feb 28 12:53 /usr/sbin/nginx

@aledbf
Member

aledbf commented Feb 28, 2018

@juliohm1978 right now the CI is building the new images. You can pull the new image in ~20 minutes.

@juliohm1978
Contributor Author

Just confirmed: the 0.35 image has the debug symbols in its ~15MB binary.

Will reopen the issue if anything else comes up.

Thank you very much for the help!

@juliohm1978
Contributor Author

Quick follow up!

I was able to successfully attach the monitoring agent, but we had to further modify the nginx compiler flags.

For anyone interested, the new image 0.35 includes debug symbols using the flags -g -Og. However, the custom jemalloc implementation added a few months ago with -ljemalloc causes the third party agent to crash with a core dump. We also needed to remove that from the LD_OPT flags.
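For reference, the change amounts to stripping `-ljemalloc` from the linker options before building. A sketch against a stand-in file (the real edit goes in the image's build.sh; the file and variable contents here are illustrative):

```shell
# Stand-in for the relevant linker-options line of build.sh:
cat > /tmp/build_opts.sh <<'EOF'
export LD_OPT="-ljemalloc -Wl,-z,relro"
EOF

# Drop the jemalloc link flag so an LD_PRELOAD-based agent can interpose
# the allocator without colliding with jemalloc:
sed -i 's/-ljemalloc *//' /tmp/build_opts.sh
cat /tmp/build_opts.sh
```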

Unfortunately for us, the Dynatrace agent is very intrusive, and we will need to maintain a custom build of the nginx-controller in-house to make this work.

I'd like to thank you all for the support. Great community!

@aledbf
Member

aledbf commented Feb 28, 2018

@juliohm1978 can you post the crash details?

@juliohm1978
Contributor Author

juliohm1978 commented Feb 28, 2018

The agent is loaded with the following command line:

LD_PRELOAD=/dynatrace/agent/lib64/libdtagent.so nginx -g 'daemon off;'

It will print its debug messages to stdout, along with the runtime of nginx itself.

Nginx finally crashes with the following:

2018-02-28 18:55:15.087972 [7/24366890] severe  [native] Cannot allocate memory in the required region, nginx Agent will be inactive.
2018-02-28 18:55:15.087979 [7/24366890] debug   [native] Allocated memory for module table at location 7fd011b6b500 (7fd011b6b500 - 55d914b42000 >= %ld)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

There's also a resulting 250MB core dump, but we'll have to upload it somewhere if you wish to dive into it.

Removing the -ljemalloc flag from LD_OPT fixes this crash.

UPDATE: This is what gdb's back command gives us from the core dump.

(gdb) back
#0  0x00007fd02262dfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fd02262f3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fd01eb9e0ad in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fd01eb9c066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fd01eb9c0b1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fd01eb9c2c9 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fd01ef3103c in allocate<ngx_module_s*> (count=91, isPIE=true) at src/NginxPreload.cpp:232
#7  0x00007fd01ef30cc0 in NginxPreload::NginxPreload (this=0x7fd021c3d0b0) at src/NginxPreload.cpp:281
#8  0x00007fd01ef30e70 in getnginxpreload (getpid=0x7fd0226b3f40 <getpid>) at src/NginxPreload.cpp:340
#9  0x00007fd0211d0302 in ?? () from /dynatrace/agent/downloads/7.0.6.1013/native/linux-x86-64/libdtnginxagent.so
#10 0x00007fd0243830f6 in getpid () from /dynatrace/agent/lib64/libdtagent.so
#11 0x000055d914b6af94 in main (argc=3, argv=0x7fff97849248) at src/core/nginx.c:230

@rlees85

rlees85 commented Mar 26, 2018

@juliohm1978 I don't suppose you have had any experience with ingress-nginx and Dynatrace SaaS (one-agent)?

If I do nothing (use ingress-nginx 0.12.0), the following happens:

  • Dynatrace SaaS moans saying it cannot enable Deep Monitoring on Nginx due to it being statically linked (is it? doesn't seem to be?)
  • I have no idea what one-agent is doing, but Nginx fails to open ports 80 & 443 and therefore does not work.

I've tried:

  • What you suggested above: removing -ljemalloc and building my own image. This caused Nginx to core-dump.
  • Building my own ingress-nginx with nginx version 1.12.1, which is on the Dynatrace supported list; it came up but failed to work (same as the original issue), both with and without -ljemalloc.

I guess no-one will have a clue; I will raise an issue with Dynatrace. But if you did try with one-agent, I just want to know your experience!

@juliohm1978
Contributor Author

@rlees85

You are right. We don't have any experience with the OneAgent solution yet. But it is on our roadmap for a future licence upgrade.

At this point, I have no idea how the OneAgent performs its deep monitoring, but since it's supposed to be more prepared for a cloud environment, there should be a broader support for containerized apps and their more recent builds.

A Dynatrace support ticket is a good way to start. They do provide good technical support, even if delayed by timezone differences. That's what we did to get it working with the traditional agent.

@rlees85

rlees85 commented Jul 4, 2018

I'm back again (sorry) ... to add to the above, I raised a ticket with Dynatrace about OneAgent and it now works. I think it was the same issue as you had above (jemalloc). Dynatrace fixed it though...

It looks like, however the decision is to stay with AppMon.... and I just tried the Dynatrace AppMon Agent 7.1 and it barfs.

I've raised a ticket with Dynatrace but will more than likely need to bake our own ingress-nginx image, same as you.

I was wondering, do you run nginx directly or do you use /nginx-ingress-controller ? As during its duties, /nginx-ingress-controller can stop, start, reload, etc nginx quite a few times. Does AppMon mind this?

Also, when using LD_PRELOAD=/dynatrace/agent/lib64/libdtagent.so, the bootstrap agent just hangs after it downloads the payload from the collector. Did you come across this problem? Using LD_PRELOAD=/dynatrace/agent/lib64/libdtnginxagent.so directly works (until the jemalloc crash) when running nginx directly, but that cannot be done when running /nginx-ingress-controller.

Seems like a bit of a wasps' nest; any help, if you have experienced any of these problems, would be much appreciated.

@juliohm1978
Contributor Author

Hi @rlees85,

We haven't had any issues with the agent hanging at bootstrap.

We still use /nginx-ingress-controller to start nginx, as we tried to keep our image as close to the original entrypoint as possible.
