Nagios 4.3.1 crashes when using mod_gearman #110
Comments
@hedenface do you want to have a look?
I have also done a diff between the Nagios headers that you use for nagios4 and the "real" ones for nagios-4.3.1.
D/\N
I'll take a look today. @dan-m-joh Can I see your contact definitions, please?
Of course you can... (email redacted)
This looks like it may be a Core bug. I was able to replicate with pre-built and compiled from source ModGearman modules. Keep this issue open if you want, and I'll post the relevant fix when/if discovered.
Great to hear, not that we have a bug, but that you could replicate it. Now I at least know that it is not just in my environment. D/\N
We've had this issue for months. We're testing moving to Naemon, but sure wish this would work with Nagios 4 Core.
@dan-m-joh Did you by chance happen to compile mod-gearman with the proper Nagios header? I'll get it set up on Wednesday and try and get this thing fixed.
No, sorry, I have had no chance to test with the "new" Nagios headers.
F.Y.I. Compiling mod_gearman with the Nagios-4.3.2 headers (replacing all headers except epn_utils.h in include/ and include/lib/ with the ones from the Nagios sources) seems to fix the issue for me. I will let it run on my test rig for a few days, then I will update my production rig. D/\N
Was this ever fixed? I know it is closed, but there was no comment on the closing. I'm getting the same behavior with the following: CentOS 6.9 It happened with mod_gearman 3.0.6 from the stable repo too, so I moved to the testing repo to see if it was fixed. Everything works fine until I enable active checks, then it dies with SIGSEGV.
@rcgreenw I believe the problem is the headers that were used to compile the binaries in the package you mention. What happens if you compile using the Nagios 4.3.4 headers? I suspect the issue will go away.
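To illustrate why stale headers could plausibly produce a crash like this, here is a minimal sketch using Python's ctypes. The struct, its field names (`x`, `custom_vars`), and the macro counts are all hypothetical and are not taken from the Nagios or mod_gearman sources. It only shows the general mechanism: if a header changes the size of an array inside a struct, every member after that array moves, so a module built against the old header and a core built against the new one disagree about where a pointer lives.

```python
import ctypes


def make_macros_struct(macro_count):
    """Build a hypothetical macro struct whose array length comes from a header constant."""

    class Macros(ctypes.Structure):
        _fields_ = [
            # Array sized by a compile-time constant (assumed to differ between header versions).
            ("x", ctypes.c_char_p * macro_count),
            # Pointer member that follows the array; its offset depends on the array length.
            ("custom_vars", ctypes.c_void_p),
        ]

    return Macros


def pointer_offset(struct_type):
    """Return the byte offset of the pointer member inside the struct."""
    return struct_type.custom_vars.offset


# Made-up counts standing in for "old header" and "new header".
module_view = make_macros_struct(4)
core_view = make_macros_struct(5)

print("module expects custom_vars at offset", pointer_offset(module_view))
print("core expects custom_vars at offset", pointer_offset(core_view))
```

If the two sides disagree like this, the core would read a pointer from memory the module never initialised there, and freeing it could segfault. Whether this is what actually happens in this issue is an assumption; the thread only establishes that rebuilding against matching headers made the crash go away.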
I haven't had a chance to try that yet; the machine really isn't set up for development. I was hoping for updated packages so I wouldn't have to build my own. I'll see if I can get everything needed to build it installed. Thanks.
We have a similar setup to rcgreenw, in terms of RPM package sources. What's the recommended solution here given we want to upgrade easily with RPMs?
I was able to get an RPM built with minor modifications. I pulled from git, then removed the include/nagios4 directory and replaced it with a symlink to /usr/include/nagios (from the nagios-devel rpm). Then, I did an rpmbuild using the spec file in the support directory. There is a copy of the rpm here, but don't count on updates in the future. http://mirror.tausd.org/tausd/RHEL/6/tausd/x86_64/mod_gearman-3.0.5-9.1.el6.x86_64.rpm
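For anyone wanting to script the header swap described above, the following is a rough sketch in Python. The function name `link_system_headers` is made up for illustration, and the two paths are the ones mentioned in the comment (include/nagios4 in the git checkout, /usr/include/nagios from nagios-devel). It only covers the directory replacement, not the rpmbuild step.

```python
import os
import shutil


def link_system_headers(bundled_dir, system_dir):
    """Replace the bundled header directory with a symlink to the installed headers."""
    if not os.path.isdir(system_dir):
        # Nothing to link to, e.g. nagios-devel is not installed.
        raise FileNotFoundError(system_dir)

    if os.path.islink(bundled_dir):
        # A previous run already created a link; drop it so it can be recreated.
        os.unlink(bundled_dir)
    elif os.path.isdir(bundled_dir):
        # Remove the header copy shipped in the checkout.
        shutil.rmtree(bundled_dir)

    # Point the old location at the system headers (absolute target keeps the link valid).
    os.symlink(os.path.abspath(system_dir), bundled_dir)
```

Run from the top of the checkout, a call such as `link_system_headers("include/nagios4", "/usr/include/nagios")` would mirror the manual steps. It deletes the bundled directory, so try it on a scratch clone first.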
How about changing the configure script to detect /usr/include/nagios and only use the shipped nagios4 folder as a fallback? I am also open to pull requests that update the nagios4 folder.
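The detection logic proposed here could look roughly like the sketch below. It is written in Python purely to show the decision, since the real change would live in the configure script. The function name and the use of `nagios.h` as the file that proves the headers are installed are assumptions, not something the project has defined.

```python
import os


def pick_nagios_include_dir(system_dir="/usr/include/nagios",
                            bundled_dir="include/nagios4",
                            marker="nagios.h"):
    """Prefer installed Nagios headers; fall back to the copy shipped with mod_gearman."""
    # The marker file is an assumed way to tell a real header directory from an empty one.
    if os.path.isfile(os.path.join(system_dir, marker)):
        return system_dir
    return bundled_dir


print("building against headers in:", pick_nagios_include_dir())
```

With this order of preference, packages built on a host with nagios-devel installed would pick up headers matching the installed core, while builds on hosts without it would behave as they do today.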
It sounds like mod_gearman no longer supports Nagios Core now that Nagios Core has changed its interface. I see a few options:
I'd prefer 1, because I tend to avoid compiling software, and I encourage sysadmins to use supported binary repositories whenever possible (e.g. consol labs' yum repo). A cursory look at the folders in the repo suggests you already have some structure to support different neb module versions; perhaps this is an extension of these to support the new Nagios interface?
I have upgraded Nagios from 4.2.4 to 4.3.1 (luckily only on my development box) and now it crashes with a SIGSEGV / SIGTERM repeatedly (about once a minute).
To me it looks like a problem that occurs when a broker_module sends data "back" to Nagios.
I base this on the following facts.
Sadly, the only things I can see in the Nagios log are:
Caught SIGSEGV, shutting down...
Caught SIGTERM, shutting down...
In the debug-log I do not see anything strange.
Here are my SW releases:
OS: RHEL 7.3
Nagios 4.3.1 (built from source)
mod_gearman 3.0.1-1 (labs.consol.de)
gearmand 0.33-5 (labs.consol.de)
Running nagios under gdb I see the following when it crashes:
Program received signal SIGSEGV, Segmentation fault.
clear_custom_vars (vars=vars@entry=0x7ffffffed940) at ../common/macros.c:2851
2851 my_free(this_customvariablesmember->variable_name);
Missing separate debuginfos, use: debuginfo-install boost-system-1.53.0-26.el7.x86_64 gearmand-0.33-5.x86_64 glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 sssd-client-1.14.0-43.el7_3.11.x86_64
(gdb) bt
#0 clear_custom_vars (vars=vars@entry=0x7ffffffed940) at ../common/macros.c:2851
#1 0x00005555555916bc in clear_contact_macros_r (mac=mac@entry=0x7ffffffed2e0) at ../common/macros.c:3001
#2 0x00005555555918b7 in clear_volatile_macros_r (mac=mac@entry=0x7ffffffed2e0) at ../common/macros.c:2870
#3 0x00007ffff64aaa9e in handle_svc_check (event_type=, data=0x7fffffffda30) at neb_module_nagios4/../neb_module/mod_gearman.c:851
#4 0x000055555556bb2f in neb_make_callbacks (callback_type=callback_type@entry=6, data=data@entry=0x7fffffffda30) at nebmods.c:529
#5 0x0000555555569f10 in broker_service_check (type=type@entry=704, flags=flags@entry=0, attr=attr@entry=0, svc=svc@entry=0x555555e97310, check_type=check_type@entry=0,
start_time=..., end_time=..., cmd=, latency=0, exectime=exectime@entry=0, timeout=timeout@entry=0, early_timeout=early_timeout@entry=0,
retcode=retcode@entry=0, cmdline=cmdline@entry=0x0, timestamp=timestamp@entry=0x0, cr=cr@entry=0x0) at broker.c:326
#6 0x000055555557172f in run_async_service_check (svc=svc@entry=0x555555e97310, check_options=check_options@entry=0, latency=latency@entry=0.0008800000068731606,
scheduled_check=scheduled_check@entry=1, reschedule_check=reschedule_check@entry=1, time_is_valid=time_is_valid@entry=0x7fffffffe29c,
preferred_time=preferred_time@entry=0x7fffffffe2a8) at checks.c:199
#7 0x0000555555571cb1 in run_scheduled_service_check (svc=svc@entry=0x555555e97310, check_options=0, latency=latency@entry=0.0008800000068731606) at checks.c:90
#8 0x0000555555587adb in handle_timed_event (event=event@entry=0x555555e8fc20) at events.c:1171
#9 0x0000555555588623 in event_execution_loop () at events.c:1110
#10 0x0000555555568a56 in main (argc=, argv=) at nagios.c:814
I hope you see something there to help you find the issue.
If you need more debugging info, I would be glad to help.
Regards,
D/\N