nginx worker hangs while logging to the audit log #2373
Hi @wutchzone, that sounds like a bug; further investigation is needed. Do you happen to have the audit log in Serial or Parallel mode? |
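For readers unfamiliar with the two modes: a minimal configuration sketch (paths illustrative) of how the audit log type is selected in ModSecurity. Serial mode appends every transaction to a single shared file, which involves a lock shared by all workers; Parallel mode writes one file per transaction into a storage directory and avoids that single contended file:

```
# Serial: all workers append to one file (a shared lock is involved)
SecAuditEngine RelevantOnly
SecAuditLogType Serial
SecAuditLog /var/log/modsec_audit.log

# Parallel: one file per transaction under a storage directory
# SecAuditLogType Parallel
# SecAuditLogStorageDir /var/log/modsec_audit/
```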
We have set it as … I will try to create a reproducible example on a local machine. For now I am able to reproduce this behavior only in production, where we have multiple CPU cores (40+). I will also experiment with other types of locks. |
Hi @wutchzone , I'm not certain whether this would help your particular situation, but you might want to consider upgrading your libmodsecurity version to v3.0.4. There were some fixes post-v3.0.3 that might be relevant. I'm thinking in particular of: |
@wutchzone did you have the chance to test v3.0.4? |
Here's the solution: I was experiencing the same problem, and it would appear that Atomicorp wrote some 'bad rules' which have caused nginx processes to hang. If you are running ASL or if you are just using atomicorp's modsecurity package, this is why your nginx processes are getting stuck. They have pushed out an update, so you will need to update your modsecurity package as soon as possible, or you will probably continue getting the same errors. Good luck. |
We will update the library by the end of the week. I will let you know once I have the results. Thank you all for the quick response and help. |
@zimmerle Sadly, updating the package to the latest version did not help either. Workers are still getting stuck, as can be seen in the image below. @mrwizard64 I am not aware that we are using rules from Atomicorp. We are using this. |
But after the update it gets stuck less frequently. Once I catch it again I will provide a full backtrace. |
Thank you. The core file will lead you to the full backtrace. It could help a lot. |
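For reference, a sketch of the usual way to extract a full backtrace from a hung worker or a core file (the PID, binary path, and core path here are illustrative; debug symbols for nginx and libmodsecurity need to be installed for the frames to resolve to names):

```
# Attach directly to a hung worker:
gdb -p <worker_pid> -batch -ex 'thread apply all bt full'

# Or load a core dump against the nginx binary:
gdb /usr/sbin/nginx /path/to/core -batch -ex 'thread apply all bt full'
```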
Thanks buddy |
It looks like it is stuck on another pthread_mutex_lock(). I am unable to locate pthread_mutex_lock in this function (modsecurity::utils::find_resource); maybe something got inlined.
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1 0x00007fa81d99a714 in __GI___pthread_mutex_lock (mutex=0x7fa81d9c5008) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007fa81d72a74a in modsecurity::utils::find_resource(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)
() from /usr/lib/x86_64-linux-gnu/libmodsecurity.so.3
#3 0x0000000000000008 in ?? ()
#4 0x384b505942563964 in ?? ()
#5 0x0000292934302100 in ?? ()
#6 0x00007fff254c5710 in ?? ()
#7 0x0000000000000000 in ?? ()
But this time it happens less often. When we turn off the audit log, the workers no longer get stuck. |
Any chance that the server demand of actually writing to the audit log is draining system resources to the point of collapse? |
Not possible. There is over 100 GiB of free RAM and a few TB of disk space. |
Is it possible to see a full backtrace with symbols (like in your original posting) for this new instance? The stack of function names with line numbers could be helpful. |
Our nginx is compiled with Lua support; sometimes in GDB it is not possible to see function names in the backtrace because of the LuaVM. I don't know why, maybe it was somehow stripped or JITed out. I will look into it further. |
@wutchzone Is this issue already solved? |
No, the issue is still not resolved. Using mainline nginx 1.19 and the latest ModSecurity. After enabling the module, nginx just hangs after ~30 seconds. |
I am sorry for the late response. The issue still persists. I am unable to get a better backtrace, but the issue occurs more frequently on machines with more workers. |
Hi @wutchzone, are you still facing this problem? I'm using the same spec as yours (ModSecurity v3.0.3 + OWASP CRS). How did you avoid this problem in the end? Just by turning off audit logging? Pod spec:
ModSecurity conf:
We are using the charts below (the left-side chart is CPU, and the right one is memory). As you can see, CPU usage climbed slowly starting at 8 PM; after it peaked, the health check could not get a response within 5 seconds, so the pods were terminated, which is why CPU usage dropped dramatically. We were then unable to roll out the deployment normally because the health check kept failing even after we killed the broken pod. Once we disabled ModSecurity and rolled out the deployment, everything was fine again. The pods couldn't respond to any requests even though CPU and memory usage hadn't reached half of the limit. If you have any suggestions, please let me know! |
I have the same behavior. I've tried logging through the … Below is a graph of the server CPU. Not sure if this matters, but I'm running this on GCP, on an e2-standard-2 machine (2 CPUs and 8 GB of RAM). Using nginx 1.20.2 and modsecurity-nginx 1.0.2, with OWASP CoreRuleSet 3.3.2. |
Hello,
In our company we use ModSecurity in our nginx. We noticed that sometimes nginx workers just hang and do nothing; however, when we
strace
the process (or attach the gdb
), the worker starts spinning again. This problem happens when we enable audit logging (e.g. SecAuditLog /var/log/modsec_audit.log
). I have traced the issue to the following line: https://github.com/SpiderLabs/ModSecurity/blob/0eb3c123f447b8787ea726ad4d4439018a07ee31/src/utils/shared_files.cc#L236 It appears that the process is unable to wake up and acquire the lock for the file, and it just waits. backtrace:
We use the following connector
server:
Is this a bug, or can this directive not be used with multiple processes logging to the same file?