PHP 8.1.16 segfaults on line 597 of sapi/apache2handler/sapi_apache2.c #10737
Comments
Just a little bit more information... I launched my Apache/PHP with valgrind to see if I could identify the source of any memory problems. I don't understand these valgrind logs very well, but I'm seeing lots of traces point back to line 426 of zend_string.c (see below). The trace differs, but the final file and line number always remain the same. Does this indicate that anything is amiss?
@Girgias I saw looking at
This is a false positive. See #10221 and #9068 (comment) for the analysis.
@nielsdos Got it, so according to #9068 I should ignore any valgrind complaints about I think I've narrowed down the root cause a bit further. I ran
Yes, you should ignore those valgrind reports.
To debug, you can configure PHP with --enable-address-sanitizer, which can find memory errors without having to use valgrind. You could try reproducing your Apache crash with that and see what happens. Also, what do you mean by "our build"? Is it a custom fork of PHP, or do you just mean that you compiled PHP yourself without modifications?
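For concreteness, a rebuild along those lines might look like the sketch below. Only `--enable-address-sanitizer` is the flag named in the suggestion; `--enable-debug` and the rest of the invocation are illustrative placeholders for whatever options the build already uses.

```shell
# Illustrative only: rebuild PHP with AddressSanitizer instrumentation.
./configure --enable-address-sanitizer --enable-debug   # plus your existing options
make -j"$(nproc)"
sudo make install
# Restart Apache and watch its error log for ASan reports on the next crash.
```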
@nielsdos About 30 opcache-related unit tests were failing via
Thanks for the tip -- I'll give that a try and report back. We also just tried adding We confirmed that
I'm just referring to compiling PHP from the standard 8.1.16 source with our own selection of extensions, etc. Not a custom fork of PHP or anything like that.
I don't see an immediate relation between those tests. What do the failures look like? Are they segfaults? If so, is it possible to attach gdb and see where they segfault?
I think so. But I don't think the memory manager is the source of the issue, tbh. I just noticed, however, that you didn't use
Oh that's a good catch. Yes, we are running Apache 2.4.54 with mpm_event:
And nope, we have not compiled PHP with For what it's worth, I confirmed that our prior 7.3.27 PHP build was also compiled without I'll try re-compiling PHP with
@nielsdos Unfortunately adding I'm still working on compiling with I immensely appreciate your help! Aside from what you've suggested above, I'm completely out of ideas on where to go from here.
@nielsdos Another update... I managed to resolve all but one of the unit tests that were failing. They were all caused by a small set of differences in my php.ini compared to the default php.ini. For example,
I've run into some trouble trying to accomplish that. When I compile PHP with I'm also going to try switching from
No luck :( I compiled PHP with
The error_log issue is probably worth reporting separately, yeah. I would expect clang vs gcc to make no difference.
@nielsdos Thanks for the tip on the
@nielsdos Added the
That's unfortunate. Thanks for keeping at it. EDIT: there's a PR open that does something with LDFLAGS (#10678); maybe that's related to the linker issue.
Some tests fail if the error_log is overridden by the loaded ini configuration. Explicitly set it to an empty value to prevent the failures. See php#10737 (comment)
@nielsdos I haven't made much new progress today, but I did make an interesting discovery. If I deploy PHP 7.3.27 with the same configuration that we've used on our older fleet of web servers, it also incurs a segmentation fault. So whatever is causing these segfaults, it's nothing new to PHP -- 7.3.27 segfaults, 8.1.16 segfaults and 8.2.3 segfaults. I'm scratching my head because we're experiencing these segfaults in a very "vanilla" environment. A KVM virtual machine running CentOS 8 Stream (
@nielsdos Here's the I captured the It's hard to see in that text output, but All of that is pretty much gibberish to me, but does it have some meaning to you? :)
Yes, that is actually very, very helpful, thank you very much!
I managed to reproduce the failure on my system by creating lots and lots of concurrent requests. It also seems to help to stretch the execution of the PHP script with a busy-wait for loop, to increase the chance of a concurrent request.
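The reproduction idea can be sketched roughly as follows. The path, the 500 ms duration, and the choice of ApacheBench are my own illustration, not the exact setup used:

```shell
# Write a script that stretches each request with a busy-wait, widening
# the window in which a signal can race a freshly spawned worker thread.
cat > /tmp/busy.php <<'EOF'
<?php
$end = microtime(true) + 0.5;        // keep the request alive ~500 ms
while (microtime(true) < $end) { }   // busy-wait
echo "done\n";
EOF
# With the script served by Apache + mod_php, hammer it concurrently, e.g.:
#   ab -n 10000 -c 200 http://localhost/busy.php
echo "wrote /tmp/busy.php"
```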
@ElliotNB Can you please try to add the following flag to what you already have for your configure: And then compile PHP and see if you still get segfaults.
@nielsdos Wow, that's incredible news! Yes, I will recompile with that option and report back right away. With our production traffic, sometimes it takes the segfaults just over an hour to show up, so it'll probably be about an hour and a half before I can confirm whether they're gone. Thank you!
@nielsdos Unfortunately I just had a segfault. The Here's the system and compile info:
Any other debug info you'd like me to capture?
Your observation aligns very well with our production traffic. Part of our production traffic involves data ingestion for an ETL process: inbound POST requests with a large JSON payload that we iterate over to update a database. That might explain why we're triggering these segfaults so easily while, presumably, it doesn't happen as much for others. I'm curious why we weren't seeing these segfaults on our older fleet of servers with PHP 7.3.27 -- those servers were running CentOS 6.10, Apache 2.4.39 and PHP 7.3.27 (with essentially the same
Thanks for trying, that's unfortunate. I'm kinda out of ideas for now, to be honest. The issue for me was that the signal was racily delivered to a thread that hadn't initialised the TSRM properly yet. And the following patch (on top of PHP-8.1) fixed that for me:

```diff
diff --git a/TSRM/TSRM.c b/TSRM/TSRM.c
index 7cd924318e..962a295555 100644
--- a/TSRM/TSRM.c
+++ b/TSRM/TSRM.c
@@ -113,7 +113,7 @@ static pthread_key_t tls_key;
 #endif
 TSRM_TLS uint8_t in_main_thread = 0;
-TSRM_TLS uint8_t is_thread_shutdown = 0;
+TSRM_TLS uint8_t is_thread_shutdown = 1;
 
 /* Startup TSRM (call once for the entire process) */
 TSRM_API int tsrm_startup(int expected_threads, int expected_resources, int debug_level, const char *debug_filename)
```

Both of our issues stem from the fact that You may try with the patch above, although I doubt that's gonna make a difference. When you get a crash again, you could try the following GDB commands to get more debug info:
@nielsdos Here's two sets of
There's a difference in those three variables between the PHP versions: PHP 7.3.27:
PHP 8.1.16:
Next, I'll try out your
Also, thank you for putting all that effort into testing this and following up!
@ElliotNB Yep, that's the old commit, so I think that's the issue. It's my mistake really... I noticed I had a mistake which could cause a UAF crash, and I thought I could just quickly force-push the fix because I thought you wouldn't have checked it out yet.
@ElliotNB Actually, looking more at it, that commit should work too, I think. Is it possible to give a GDB backtrace of the crash?
@nielsdos No worries! I'm going to have to restart Apache anyway to grab a
@nielsdos Just had a segfault with the code taken from your fork/branch, here are the details: https://pastebin.com/raw/pAzg2eSB Looks a little different than what I've seen before. I used the same
@nielsdos I'm currently testing this patch #10737 (comment) -- no segfaults yet after 60 minutes, but I'm going to let it run for a couple hours longer. Edit: still no segfault at the two-hour mark, but I'll let it run overnight just to be sure.
@ElliotNB Thanks! With the information you gave I was able to find and reproduce the issue with the branch. I pushed the commit which fixes the issue to my branch. You should be able to pull from it and recompile as usual. Branch URL is still the same: https://github.com/nielsdos/php-src/commits/fix-tsrm-apache Also: if you haven't already, you should probably close the issue on the Apache bugtracker.
@nielsdos Sounds good, I'll give that a whirl this morning. The other patch I was testing also segfaulted, but it took a long time for the segfaults to show up. Let me know if you want to see that gdb output too.
@ElliotNB Thanks! Yes, maybe it's best if I also see those GDB reports, to at least rule out another underlying issue.
@nielsdos Here's the Currently testing the revised code, I'll let it run for at least a couple hours (unless I get a segfault sooner).
Fixes phpGH-8789. Fixes phpGH-10015. This is one small part of the underlying bug for phpGH-10737, as in my attempts to reproduce the issue I constantly hit this crash easily. (The fix for the other underlying issue for that bug will follow soon.)

It's possible that a signal arrives at a thread that never handled a PHP request before. This causes the signal globals to dereference a NULL pointer because the TSRM pointers for the thread aren't set up to point to the thread resources yet. PR phpGH-9766 previously fixed this for master by ignoring the signal if the thread didn't handle a PHP request yet. While this fixes the crash bug, I think the solution is suboptimal for 3 reasons:
1) The signal is ignored and a message is printed saying there is a bug. However, this is not a bug at all. For example in Apache, the signal setup happens on child process creation, and the thread resource creation happens lazily when the first request is handled by the thread. Hence, the fact that the thread resources aren't set up yet is not actually buggy behaviour.
2) Since it was believed to be buggy behaviour, that fix was only applied to master, so 8.1 & 8.2 keep on crashing.
3) We can do better than ignoring the signal, by just acting in the same way as if the signals aren't active. This means we need to take the same path as if the TSRM had already shut down.
@nielsdos 2.5 hours of no segfaults using your branch/fork -- I think it's safe to call the 'all clear' on this one! My apologies for the false positive earlier. I'm going to check back in another couple hours just in case there's a latent segfault, but we're having peak middle-of-the-day traffic, and no segfaults for 2.5 hours seems very safe.
@ElliotNB thanks for reporting back. Let's indeed wait a bit longer to see for sure :)
@nielsdos Almost 5 hours without a segfault, I'll keep watching but I'd be shocked if one showed up at this point. Edit: 11 hours and still no segfaults 👍
…bals

Fixes GH-8789. Fixes GH-10015. This is one small part of the underlying bug for GH-10737, as in my attempts to reproduce the issue I constantly hit this crash easily. (The fix for the other underlying issue for that bug will follow soon.)

It's possible that a signal arrives at a thread that never handled a PHP request before. This causes the signal globals to dereference a NULL pointer because the TSRM pointers for the thread aren't set up to point to the thread resources yet. PR GH-9766 previously fixed this for master by ignoring the signal if the thread didn't handle a PHP request yet. While this fixes the crash bug, I think the solution is suboptimal for 3 reasons:
1) The signal is ignored and a message is printed saying there is a bug. However, this is not a bug at all. For example in Apache, the signal setup happens on child process creation, and the thread resource creation happens lazily when the first request is handled by the thread. Hence, the fact that the thread resources aren't set up yet is not actually buggy behaviour.
2) Since it was believed to be buggy behaviour, that fix was only applied to master, so 8.1 & 8.2 keep on crashing.
3) We can do better than ignoring the signal, by just acting in the same way as if the signals aren't active. This means we need to take the same path as if the TSRM had already shut down.

Closes GH-10861.
Assuming no new failures happened, I'll mark my PR as ready tonight.
@nielsdos No segfaults over the weekend either 👍
Yeah, I've been wondering if there's something unique about our applications and/or usage that's made it relatively easy for us to repro these segfaults that not many others appear to have reported. As far as I know, our stack is pretty common (KVM guest VMs running CentOS 8 Stream, Apache 2.4.54, PHP 8.1.16, PostgreSQL 11, memcache).
@ElliotNB Thanks for reporting back. That convinces me the bug is really gone. As for the reproduction environment: it does indeed look like a normal setup. I don't know what causes this for you. I know of one other bug report that I'm now pretty convinced is the same bug you were experiencing, and they also didn't have a special setup. I tried a lot of different things, but to this day I'm unable to reproduce the issue naturally; I can only reproduce it artificially by using an interposer to modify the thread ID.
@nielsdos I've not seen any segfaults after several days now, but I am seeing something unusual in my Apache Maybe it's just from the newer Apache version, but I thought I'd mention it in case you think that your changes might have an impact on the ability of child processes to exit gracefully when the
@ElliotNB this looks unrelated. The reason you're seeing leak warnings is that the PHP module is forcefully shut down. The reason you haven't seen the leaks before is that you seem to be running a debug PHP build right now, which will report leaks while a production build will not.

And the reason it's forcefully shut down is that the children didn't exit in time. I believe this could also be because you're running a PHP debug build: they are a lot slower than production builds. You can create a production build of PHP by removing

If you still experience the message about children that did not exit in time after switching to a production build, then it is simply because you have long-running scripts and Apache wants to restart without waiting until the scripts are finished. In that case it's due to the way current versions of Apache work.
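As a sketch of that debug-vs-production distinction (flags illustrative; the rest of the configure line is elided and stands in for whatever options the build already uses):

```shell
# Debug build: slower, reports memory leaks at module shutdown.
#   ./configure --enable-debug <your other options>
# Production build: the same configure invocation without --enable-debug.
./configure <your other options>
make -j"$(nproc)"
sudo make install
```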
Ahh, that makes sense -- I forgot I had added
* PHP-8.1: Fix GH-10737: PHP 8.1.16 segfaults on line 597 of sapi/apache2handler/sapi_apache2.c
* PHP-8.2: Fix GH-10737: PHP 8.1.16 segfaults on line 597 of sapi/apache2handler/sapi_apache2.c
Woohoo! It's closed! Thanks again for all your help on this 👍
If it's not too late, it would be awesome if you could toss my username in the NEWS file, thank you! 🕺 🥇
Ah right! Sorry I forgot about this, I'll set this right. |
@ElliotNB Done! |
Some tests fail if the error_log is overridden by the loaded ini configuration. Explicitly set it to an empty value to prevent the failures. See #10737 (comment) Closes GH-10772.
Description
I apologize in advance for the low quality issue report. I'm hoping that someone can point me in the right direction on next steps for completing a more detailed analysis.
I am running Apache 2.4.54 mpm_event with PHP 8.1.16 on a CentOS 8 Stream machine. When we gradually introduce production traffic load to this server, it begins to segfault every 25-45 minutes. I've been unable to find any correlation between the PHP code being executed and the segmentation faults. There is no particular request or script that reliably triggers a segmentation fault.
I captured core dumps for the segfaults and ran them through `gdb`. `gdb` produced the following traces. I also ran valgrind in an attempt to capture debug info on memory problems (see below).

The trace ends at line 597 of `sapi/apache2handler/sapi_apache2.c`, which contains this line: `ctx = SG(server_context);` in this block of code:

I'm guessing that this isn't enough information to figure out what's going on. Could anyone recommend next steps for troubleshooting this?
PHP Version
8.1.16
`configure` line:

Dynamically loaded extensions:
We experimented with disabling Zend opcache, but the segfaults persisted.
Operating System
CentOS 8 Stream