-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random crash of FPM master process in fpm_stdio_child_said #8517
Comments
Possible duplicate of #8494 ? |
I stack backtrace might be helpful. |
@triplesixman Would it be possible to generate a backtrace? Otherwise we have pretty much no chance of finding the actual issue. |
Hi, Sorry for the delay we finally been able to reproduce the issue and to gather a backtrace:
|
@bukka Looks like php-src/sapi/fpm/fpm/fpm_stdio.c Line 156 in 10dc522
I'm completely unfamiliar with fpm, do you know how this could happen? |
I don't see anything suspicious from looking to the code. If it was NULL, then it would fall to the different branch and was created so it would need to be some invalid pointer but again that should not happen based on the code. The only thing that comes to my mind is some sort of memory corruption somewhere. I will add this on my list and will try to do more careful review. @Poulpatine could you share your FPM config? |
Sure, here it is.
And /etc/php/8.0/fpm/pool.d/www.conf :
|
@Poulpatine would you be able to try it with commented out |
Ok, I'm doing the test. I'll keep you informed. |
Hi @bukka , it seems that there is no difference regarding the crash frequency. |
@Poulpatine Thanks for checking. Would you be able to also try to temporarily disable slowlog (commenting out |
@bukka, Sure, I'll do that and tell you. |
Hi, a new crash occured last night. The backtrace is similar to previous ones:
Here is the fpm configuration:
|
@Poulpatine I have been searching for a possible source of this issue without any luck. But co-incidentally there is just a new PR fixing one problem in zlog that might cause this: see #8785 . If it's not easy for you to apply the patch, can you just disable docoration in the pool - add |
I've patched the Debian package and installed it. |
Hi, unfortunately and even with the patch, a new crash occured last night. |
Can you try it with |
In case there is another issue related to this... |
I just applied the change, we will keep you informed |
@Poulpatine @triplesixman Any update on this? Do you still see the issue even without decorated output? |
A new crash occurred yesterday afternoon, here is the trace:
|
I'm running out of ideas in terms of options. I think I might need to take a bit deeper look to the core dump if there's anything useful. Would you be able to share it? If so, feel free to email me privately. It would be also great if you could set log_level to debug and share with me some time before the crash depending on its size. |
I just sent you an email with the core dump file. I also changed the log level to debug. I will keep you informed in case of new crash. |
Mail sent! Let me know if you need anything else to help you in your search. |
@triplesixman I finally managed look into this properly. I have been actually looking and thinking about it last few days (on and off :) ) and I got an idea what could be potentially the issue and added an initial fix for that to #9444 . If you could try it, that would be great! If you don't see any further crash, could you also compare the load on master process before and after the patch? |
Thanks for the update! We will try it out this weekend. EDIT: We put the patch in place today 09/09. I will keep you informed |
Hi @triplesixman , have you had any crash since deploying the patch? |
No crashes observed since September 9 following the application of the patch. I guess we can consider that the problem is solved if there is no new crash by October 9th (I see wide because the crash frequencies are very random). By curiosity I installed php 8.1 to see if the problem was reproduced on this version and it is indeed the case. |
Thanks, did you notice any difference in the load of the master process. It should probably not be visible if you don't run excessive number of pools with high number of children. Anyway I will do some optimization if this is confirmed working. Was more wondering if there's anything visible in terms of load already? |
Hello @bukka , unfortunately the main load of the system does not come from PHP instances but from processes triggered from PHP so we can't really distinguish a trend. Regarding the crashes, no crash have occured since last patch :). Thanks ! |
Hi @bukka, we would like to make all our servers benefit from this fix, our other servers are under php 8.1.11, we go through the deb.sury.org packages for updates. Will the next versions integrate the fix and from which version ? Thank you in advance |
We should delete event before freeing child so it cannot be triggered after the child is freed.
We should delete event before freeing child so it cannot be triggered after the child is freed.
We should delete event before freeing child so it cannot be triggered after the child is freed.
@Poulpatine @triplesixman Apology for the delay. The fix is not committed because I'm not too happy with the overhead it introduces. I spent some time to think aboutit and come up with a different approach that is in #9817 . Would you be able to test it? |
@Poulpatine @triplesixman Did you get a chance to try the fix in #9817 ? |
Hi @bukka, we have been able to deploy the fix last week-end and we have faced 3 segfault since. Here's the backtrace of one of them:
|
Ok it seems like #9817 still leaves the race condition so it doesn't resolve the issue. It might still be logical thing to do as there is no point to keep that event but that would be more optimization. Will leave it open for now and think about it later. It means that I needed to go back #9444 which actually fixes the issue. After thinking about its impact, I realized it might not be that bad. The thing is that setups with many pools won't likely catch worker output but use logs instead so most users will have probably just one pool and if you didn't notice problems with master in your setup when I see you run quite a lot of children, then it's likely others won't see the issue either. Of course this is still not for sure so I merged it to 8.1+ and left out 8.0 because its normal bug fixing support ends in 2 days and I would have no way to address potential issues related to the fix. It means it will be part of 8.1.14 and will also go to 8.2.1. |
@bukka Hello, I wanted to inform you that we have had new segfaults recently despite the upgrade. However, the segfaults seem very rare compared to before the upgrade. I will try to give you more informations about the crashes in the next weeks (our server is not configured to get the backtrace anymore, but we will do it soon). |
@triplesixman @Poulpatine So after more investigation, that previous commit wasn't really a right fix. It might have hidden the the problem but didn't fix the cause of the crash. It actually made the problem more visible under another workload. The main race condition seems to be about possibility to access the child after it's freed (delivery the input event after the signal). I just created a new PR which basically reverts the previous fix and instead tries to delay the freeing of the child by 1 second. It is available here: #11084 . If you were be able to test it, that would be appreciated! |
Description
Hello,
Since the switch to php8.0-fpm, I have random php crashes which force me to restart it manually and cause a 502 bad gateaway error on my site.
The crashes are random and sometimes occur after several days.
The logs indicate this:
Unfortunately I don't have more information and it is impossible to identify the problem. Do you know how I can get more details?
I have thousands of active users on my site and it will not be possible for me to reproduce the crash by myself.
Thank you
PHP Version
PHP 8.0.18
Operating System
Debian 11
The text was updated successfully, but these errors were encountered: