Segmentation fault when OPcache is enabled along with an extension utilising the Observer API #13817
Comments
@mtrop-godaddy It looks like this may be related to #13735. Would it be possible to check whether this issue is resolved on the
I've done a quick test with the 8.3.5RC1 release, which contains the patch that you've mentioned, but it doesn't seem to have helped. I'm still seeing the same segfault. I'll try to do some more testing with the code changes in that PR, just in case I missed something, but so far it looks like this might be a different issue.
This particular crash is in zend_observer_fcall_end_all, so it's likely not related to the other observer issues. This kind of crash, however, is virtually impossible to track down without observing it in a debugger, I think. Is the crash reliably reproducible? I think we need a reproducer here. Feel free to reach out privately if you cannot share it publicly.
We'll see what we can come up with. The crash is easy to reproduce (any web request to the site in question), but it's happening on a small set of very complex WordPress sites (custom theme, lots of plugins, etc.), so getting a simple reproduction case, or even tracking it down in a debugger, isn't easy.
We can also reproduce on complex WordPress sites, all on PHP 8.2.16, some running
Finally got around to debugging this some more. Still don't have a reproduction case, but was just wondering if some of the data we're seeing in When the SIGSEGV signal is raised, we're in
The filename/line_start point at the following The trait is used in a bunch of different places but AFAICT that function isn't executed before the crash (added a bunch of
It also looks like an exception may have been raised before the crash since
I'm not familiar enough with PHP internals to know if these are relevant, so any pointers would be appreciated.
This could be a stupid question, but let's try. But I'm still trying to come up with a test...
@nielsdos Trampoline functions are explicitly exempt from being observed (see _zend_observe_fcall_begin) and as such should not need the extra temporary. Only functions that are actually observed ever have something written to that temporary.
@mtrop-godaddy As you can see, zend_observer_fcall_end_all gets called during shutdown, more precisely when there was a fatal error (bailout) earlier. (It always gets called, but shouldn't do anything otherwise.) If you break there, you can find out in which function it goes sideways. Then that function's properties need to be inspected, possibly with watchpoints added to track where the temporary was assigned this wrong value, etc. As I said, this is not a simple investigation, and my offer still stands if you want to reach out to me privately for debugging over screen sharing, or to provide the application, or similar.
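To make that failure mode concrete, here is a toy model in C. It is not php-src code and every name in it is hypothetical. It assumes that each observed frame keeps a link to the previously observed frame in its observer temporary (its last slot) and that zend_observer_fcall_end_all follows those links during shutdown. If the stack space reserved for a frame is one slot short, the next frame pushed on top lands on that temporary, and the walk finds garbage where it expects a link:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not php-src code: observed frames form a linked list whose
 * links live in each frame's last slot (its observer temporary). */

enum slot_kind { SLOT_EMPTY, SLOT_HEADER, SLOT_LINK };

struct frame;

typedef struct {
    enum slot_kind kind;
    struct frame *link;   /* previous observed frame; valid only for SLOT_LINK */
} slot;

typedef struct frame {
    size_t base;          /* index of the frame's first slot */
    size_t needed;        /* slots the frame really uses, incl. the temporary */
} frame;

#define STACK_SLOTS 32

static slot stack[STACK_SLOTS];
static size_t top;                 /* next free slot */
static frame *current_observed;    /* most recently observed frame */

static void reset_stack(void) {
    for (size_t i = 0; i < STACK_SLOTS; i++) {
        stack[i] = (slot){SLOT_EMPTY, NULL};
    }
    top = 0;
    current_observed = NULL;
}

/* Push a frame that reserves `reserved` slots (the precomputed stack size),
 * write its header into its first slot, then record the observer link in
 * the last slot it actually uses. */
static void push_observed(frame *f, size_t needed, size_t reserved) {
    assert(needed >= 2 && top + needed <= STACK_SLOTS && top + reserved <= STACK_SLOTS);
    f->base = top;
    f->needed = needed;
    top += reserved;
    stack[f->base] = (slot){SLOT_HEADER, NULL};
    stack[f->base + needed - 1] = (slot){SLOT_LINK, current_observed};
    current_observed = f;
}

/* Walk the links like an "end all" pass after a bailout. Returns the number
 * of observed frames, or -1 where the real VM would follow a garbage pointer. */
static int walk_observed(void) {
    int count = 0;
    for (frame *f = current_observed; f != NULL; count++) {
        slot s = stack[f->base + f->needed - 1];
        if (s.kind != SLOT_LINK) {
            return -1;
        }
        f = s.link;
    }
    return count;
}

/* Two nested observed calls that each need 3 slots; the outer call's
 * precomputed size is `outer_reserved`. */
int run_scenario(size_t outer_reserved) {
    static frame outer, inner;
    reset_stack();
    push_observed(&outer, 3, outer_reserved);
    push_observed(&inner, 3, 3);
    return walk_observed();
}
```

In this model, run_scenario(3) walks both frames and returns 2, while run_scenario(2) returns -1 because the inner frame's header overwrote the outer frame's link. The real engine has no such check, so it would dereference the clobbered value instead, which would be consistent with the invalid func pointer seen in the core dumps.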
Thanks @bwoebi, that helped a bunch! Just tried setting a breakpoint on I also played with an r/w watchpoint on the The following plugin/code seems to be related: https://plugins.trac.wordpress.org/browser/kadence-woocommerce-email-designer/trunk/kadence-woocommerce-email-designer.php#L847. Disabling the plugin also seems to stop the segfaults from happening.
I agree it'd be easier to debug this over a screen share, but we need to check whether the affected customer is okay with that first, so I'll reach out when we get confirmation from them. In the meantime, I'll try to collect some more information.
After investigating together with @mtrop-godaddy, we figured out that the stack size of some observed function calls (INIT_FCALL) was computed incorrectly. Disabling pass 4 fixes it (opcache.optimization_level=0xFFFFFFFF7). Normally pass 12 fixes this up again, but for some unknown reason it didn't here. In any case, pass 4 didn't take ZEND_OBSERVER_ENABLED into account here and computed a stack size that was missing the observer temporary.
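That workaround relies on opcache.optimization_level being a bitmask in which optimizer pass N is controlled by bit N-1, so a value whose last hex digit is 7 has bit 3 (pass 4) cleared while bit 11 (pass 12) stays set. A small C sketch of that arithmetic; the helper names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* opcache.optimization_level: optimizer pass N is controlled by bit N-1
 * (pass 1 = 0x1, pass 4 = 0x8, pass 12 = 0x800). Helper names are hypothetical. */
static bool pass_enabled(uint64_t level, unsigned pass) {
    assert(pass >= 1 && pass <= 64);
    return (level >> (pass - 1)) & 1u;
}

/* Clear a single pass's bit, leaving all other passes as configured. */
static uint64_t without_pass(uint64_t level, unsigned pass) {
    assert(pass >= 1 && pass <= 64);
    return level & ~(UINT64_C(1) << (pass - 1));
}
```

For instance, without_pass(0xFFFFFFFF, 4) yields 0xFFFFFFF7, and pass_enabled(0xFFFFFFFF7, 4) is false while pass_enabled(0xFFFFFFFF7, 12) is true.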
Instead of fixing up the temporaries count in between observer steps, just take ZEND_ACC_DONE_PASS_TWO into account during the stack_size calculation, introducing zend_vm_calc_ct_used_stack for that use case. This should be much less susceptible to forgetting to handle the ZEND_OBSERVER_ENABLED temporary explicitly.
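The idea in that commit message can be sketched as a toy slot-count calculation. The struct, the helper names and the header size below are hypothetical, not the php-src implementation; the one assumption taken from the thread is that, with the observer enabled, the extra temporary only shows up in T once pass two has run, so a compile-time calculation has to add it itself while ZEND_ACC_DONE_PASS_TWO is not yet set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not php-src code. FRAME_HEADER_SLOTS is an arbitrary
 * placeholder for the fixed per-call header (ZEND_CALL_FRAME_SLOT). */
#define FRAME_HEADER_SLOTS 5u

typedef struct {
    uint32_t num_args;    /* declared parameters (already counted as CVs) */
    uint32_t last_var;    /* compiled variables */
    uint32_t T;           /* temporaries counted so far */
    bool done_pass_two;   /* stands in for ZEND_ACC_DONE_PASS_TWO */
} model_op_array;

/* Assumption: with the observer enabled, pass two reserves one extra
 * temporary at the end of T for the observer. */
static void model_pass_two(model_op_array *f, bool observer_enabled) {
    if (observer_enabled) {
        f->T += 1;
    }
    f->done_pass_two = true;
}

/* Compile-time stack size in slots for a call passing `passed_args`. If the
 * callee has not been through pass two yet, its T lacks the observer
 * temporary, so it is added here instead of patching T between passes. */
static uint32_t model_ct_used_stack(uint32_t passed_args, const model_op_array *f,
                                    bool observer_enabled) {
    uint32_t used = FRAME_HEADER_SLOTS + passed_args + f->last_var + f->T;
    used -= f->num_args < passed_args ? f->num_args : passed_args;
    if (observer_enabled && !f->done_pass_two) {
        used += 1;
    }
    return used;
}
```

With the observer on, a callee with 2 parameters, 3 CVs and 4 temporaries called with 2 arguments gets the same count (13 slots with this placeholder header) before and after model_pass_two, which is the property the fix is after: the result no longer depends on which step the optimizer happens to run in.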
Just wanted to confirm we're no longer seeing any crashes in the affected app after smoke testing it with the 8.3.4 release of PHP and patches in #14018 applied on top of that.
@mtrop-godaddy Are there discussions of this resolution being worked on for PHP 8.2 elsewhere, or is this the only place?
@scottbuscemi This is the only place as far as I'm aware.
Hey @bwoebi! Thanks for the patch. Are there plans to have this merged upstream? I guess it's starting to cause more problems as adoption of the Observer API increases.
New versions of 8.2 and 8.3 were released a few days ago, but neither contained this fix. Is that because the merge happened just a day or two before the release? (While the fix is clearly in 8.4, I think some of us are still hoping to see it in 8.2 and 8.3.)
Yes, the bugfix is in the upcoming release. It did not make it in time for the RC of the current release.
Description
When OPcache is enabled and either the NewRelic or the zend_test extension is active, we're seeing a segmentation fault on a fairly complicated WordPress app. Unfortunately, we haven't been able to come up with a simple PHP script that would simplify the reproduction steps. The site's code is maintained by a customer of ours and has a large number of plugins, so it's difficult to pinpoint the code that's causing the issue.
The issue has also been reported in NewRelic's PHP agent repo, but it's most likely not caused by their agent, since it can be reproduced with the zend_test extension. The original reporter mentioned that the issue may have been caused by the font-awesome plugin, but we've not been able to reproduce it with that so far:
It's not clear what changed in NewRelic's PHP agent to start triggering this bug (it started happening in version 10.18.0.8), but we believe they stopped using the old tracing method of replacing the zend_execute_ex function in minit/mshutdown and switched to using the Observer API in all cases.
A similar issue has also been reported in this repo, so hopefully this isn't a duplicate; our crash seems to be different enough:
execute_data->opline pointers in observer fcall handlers when JIT is enabled #13772
OPcache and JIT
The issue goes away completely if we disable OPcache:
We also have JIT disabled by default (buffer size is set to 0):
However, the issue still occurs even if we disable JIT completely while OPcache is still enabled:
Core dump debugging
We've looked at a few core dumps, and when OPcache is enabled, the func pointer in the execute_data variable ends up pointing at an invalid address, which then crashes PHP when it dereferences it in the Observer API's exit handler:
Here's the backtrace and some more debugging information that we've been able to extract from a core dump:
The backtrace with zend_test enabled is almost identical; the only difference is that NewRelic's signal handler isn't invoked after the segmentation fault:
The following config options are set for the zend_test extension:
More info about the environment
The site's running in a Docker container, using a slightly modified image that's based on the official PHP Docker images - https://github.com/docker-library/php/tree/master/8.3/bullseye/fpm.
Some notable changes:
--enable-embed
List of PHP modules:
php-modules.txt
PHP Version
PHP 8.3.4
Operating System
Ubuntu 20.04