php_opcache crash - memory access violation #7915
additional information: the crash has occurred 6x; each crash produces (2) .dmp files. Just ran DebugDiag on all 12 files. Every one of the errors looks like this:
or like this:
All of the errors are in accel_new_interned_string or accel_post_startup. In every case, the PHP process which crashed is a console application, NOT running in Apache httpd. |
I guess the issues on Windows Server 2019 are due to more aggressive ASLR. Are you running multiple PHP CLI processes in parallel, either top-level processes or PHP processes spawned by a single top-level PHP process? If so, would assigning a unique opcache.cache_id to these processes help? Also consider to set |
Thanks @cmb69 for the speedy reply. Yes we are running tons of CLI processes: ~2000/day on this server. However the schedule of these jobs is unchanged from the old server. Keeping a separate opcache.cache_id would defeat some of the value of caching - since these processes use many of the same files - and would consume vastly more memory (there are currently 22 separate CLI processes), so I don't think that's a realistic solution. I've changed verbosity_level=4 as suggested and added opcache.error_log configuration. Will post details here after the next crash...which I presume will occur. |
Yeah, it probably makes sense to wait for hopefully helpful log messages. However, sharing OPcache instances between "unrelated" (i.e. non-forked) processes is a quirk on Windows, and can't generally work, since these processes (especially if dynamically loaded extensions are involved) can have different memory layouts (ASLR), but OPcache partly relies on a fixed memory layout, and the only mitigation on Windows is checking that a single central function ( |
hmmm... I was not aware of the point you made in the previous comment. Is it possible for each CLI process to set opcache.cache_id dynamically, using something like And, if I did that, would the opcache configurations (especially |
p.s. reviewing the php_opcache installation instructions: I see that my configuration lacked the recommended |
Yes, you need to set And no, you can't set Any other OPcache setting is not affected by |
Christoph - thanks again for your help and advice, and very prompt replies. Much appreciated!
I just installed https://github.com/amnuts/opcache-gui and you are correct: 512MB is far too much memory. Our production server has been up since 25-Dec (more than two weeks) and is using < 55MB of the cache, with a ~100% hit-rate on more than 1000 files. see attached screenshot.
All of the CLI processes have the same entry-script, and all the CLI code is in the same directory, so a registry-entry probably would not work to create separate caches. However, the CLI processes are launched by the Windows task-scheduler, which executes .bat files, so perhaps setting the As I mentioned previously, the CLI processes share quite a lot of code with the web-app, since they're both using the same framework and models, although different controllers and views. My intention was to have one large cache for efficiency. Do you think it may be possible to fix the memory-access-violation, if I can provide the debugging output? Or should I assume it will be too difficult, and begin implementing the separate-cache for each CLI process ? The fallback for all of this would be to disable opcache for CLI, but given that these processes execute 2000+ times/day, it seems woefully inefficient to do so. |
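For readers who want the same usage numbers without a GUI, here is a minimal sketch built on `opcache_get_status()`. The helper name `summarize_opcache_status` is made up for this illustration, and the figures only describe the cache the calling process is attached to.

```php
<?php
declare(strict_types=1);

/**
 * Reduce an opcache_get_status() result to a few headline figures.
 * Hypothetical helper, not part of OPcache itself.
 */
function summarize_opcache_status(array $status): array
{
    $mem   = $status['memory_usage'] ?? [];
    $stats = $status['opcache_statistics'] ?? [];

    $hits   = (int) ($stats['hits'] ?? 0);
    $misses = (int) ($stats['misses'] ?? 0);
    $total  = $hits + $misses;

    return [
        'used_mb'        => round(($mem['used_memory'] ?? 0) / 1048576, 1),
        'free_mb'        => round(($mem['free_memory'] ?? 0) / 1048576, 1),
        'cached_scripts' => (int) ($stats['num_cached_scripts'] ?? 0),
        'hit_rate_pct'   => $total > 0 ? round(100 * $hits / $total, 2) : 0.0,
    ];
}

// Query the live cache only when the extension is loaded; the call returns
// false when OPcache is disabled for the current SAPI.
if (function_exists('opcache_get_status')) {
    $status = opcache_get_status(false); // false: skip per-script details
    if (is_array($status)) {
        print_r(summarize_opcache_status($status));
    }
}
```

Comparing `used_mb` with the configured opcache.memory_consumption gives a rough idea of how far the setting could be reduced.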
Thanks for the detailed reply!
If you're starting the PHP scripts from batch files, you could pass the cache_id as command line option:
The
Yes, that's reasonable. However, if the actual issue is really caused by different memory layouts due to ASLR[1], that just won't work. The only way a single OPcache instance could reliably work for the CLI would be to have a single long-running CLI process (basically a service) which actually executes the different tasks; to be able to run multiple tasks in parallel, you would need to use an extension which allows doing so in a single process (e.g. parallel). [1] I'm not sure whether this is actually the reason, but we faced the same issue when the PHP test runner was improved to run tests in parallel, and the solution we came up with was actually the |
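If the jobs were started from a PHP launcher rather than directly from batch files, the per-task cache ID could be passed with the same `-d` option. The following is only a sketch: `build_task_command` and the task label are hypothetical names, and the script path is a placeholder.

```php
<?php
declare(strict_types=1);

/**
 * Build a command line that runs $script in a PHP CLI process with its own
 * opcache.cache_id. $taskName is a hypothetical per-job label.
 */
function build_task_command(string $phpBinary, string $script, string $taskName): string
{
    // Restrict the ID to a conservative character set so it needs no quoting.
    $cacheId = (string) preg_replace('/[^A-Za-z0-9_]/', '_', $taskName);

    // An empty value would not select a separate cache, so refuse it early.
    if ($cacheId === '') {
        throw new InvalidArgumentException('Task name must not be empty.');
    }

    return sprintf(
        '%s -d opcache.cache_id=%s %s',
        escapeshellarg($phpBinary),
        $cacheId,
        escapeshellarg($script)
    );
}

// Placeholder script name; PHP_BINARY is the interpreter currently running.
echo build_task_command(PHP_BINARY, 'cli.php', 'nightly-report'), PHP_EOL;
```

The guard against an empty ID matters in practice: an undefined shell variable expands to nothing, and the process then silently joins the default cache.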
@cmb69 How about spawning fastcgi and using fast-cgi-client to run tasks? Could this solve OP problems? |
@KapitanOczywisty, FastCGI has exactly the same issues. On POSIX systems this is solved by FPM, which forks its workers to enforce the same memory layout. There is nothing like that for Windows, though (because Windows does not support forking). Apache mpm_winnt comes close, but still may have that issue; PR #7865 might solve that. |
ok - this is helpful. (BTW - so far it has not crashed again. I will post the opcache-debug output as/when it does). Per the previous comment, I've reduced the One more question: does it reserve this much memory? In other words, if I have a web-app, plus 22 CLI processes with separate opcache.cache_id values, will I be reserving 23 * 80 = 1840MB (nearly 2GB) of memory? Or is the |
one more thing:
In point of fact, since So is the key word here "reliably" ? Because up till now (at least on our old production Win 2012 R2 server); and for the past few weeks on Windows Server 2019, it is in fact working....just not 100% reliably...hence this ticket. |
OPcache instances are not supposed to be shared between different SAPIs. Actually, each combination of user/cache_id/SAPI/system_id, where the system_id is unique for each PHP version plus some further details, has its own ID which is hashed. Since we're using MD5 hashes, there is a chance of collisions, but that should be pretty low. You can double check that when file_cache is enabled for Web and CLI; there should be at least two different sets of caches, even if you use the same basedir for the file_cache. |
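To make the idea of a hashed combination concrete, here is a purely illustrative sketch. It is not the derivation OPcache actually uses: the function name, the separator, and all sample values are assumptions. It only shows that changing any one of the four components yields a different MD5 key.

```php
<?php
declare(strict_types=1);

/**
 * Illustration only: join the four components named above and hash them.
 * The real key derivation inside OPcache differs in detail.
 */
function illustrative_cache_key(string $user, string $cacheId, string $sapi, string $systemId): string
{
    // A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    return md5(implode("\0", [$user, $cacheId, $sapi, $systemId]));
}

// All values below are made up for the example.
$web  = illustrative_cache_key('svc_php', '', 'apache2handler', 'example-system-id');
$cli  = illustrative_cache_key('svc_php', '', 'cli', 'example-system-id');
$cli2 = illustrative_cache_key('svc_php', 'task2', 'cli', 'example-system-id');

// Different SAPI or different cache_id gives a different key.
var_dump($web !== $cli, $cli !== $cli2);
```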
I'm not seeing that. in php.ini on my non-production system:
Note: there is no the .bat file for CLI process #1: the .bat file for CLI process #2: I restarted apache, and then opened the web app; and executed both of the .bat files EDIT: what I wrote originally was partly incorrect. I needed to actually STOP and then restart apache for the new cache configuration to take effect. Now in c:\temp\opcache\ there are TWO subdirectories:
|
Hmm, I cannot reproduce that:
|
ah, yes. Sorry it was my mistake:
I used your example literally, but did not have a variable %cli-cache% defined. using a string literal like "2" or "clicache", I get the same results as you. In my previous test, the cache_id variable was undefined, so functionally equivalent to null or empty-string, which explains the previous results. Sorry for the wasted effort on your part, and for my ignorance of Windows shell syntax. I think the easy solution for me is to use the This still does not explain php_opcache crashing, as per the subject of this ticket. I've been monitoring for several days with I'll be offline from Jan 12 to Feb 12, so I'll return to this topic in a month. Thank you again very much for your time, attention, expertise, and patience. :-) |
back online. problem occurred once on Jan 12; and again today Feb 14. Today's instance involved five CLI processes running concurrently. each process creates two dump files. In 10x dumps, the error is always at accel_new_interned_strings+66. looks like adding the opcache.file_cache configuration helped some; but the failures continue. Possibly due to multiple concurrent CLI processes? I have not yet tried adding the -d cache-id as suggested previously. Is that the next best thing to do ? I'm happy to continue trying work-around, but I would (very respectfully) point out that there is a bug somewhere which is causing a crash....anything I can do to help troubleshoot I'm happy to help with, keeping in mind I'm not a PHP core developer and have no ability to build or debug PHP locally. Nor can I reproduce this problem on demand - it only happens on a loaded production server. p.s. I'm happy to post additional stack-traces (derived from using DebugDiag2 on the .dmp files). and/or any other artifacts, if that would be helpful. |
If it helps anyone, I had the same issue and solved it by cleaning up c:\windows\temp |
@cmb69, has something been changed in 8.1 with respect to OPcache sharing in Windows CLI? My concurrent CLI processes crash with the release of 8.1. Before, in 8.0, up to 8.0.27, we never had this problem. Faulting application name: php.exe, version: 8.1.14.0, time stamp: 0x63b570c0 The issues are gone when using opcache.enable_cli=0 or by using opcache.cache_id. Is there something that can be done to revert to the old behavior? |
I don't think there has been anything specific wrt. Windows (CLI). Maybe your problems are caused by the inheritance cache which has been introduced. There might be bugs to fix, but without some way to reproduce these issues, there's probably not much we can do.
Disabling OPcache (or only enabling file cache) can cause serious performance degradation; opcache.cache_id should not (unless there is not sufficient memory available; each cache_id gets its own SHM). So I suggest you try with different cache_ids. Generally, I'm not happy with the OPcache re-attaching behavior (which is only implemented on Windows). On other systems, unrelated CLI processes always have their own OPcache instances. The re-attaching on Windows was implemented to be able to share a single cache for multiple (F)CGI processes, but that has issues (i.e. possibly failing to work, so file_cache_fallback is a thing). And in the case of simulating |
@cmb69 thanks for your fast reply. The reproduction of the crash over here is quite easy in PHPstorm:
This process can be repeated with the same result every time. So it doesn't crash during the first run but always at the second run. It also always seems to crash at the same test so the reproduction seems to be consistent. Update: Further investigation shows that the crash originates in calls to PDO's sqlite::memory: |
If this happened generally, there would certainly have been a lot of reports about it. I assume that this is specific to your production and/or test code, or possibly your machine or PHP configuration.
Do you call that directly ( Also consider clearing the file cache, or even disabling it via configuration, at least for testing purposes. |
I dug a bit further. It seems PDO had not been called yet (sorry); the crash comes from a file loaded before that (a query building file named SelectQuery.php). The process crashes when I call one of its methods. When I comment that out, the tests run fine (with that other PHP instance open), and I can rerun them without problems. However, when I make a change in that particular SelectQuery.php file, like adding '//' somewhere, everything crashes again. I have to kill that other PHP process to get everything running successfully again. Also, when a new PHP instance is started, everything works fine repeatedly. So during runtime something is going on with that particular file, and mutating it has consequences, perhaps because of the new inheritance cache. |
interestingly, when I mutate the ancestor root file (adding // somewhere), the 0xC0000005 disappears and everything runs smooth again So we have an inheritance tree like this: Mutate behavior by adding //: |
good news, we have found a workaround to this bug, thanks to @cmb69 pointing to looking into the inheritance cache. Highly appreciated. Before running multiple processes in parallel, we run a single php file in which we loop the composer classmap (composer/autoload_classmap.php) and execute a class_exists on the classes. With this php instance running in the background, we are able to run our tests in parallel again. Jeej. |
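The warm-up described above could look roughly like the sketch below. It is a reconstruction from the description, not the poster's actual script: the `vendor/` location and the helper name `warm_classmap` are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Trigger autoloading for every entry of a Composer classmap
 * (class name => file path). Hypothetical helper for illustration.
 */
function warm_classmap(array $classMap): array
{
    $loaded = 0;
    $failed = [];

    foreach (array_keys($classMap) as $class) {
        $class = (string) $class;
        try {
            // The classmap also lists interfaces and traits, so check all three.
            if (class_exists($class) || interface_exists($class) || trait_exists($class)) {
                $loaded++;
            } else {
                $failed[] = $class;
            }
        } catch (\Throwable $e) {
            // A class that cannot be loaded should not abort the warm-up.
            $failed[] = $class;
        }
    }

    return ['loaded' => $loaded, 'failed' => $failed];
}

// Assumed project layout: a vendor/ directory next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
$mapFile  = __DIR__ . '/vendor/composer/autoload_classmap.php';

if (is_file($autoload) && is_file($mapFile)) {
    require $autoload;
    $result = warm_classmap(require $mapFile);
    printf("loaded %d, failed %d\n", $result['loaded'], count($result['failed']));
    // The workaround relies on this process staying alive while the parallel
    // jobs run; how to keep it running is left to the caller.
}
```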
Regarding the opcache.cache_id property - I run all our Windows IIS sites under different user accounts for the apppools. So, would it still apply in this configuration to specify a different cache_id for each site? Thank you. ; Facilitates multiple OPcache instances per user (for Windows only). All PHP ; Enables and sets the second level cache directory. ; Implies opcache.file_cache_only=1 for a certain process that failed to |
I've detected the same problem when PHP uses a symlink and when the same file is shared between different sites, e.g. site1, site2, siteN all use c:\common\shared.php |
Description
I don't know how to recreate the problem. It occurs sporadically on our production server: 6x in the past 4 days. It never happened on the old server (Windows 2012 R2) running the same versions of PHP (8.0.14 x64 thread-safe) and Apache httpd (2.4.51 x64), with the same opcache configuration. We are not using JIT.
Windows Event Log shows:
php.ini:
Running Windows DebugDiag tool, with PHP 8.0.14 symbols loaded, gives the report attached here:
opcache-crash-analysis.pdf
PHP Version
PHP 8.0.14
Operating System
Windows Server 2019; apache httpd 2.4.51 x64