- Preface
Mage_Core_Model_Config
and Caching- Symptoms of a Corrupted Config Cache
- Debugging the Issue
- The Problem & Explanation
- Replication
- PHPUnit Tests
- The Fix
If this document helps you out then please give it a star, thanks!
As of Magento 1.9.3.1 the fix discovered in this write up has been included in the core.
If you still suffering configuration cache corruption you should
- Upgrade to 1.9.3.1.
- Apply the available patches to your older version of Magento
- Apply the fix from this write up (SUPEE-4755).
- If you are running php-fpm read the following article and ensure the patch is applied.
- If you are still seeing frequent issues try the core modification described here.
Since SUPEE-6788 I've been contacted a lot by people who have tried the the Fix described below with mixed results:
- I applied your patch and it fixed everything thanks!
- I applied your patch and it didn't work, do you have any other recommendations?
- I applied your patch, dropped
core_config_data
, droppedcore_url_rewrite
, and now my website isn't working. Here is root SSH access please fix.
Anyway, I do have a few further improvements to Mage_Core_Model_Config that I've been using in production for a while now, they seem to be helping. Click through to read more.
- Config Cache Corruption Fix - 100 Router Match Iterations - Unreadable local.xml
- Config Generation Performance Fix - allowUseCacheForInit
PATCH_SUPEE-4755_EE_1.13.1.0_v1.sh
This patch is the exact same as my fix which gives it some validity.
This patch isn't publicly listed anywhere because Magento don't do that for anything but security patches for some reason. I know the patch file says EE_1.13.1.0, but I tested it on community edition 1.9 and it applied fine.
... I guess that makes me a Magento 1 core contributor now?
The majority of my experimentation took place on EE 1.13.0.1 and while this bug does also affect CE 1.9.1.0 I will be referring to EE 1.13.0.1 code throughout the explanation.
Mage_Core_Model_Config
is the class responsible for merging all the system configuration files (config.xml, local.xml) and module configuration files (config.xml) into one object. There are 3 basic parts to the Mage_Core_Model_Config::init()
call
- Loading the base config (config.xml and local.xml files).
- Attempting to load the rest of the config from cache.
- (On cache load failure) Load the remainder of the config from xml and the database, then save it into cache.
/**
* Initialization of core configuration
*
* @return Mage_Core_Model_Config
*/
public function init($options=array())
{
$this->setCacheChecksum(null);
$this->_cacheLoadedSections = array();
$this->setOptions($options);
$this->loadBase();
$cacheLoad = $this->loadModulesCache();
if ($cacheLoad) {
return $this;
}
$this->loadModules();
$this->loadDb();
$this->saveCache();
return $this;
}
If at any point something were to silently go wrong within loadModules
or loadDb
, then corrupted configuration would be saved into cache, meaning that the following request would be served invalid configuration.
Mage_Core_Model_Config
also has a protected variable $_useCache
. When this flag is set Magento will attempt to load sections of the config from cache storage then persist them within the singleton itself.
The following symptoms would usually manifest when the website is experiencing high load, and very often after a cache flush was triggered. The symptoms persist until you flush the CONFIG
cache.
Your website produces nothing but Front controller reached 100 router match iterations
reports. (Tested on Magento 1.12 and 1.13)
Your homepage fails to load the CSS, and a message is displayed saying There was no 404 CMS page configured or found
.
I have not spent much time debugging the effects on the community edition, there are likely other symptoms. (Tested on Magento 1.9.1.0)
Many sources correctly point out that the problem is caused by some of the routers disappearing from the configuration object, meaning there is no router available to match the request.
This error only occurs when the routers configuration was loaded from cache. To stop the bug from bringing down the website and to aid my debugging I overrode Mage_Core_Model_Cache::save()
such that it would do some quick data validation, and prevent corrupted data being saved.
/**
* Save data
*
* @param string $data
* @param string $id
* @param array $tags
* @param int $lifeTime
* @return bool
*/
public function save($data, $id, $tags = array(), $lifeTime = null)
{
//Start patch for 100 routers problem
if (strpos($id,'config_global_stores_') !== false) {
$xml = new SimpleXMLElement($data);
$xmlPath = $xml->xpath('web/routers/standard');
if (count($xmlPath) != 1) {
$e = new Exception();
Mage::log($e->getTraceAsString(), null, 'corrupt-cache.log', true);
//Returning true to prevent it from saving an incomplete cache entry
return true;
}
}
//End patch
if ($this->_disallowSave) {
return true;
}
/**
* Add global magento cache tag to all cached data exclude config cache
*/
if (!in_array(Mage_Core_Model_Config::CACHE_TAG, $tags)) {
$tags[] = Mage_Core_Model_App::CACHE_TAG;
}
return $this->getFrontend()->save((string)$data, $this->_id($id), $this->_tags($tags), $lifeTime);
}
This code change did not 'solve' the issue, but it did stop corrupted configuration being saved and prevented the website crashing so much.
It was also useful as a point of debugging, as I now had a place from which I could monitor and log the issue. The stack trace logged here was vital in learning what was causing the issue.
By using apache bench to stress my Magento instance along with a lot of file_put_contents
debugging I was able to discover that the invalid configuration was generated in the loadDb
method of Mage_Core_Model_Config
, but only under the following conditions
Mage_Core_Model_Config::init()
has been called on the singleton twice.- The first call to
loadModulesCache
must successfully load from cache and set$_useCache = true
- The second call must fail to retrieve the config from cache, and proceed to incorrectly rebuild the cache because
$_useCache = true
is still set.
To understand how we get this flow we'll have to revisit Mage_Core_Model_Config::init()
and a few other functions.
/**
* Initialization of core configuration
*
* @return Mage_Core_Model_Config
*/
public function init($options=array())
{
$this->setCacheChecksum(null);
$this->_cacheLoadedSections = array();
$this->setOptions($options);
$this->loadBase();
$cacheLoad = $this->loadModulesCache();
if ($cacheLoad) {
return $this;
}
$this->loadModules();
$this->loadDb();
$this->saveCache();
return $this;
}
We're interested in $cacheLoad = $this->loadMoudulesCache()
, the first call successfully retrieved something that resolved to true
while the second call received something that resolved to false
.
Digging deeper into the code.
/**
* Load cached modules configuration
*
* @return bool
*/
public function loadModulesCache()
{
if (Mage::isInstalled(array('etc_dir' => $this->getOptions()->getEtcDir()))) {
if ($this->_canUseCacheForInit()) {
Varien_Profiler::start('mage::app::init::config::load_cache');
$loaded = $this->loadCache();
Varien_Profiler::stop('mage::app::init::config::load_cache');
if ($loaded) {
$this->_useCache = true;
return true;
}
}
}
return false;
}
loadModulesCache
attempts to load the configuration from cache, if it is loaded it sets $_useCache = true
and returns true
so that we do not continue to regenerate the cache in the init
method. The main points for this call failing would be that loadCache
or _canUseCacheForInit
returns false
.
Again, digging deeper into the code.
/**
* Check if cache can be used for config initialization
*
* @return bool
*/
protected function _canUseCacheForInit()
{
return Mage::app()->useCache('config') && $this->_allowCacheForInit
&& !$this->_loadCache($this->_getCacheLockId());
}
_canUseCacheForInit
ensures the cache is enabled and that it is not locked. For some reason Magento actually uses the cache to lock itself $this->_loadCache($this->_getCacheLockId())
.
The problem in our case was that on the second run of Mage_Core_Model_Config::init()
we were failing the _canUseCacheForInit
call because the cache was locked. This meant that we would proceed to regenerate and save the CONFIG
cache while the singletons $_useCache
was erroneously still set to true
.
As far as I am aware the cache is locked only within the saveCache
call of the Mage_Core_Model_Config
singleton. The cache is locked after the configuration has been generated and before the calls to _saveCache
which will save the config for config_global
, config_websites
and config_stores_{stores}
. The cache lock is removed when all the configuration has been saved in cache.
/**
* Save configuration cache
*
* @param array $tags cache tags
* @return Mage_Core_Model_Config
*/
public function saveCache($tags=array())
{
if (!Mage::app()->useCache('config')) {
return $this;
}
if (!in_array(self::CACHE_TAG, $tags)) {
$tags[] = self::CACHE_TAG;
}
$cacheLockId = $this->_getCacheLockId();
if ($this->_loadCache($cacheLockId)) {
return $this;
}
if (!empty($this->_cacheSections)) {
$xml = clone $this->_xml;
foreach ($this->_cacheSections as $sectionName => $level) {
$this->_saveSectionCache($this->getCacheId(), $sectionName, $xml, $level, $tags);
unset($xml->$sectionName);
}
$this->_cachePartsForSave[$this->getCacheId()] = $xml->asNiceXml('', false);
} else {
return parent::saveCache($tags);
}
$this->_saveCache(time(), $cacheLockId, array(), 60);
$this->removeCache();
foreach ($this->_cachePartsForSave as $cacheId => $cacheData) {
$this->_saveCache($cacheData, $cacheId, $tags, $this->getCacheLifetime());
}
unset($this->_cachePartsForSave);
$this->_removeCache($cacheLockId);
return $this;
}
What was happening was likely due to the shared cache storage between multiple servers, but could also have been caused by multiple processes running on one server.
Here's a step-by-step of what was happening in our instance, we had a cronjob which was calling Mage::app()->init()
multiple times which accounted for the repeated initialisation of Mage_Core_Model_Config
.
Time | Process 1 | Process 2 | Shared Cache Lock |
---|---|---|---|
1 | init() - loadModulesCache succeeds and sets $_useCache = true . |
- | |
2 | Some code is executed | init() - loadModulesCache fails because someone has hit Flush Cache in the admin panel |
|
3 | Some code is executed | init() - loadModules , loadDb work as expected. |
|
4 | Some code calls Mage::app()->init() |
init() - saveCache is initiated |
|
5 | Mage::app()->init() calls Mage_Core_Model_Config::init() |
saveCache() - sets the cache lock |
LOCKED |
6 | init() - loadModulesCache fails as cache lock is present |
saveCache() - saves config to cache |
LOCKED |
7 | init() - calls loadModules and loadDb with $_useCache = true , generated invalid config |
saveCache() - removes the cache lock, init completes. |
|
7 | init() - calls saveCache with incorrectly generated data, causing the errors. |
Some code is executed |
If you have a look at 100-router-script.php
you can see a simple script which should allow you to reproduce the bug on a Magento instance. Simply download it to the root of your Magento instance and run it. Alternatively you can specify the location of Mage.php
using an environment variable.
MAGE_PATH="/path/to/magento/app/Mage.php" php 100-router-script.php
I was unable to easily reproduce the time sensitive cache hit on global_config.lock
, however I was able to emulate it by making loadModulesCache
fail to load config_global
on the second call.
The phpunit tests simulate the time sensitive cache hit on global_config.lock
in two different ways:
- By using
Convenient_Core_Model_Config
, a custom configuration model which returnstrue
for the second call toloadCache
for theconfig_global.lock
entry. - By removing
config_global
from the cache as in the100-router-script.php
file.
These tests enable the CONFIG
cache.
Run the unit tests by specifying an environment variable pointing your Magento installation.
git clone https://github.com/AmpersandHQ/magento-ce-ee-config-corruption-bug.git
composer install
MAGE_PATH="/path/to/magento/app/Mage.php" vendor/bin/phpunit
On an unpatched instance you should see the following output
PHPUnit 4.6.4 by Sebastian Bergmann and contributors.
Configuration read from /lukerodgers.co.uk/magento-ce-ee-config-corruption-bug/phpunit.xml
FF
Time: 2.53 seconds, Memory: 16.25Mb
There were 2 failures:
1) ConfigurationTest::reinitWithAlternativeConfigModel
Failed asserting that two strings are equal.
/lukerodgers.co.uk/magento-ce-ee-config-corruption-bug/tests/ConfigurationTest.php:78
2) ConfigurationTest::reinitMissingCacheEntry
Failed asserting that two strings are equal.
/lukerodgers.co.uk/magento-ce-ee-config-corruption-bug/tests/ConfigurationTest.php:121
On a patched instance you should see:
PHPUnit 4.6.4 by Sebastian Bergmann and contributors.
Configuration read from /lukerodgers.co.uk/magento-ce-ee-config-corruption-bug/phpunit.xml
..
Time: 2.87 seconds, Memory: 16.50Mb
OK (2 tests, 2 assertions)
2 weeks of work and all this for a 1 line fix.
I overrode Mage_Core_Model_Config
and forced $_useCache = false
when regenerating the config. This fix is validated by the replication script/phpunit tests above, and completely stopped this bug from occurring in our instances.
/**
* Initialization of core configuration
*
* @return Mage_Core_Model_Config
*/
public function init($options=array())
{
$this->setCacheChecksum(null);
$this->_cacheLoadedSections = array();
$this->setOptions($options);
$this->loadBase();
$cacheLoad = $this->loadModulesCache();
if ($cacheLoad) {
return $this;
}
//100 Router Fix Start
$this->_useCache = false;
//100 Router Fix End
$this->loadModules();
$this->loadDb();
$this->saveCache();
return $this;
}
It would be naive of me to believe that this will completely solve the issue for everyones Magento setup. If it does not work for you then I recommend you
- Implement the logic mentioned in the Debugging the Issue section.
- Wait for some log data to appear.
- Contact me!
I do not believe that this fix will affect performance in any negative way as the usual flow for loadModules
, loadDb
and saveCache
is for $_useCache
to be false
.