Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PageCache: async rendering of blocks can corrupt layout cache #8554

Closed
bka opened this issue Feb 14, 2017 · 31 comments
Closed

PageCache: async rendering of blocks can corrupt layout cache #8554

bka opened this issue Feb 14, 2017 · 31 comments
Labels
bug report Component: Framework/Cache Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development

Comments

@bka
Copy link
Contributor

bka commented Feb 14, 2017

Hi folks,

I disovered a potential bug regarding the layout cache which leads to a blank page. If this happens, customers may see a white page without any hmtl inside the body tags. It might be related to #6942, although there is no php-fpm or libxml in the game here.

Preconditions

  1. Tested on Magento 2.1.0 CE and 2.1.4 CE

Steps to reproduce

  1. bin/magento cache:clean
  2. curl -gI "http://magento2.local/page_cache/block/render/?ajax=1&blocks=[%22header.login%22]&handles=[%22default%22%2C%22cms_index_index%22%2C%22cms_page_view%22%2C%22cms_index_index_id_home%22]&originalRequest=a&version="
  3. curl -s http://magento2.local | grep "<body" -A 3 -B 3

Expected result

  1. Body tag should still contain some data

Actual result

  1. Generated DOM has an empty body like this
<body data-container="body" data-mage-init='{"loaderAjax": {}, "loader": { "icon": "http://magento2.local/pub/static/version1487104262/frontend/Magento/luma/de_DE/images/loader-2.gif"}}' class="cms-index-index page-layout-1column">
</body>

auswahl_092

Discussion

This ajax call seems to destroy the layout cache. If it is not present, structure which gets loaded and unserialized in lib/internal/Magento/Framework/View/Layout.php:generateElements from cache does not contain any root element anymore:

$ awk -F "root" '{print NF-1}' var/cache/mage--0/mage---792_STRUCTURE_LAYOUT_FRONTEND_STORE1_2C7461AF04BB11FE209659D7147D2C579
0
0

Cleaning the cache again and loading the front page normally generates a valid layout cache, where root is present:

$ awk -F "root" '{print NF-1}' var/cache/mage--0/mage---792_STRUCTURE_LAYOUT_FRONTEND_STORE1_2C7461AF04BB11FE209659D7147D2C579
0
98

I think it has something do to with the used handles [default, cms_index_index, cms_page_view, cms_index_index_id_home] which are
used for generating the cache key in getCacheId. If there is no cache present, the page cache block just writes its content into the layout cache although there is no root layout used.

I need to dive deeper into this issue. It normally should not happen, because the DOM should get loaded first. And subsequent ajax calls should always have a layout cache present. But if cache gets cleared manually and there is already an ajax call on its way it can lead to a broken layout cache.

@bka
Copy link
Contributor Author

bka commented Feb 15, 2017

Ok, I did some debugging. It seems to only affect CMS pages and is related to the way how Magento_PageCache module works. This module renders a script into the DOM to fetch some blocks asynchronously.

<script type="text/x-magento-init">
    {
        "body": {
            "pageCache": {"url":"http:\/\/magento2.local\/index.php\/page_cache\/block\/render\/","handles":["default","cms_index_index","cms_page_view","cms_index_index_id_home"],"originalRequest":{"route":"cms","controller":"index","action":"index","uri":"\/index.php\/"},"versionCookieName":"private_content_version"}        }
    }
</script>

This is done in PageCache/view/frontend/web/js/page-cache.js. This call hits PageCache/Controller/Block/Render.php and PageCache/Controller/Block.php:_getBlocks triggers the loading of the layout:

$this->_view->loadLayout($handles, true, true, false);

However, because of the way how lib/internal/Magento/Framework/View/Model/Layout/Merge.php:_fetchRecursiveUpdates works, the pageLayout will be set to admin-1column defined in app/code/Magento/Theme/view/adminhtml/page_layout/admin-1column.xml. Unfortunately, in this case lib/internal/Magento/Framework/View/Page/Layout/Reader.php:read just gets an empty layout <layout/> for $xml.

public function read(Layout\Reader\Context $readerContext, $pageLayout)
{
    $this->getPageLayoutMerge()->load($pageLayout);
    $xml = $this->getPageLayoutMerge()->asSimplexml();
    $this->reader->interpret($readerContext, $xml);
}

I think the layout is not loaded because our area in this case is frontend not adminhtml. So this layout won't be loaded in this case. And therefore there are some critical elements missing inside scheduledStructure afterwards, like e.g. the root container.

Now, the bad thing happens. This structure gets stored into the cache, if there is none. This is done in lib/internal/Magento/Framework/View/Layout.php:generateElements.

    $cacheId = 'structure_' . $this->getUpdate()->getCacheId();
    $result = $this->cache->load($cacheId);
    if ($result) {
        $this->readerContext = unserialize($result);
    } else {
        \Magento\Framework\Profiler::start('build_structure');
        $this->readerPool->interpret($this->getReaderContext(), $this->getNode());
        \Magento\Framework\Profiler::stop('build_structure');
        $this->cache->save(serialize($this->getReaderContext()), $cacheId, $this->getUpdate()->getHandles());
    }

The $cacheId will be the same as for a normal request because it is based on used handles. But remember, we have missing elements in the structure like root. Next time, somebody visits the page, he will get a blank page without body, because this broken layout structure has been loaded from the cache.

broken

For normal requests on CMS pages, this is not happening, because the controller sets the pageLayout explicitly in app/code/Magento/Cms/Helper/Page.php:setLayoutType before the layout is loaded. This is usually 1column defined in app/code/Magento/Theme/view/frontend/page_layout/1column.xml and can then easily be loaded by lib/internal/Magento/Framework/View/Page/Layout/Reader.php.

valid

I would really appreciate any advice on how to fix this.

@erikhansen
Copy link
Contributor

erikhansen commented Feb 27, 2017

@bka Thanks for sharing your findings. I'm curious whether you spent time to find a solution to this issue?

I'm experiencing a very similar issue with an empty homepage (where the body tag is empty). Until I read your issue description, I wasn't able to consistently reproduce the issue, but I'm now able to reproduce it if I follow these steps:

  1. Flush Magento cache:
    bin/magento cache:flush
    
  2. Run a curl request like this to request the ESI (Edge Side Include) of the catalog nav that Varnish requests from Magento (note: This request will work even on environments that don't have Varnish configured, so it's possible to reproduce this in environments using the built-in full page cache):
    curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/
    
    The content returned by this curl request will be empty and will corrupt the layout cache, per your findings above.
  3. Run a curl request to request the homepage content:
    curl -s http://example.dev
    
    The content returned by this curl request will include an empty body, similar to your screenshot in your original post.

If I reverse steps 2 and 3, the homepage content will contain content like expected, and the curl request for the menu/navigation content will also contain content.

On a side note, I initially suspected Varnish was at fault with this issue as I know that an improper Varnish configuration can cause the catalog navigation to go missing. I've been experiencing issues with both a blank homepage and then sometimes the navigation is missing (but the rest of the homepage content is loaded). I was able to figure out the http://example.dev/page_cache/block/esi/blocks… url used in step 2 by using the Gist that was shared in this comment. After finding this issue, I tried requesting that ESI url in the manner you described and that's how I was able to consistently reproduce the issue.

I was curious to see what combination of handles were successful vs not, so I ran varnishlog while loading a category page to get an example query that did work. Here is what I found:

Broken requests:

curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22cms_index_index%22,%22cms_page_view%22,%22cms_index_index_id_home%22%5D/

Working requests:

curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22catalog_product_view%22,%22catalog_product_view_id_38640%22,%22catalog_product_view_sku_XX-XXX-XXX%22,%22catalog_product_view_type_simple%22%5D/
curl -s http://example.dev/page_cache/block/esi/blocks/%5B%22catalog.topnav%22%5D/handles/%5B%22default%22,%22catalog_product_view%22%5D/

@bka
Copy link
Contributor Author

bka commented Feb 27, 2017

Hi @erikhansen,

thanks for your feedback. I have some additional insights. This issue relates to usage of $_isScopePrivate. Devdocs say it wont work properly. This also explains why this issue did not appear on standard luma. Indeed, we had some blocks with isScopePrivate.

Having a block defined with isScopePrivate will trigger async loading of blocks also known as Varnish Hole Punching. Sad thing is, if this Block is present on the homepage, indicated by handles default, cms_index_index, cms_page_view, cms_index_index_id_home it corrupts the layout cache if it is empty.

Related source files are:

  • Module_PageCache/Controller/Block.php
  • Module_PageCache/Observer/ProcessLayoutRenderElement.php
  • Module_PageCache/view/frontend/web/js/page-cache.js

I fixed this by patching Module_PageCache/Controller/Block.php to include an additional pseudo handle which changes the generated cache id like this:

/**
 * Get blocks from layout by handles
 *
 * @return array [\Element\BlockInterface]
 */
protected function _getBlocks()
{
    $blocks = $this->getRequest()->getParam('blocks', '');
    $handles = $this->getRequest()->getParam('handles', '');

    if (!$handles || !$blocks) {
        return [];
    }
    $blocks = json_decode($blocks);
    $handles = json_decode($handles);

    // fix for layout corruption bug #8554
    if (is_array($handles)) {
        $handles[] = 'fpc_block';
    }

    $this->_view->loadLayout($handles, true, true, false);
    $data = [];

    $layout = $this->_view->getLayout();
    foreach ($blocks as $blockName) {
        $blockInstance = $layout->getBlock($blockName);
        if (is_object($blockInstance)) {
            $data[$blockName] = $blockInstance;
        }
    }

    return $data;
}

@erikhansen
Copy link
Contributor

@bka - Thanks for your feedback. While your workaround might fix your specific manifestation of this issue, it doesn't fix mine. And I don't think my manifestation of the issue has anything to do with $_isScopePrivate. I'll post here if I find the solution to my problem.

@magento - I was able to easily reproduce my issue on a vanilla Magento 2 install using the steps I outlined in my comment above.

@erikhansen
Copy link
Contributor

erikhansen commented Feb 28, 2017

@bka - I have a solution to the issue. You can reference the fix in this commit: 99bb170

Try applying this patch to your environment and see if the issue goes away:

diff --git a/vendor/magento/module-ui/view/base/layout/default.xml b/vendor/magento/module-ui/view/base/layout/default.xml
index 7f2efbd..64d5f14 100644
--- a/vendor/magento/module-ui/view/base/layout/default.xml
+++ b/vendor/magento/module-ui/view/base/layout/default.xml
@@ -5,7 +5,7 @@
  * See COPYING.txt for license details.
  */
 -->
-<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" layout="admin-1column" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
+<page xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="urn:magento:framework:View/Layout/etc/page_configuration.xsd">
     <body>
         <referenceContainer name="after.body.start">
             <block class="Magento\Ui\Block\Logger" name="logger" template="Magento_Ui::logger.phtml"/>

If the issue does go away, I'd recommend closing this issue.

@erikhansen
Copy link
Contributor

@bka Have you had a chance to test my suggested patch yet?

@erikhansen
Copy link
Contributor

@bka Any update on this?

@andreymoskvenkov
Copy link

The root cause of this issue is that the "pageLayout" parameter for the layout builder is missing. However before starting blocks and XML generation Magento adds additional blocks to the list based on selected pageLayout: 1column, 2columns or empty. By default for the most of the pages the layout is: 1column. As the result: almost all default pages are working as expected. But once you customize your theme a little bit, using "move" actions, the full page cache becomes broken. The solution is pretty complex: I have overridden Magento\Framework\View\Page\Builder, Magento\Framework\View\Layout, Magento\PageCache\Observer\ProcessLayoutRenderElement and Magento\PageCache\Controller\Block\Esi, by adding a new parameter to the URL and to the controller, named: "page_layout" and then I set it into the view in the Esi controller. Works for me.

@cherreman
Copy link

@andreymoskvenkov Can you provide a patch so we can verify your solution?

@andreymoskvenkov
Copy link

andreymoskvenkov commented Mar 27, 2017

Sorry, but I can provide only a tip, which I have already done. Tips are free :)

@cherreman
Copy link

@andreymoskvenkov C'mon man, this is opensource. Take a little, give a little ;)

All kidding aside, I appreciate your tip, but I'm sure your patch would be appreciated even more and it could be validated. If you can't share it for whatever reason, I understand.

@erikhansen
Copy link
Contributor

@cherreman Did you happen to try the patch I included in my comment on February 28th?

@andreymoskvenkov
Copy link

andreymoskvenkov commented Mar 27, 2017

Yeah, there are many reasons of not sharing the code, one of them is - NDA of the company, where I am working for. We have fixed a lot of core bugs of M2 internally and this is one of the most difficult to find. And why have I posted a tip here is - because this issue number #8554 has also given me a tip about layout type such as "1column", "empty", etc. I was debugging it almost whole day and was starting to become a crazy, because I could not understand: why in the Layout object's "_blocks" property the list of generated blocks are only a few of them in page_cache controller, however in CMS_index controller the list of blocks in "_blocks" property is big: about 79 blocks there. And at the same time the final XML in both actions is almost the same! The cause of that was: because Magento adds some blocks to the "_blocks" property manually in a separate method depending on what "pageLayout" is set. As the result: in page_cache action no page layout was passed and no page layout was set.
I don't need a validation of my fix :) It is on Production already and it works for us 100% :)

adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue May 9, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue May 9, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue May 9, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue May 9, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue May 10, 2017
@miguelbalparda
Copy link
Contributor

#9560 seems to fix this issue, can you try that fix and report back? I'd like to merge that PR but I will need some help from you 😄

@andreymoskvenkov
Copy link

:-D
Sorry, but no, this PR will not fix the issue. In the fix need to Pass "page_layout" code into an API request together with the list of applied handlers on a page. Your fix just adds an additional handler, which is used only in the FullPageCache's controller.

@miguelbalparda
Copy link
Contributor

@adrian-martinez-interactiv4 can you chime in here?

@adrian-martinez-interactiv4
Copy link
Contributor

@andreymoskvenkov If I have understood well, you are proposing to include the additional handle in the ajax request, instead of putting it within the block page cache controller. I don't know if I'm missing something, could you clarify a bit your previous explanation pls? Thanks in advance.

@andreymoskvenkov
Copy link

andreymoskvenkov commented May 16, 2017

No, I propose to add an additional parameter to the PageCache's URL: 'page_layout', where to put what Layout type was used on the current page and then to add a processing of this parameter in the ESI controller, which should just take the value from that parameter from the request and to set it into Layout object. Then everything will work correctly. And this is not easy btw, because of crazy Magento 2 architecture (from what I see the core developers instead of thinking more deeply, they just split responsibilities between classes randomly and apply "protected", "public" and "private" keywords also randomly :) ). It is not a "handler", it is an additional characteristics (or parameter, or property) of Layout entity.

@adrian-martinez-interactiv4
Copy link
Contributor

Well, I'm been doing some research about this issue. I'll be explaining each point later, but long story short:

  1. This issue is still happening at CE 2.1.6, but it seems to be already solved in develop branch.
  2. Actually, issue described at PR Fix PageCache: async rendering of blocks can corrupt layout cache #8554 #9050 #9560 is not the same as the one described here; it references an issue related to layout updates related with the cms page not being applied. This issue PageCache: async rendering of blocks can corrupt layout cache #8554 and Randomly getting an empty <body> that gets cached #9050 have been referenced there because, indirectly, also solved this problem.
  3. Also, to solve issue described in Fix PageCache: async rendering of blocks can corrupt layout cache #8554 #9050 #9560, as it's related to the layout handles used in getCacheId, either an additional unique layout handle should be used to avoid wrong layout cache being generated, or, as you propose, an additional property should be added to the layout, to be used (if defined) in getCacheId method, so it can be used anywhere else.

About point 1:
As said before, I'm able to reproduce it in CE 2.1.6, this one is ok:
captura de pantalla 2017-05-16 a las 22 20 30

And this one shows the empty body:
captura de pantalla 2017-05-16 a las 22 21 23

Same steps reproduced at 2.2.0-dev, first shows non-empty body, as expected:
captura de pantalla 2017-05-16 a las 22 47 31

But the same steps that resulted in an empty body in CE 2.1.6, return non-empty body in 2.2.0-dev:
captura de pantalla 2017-05-16 a las 22 48 15

Without going too deep in what commits are involved in solving this issue, what seems clear is that this issue, as long as #9050 , that I consider a duplicate of this one, are already solved.

About point 2:
Unfortunately, #9560 is not solved yet. As explained in comment #9560 (comment), page structure is cached based on layout handles, and it's not correctly generated due to cms page not being loaded to include its layout updates within the cached page structure. This leads us to point 3.

About point 3:
At this point, I see two ways to solve the issue, as described above. I chose the additional / unique layout handle solution, as the easiest and least intrusive option. In the other hand, I think it would be a good idea to have a custom property in the layout that may be used to generate the cache id if defined, along with layout handles. This would make it possible to generate unique ids for layout cache if specified from anywhere. I'm giving a try to this approach, but I see no reason to include a new param in the request, as long as the controller should be responsible to set the property in the layout to ensure a different, unique layout cache id is generated for that request. Anyway, I'm giving a try on that too.

As this issue seems to be solved, I'll continue giving feedback about #9560 in the thread of that PR.

@andreymoskvenkov
Copy link

andreymoskvenkov commented May 17, 2017

:-D
Ok, continue adding new handlers for FPC, up to you!
I also don't see no reason in adding a new param to the request, until it was a core Magento approach how they build the final layout XML: they phisically set this layout type into Layout object and then even Create some blocks from PHP code, if a layout type was set. That's why any other fixes, such as making CacheID unique, etc, do Not fix he issue Completely for ALL cases.
I have successfully fixed it for our client in March and we continue shareing this fix internally from project to project.
But you may continue having fun in fixing other bugs many times instead of fixing the root cause once.
:-D

@miguelbalparda
Copy link
Contributor

@andreymoskvenkov can you clarify what "we have an internal patch" means? If you have a patch for this or other issues, would you kind sharing them for others to benefit? Thanks!

@andreymoskvenkov
Copy link

andreymoskvenkov commented May 18, 2017

It meas, that I have fixed this issue and wrapped it up as a separate module. However as I said earlier, I won't share it. Magento Community has helped me to find the root cause of the issue and that's why I was able to fix it. That's why I want to thank the community for this and share the root cause of it. However I will not share the fix, because if after my notes you cannot fix the issue like I fixed it, then you are useless as Magento specialists and will not be able to provide high quality assistance to your clients on Magento 2 platform. So the goal of my posts here is: to introduce an exercise, which is based on one of real problems, which faces a Magento developer almost every day. I want to make the community stronger. Just providing a final fix will not improve the level of community developers.

@miguelbalparda
Copy link
Contributor

miguelbalparda commented May 18, 2017

That is not how any of this works.
If you really think you are setting the bar higher let me tell you that you are doing the complete opposite, leaving a bug open because you think people should be "smart" like you to fix it. This is how you make a community weaker because you say that if anyone is not as good as you it should not participate at all and that is completely false. But hey, this is OSS and we deal with this kind of cases every day so chances are someone will probably come and fix it. In the meantime @adrian-martinez-interactiv4 keep up the good work!

@renttek
Copy link

renttek commented May 18, 2017

if one could see and read the fix, one can learn from it, trying to understand what the problem was and how the solution was build. For me thats a giant part of OSS. If you have a patch and don't want to share it, okay, go your way, but please don't brag (it's nothing more to me) about it.

(And yes, it is about you, since you were implying, that you are good and if one can't do one thing you can, he or she is useless)

@magento-team
Copy link
Contributor

Internal ticket to track issue progress: MAGETWO-69018

@magento-engcom-team magento-engcom-team added 2.1.x Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development bug report Component: Framework/Cache Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed and removed G1 Passed labels Sep 5, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 21, 2017
…ento#8554 magento#9050 magento#9560

Create cache key object to be injected separately with its own interface
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
…ento#8554 magento#9050 magento#9560

Renamed interface, LayoutCacheKeyInterface made optional in constructor, injected via di.xml, some other little fixes
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
adrian-martinez-interactiv4 added a commit to adrian-martinez-interactiv4/magento2 that referenced this issue Sep 22, 2017
@magento-engcom-team
Copy link
Contributor

@bka, thank you for your report.
The issue is already fixed in develop branch, 2.2.0

magento-team pushed a commit that referenced this issue Oct 5, 2017
…ayout cache #8554 #9050 #11174

 - Merge Pull Request #11174 from adrian-martinez-interactiv4/magento2:FR#FIX-PAGECACHE-LAYOUT-CACHE-KEY
 - Merged commits:
   1. 3bdfa1b
   2. 5a6a000
   3. bf7df0d
   4. 7feba6c
   5. 1bb25ed
   6. 0e830c1
   7. 8a92e46
   8. 0a69b5b
   9. bd6ed7c
   10. c6e2b39
   11. 3cc51dc
   12. 2ae9c26
   13. 516b529
   14. efa3edb
   15. 3f60168
@magento-team
Copy link
Contributor

Internal ticket to track issue progress: MAGETWO-80647

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug report Component: Framework/Cache Issue: Format is valid Gate 1 Passed. Automatic verification of issue format passed Issue: Ready for Work Gate 4. Acknowledged. Issue is added to backlog and ready for development
Projects
None yet
Development

No branches or pull requests