Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

aduth · 2018-01-19T02:31:01Z

Blocker for #3745
Related: #3545
Possible resolution for #3799 ?

This pull request seeks to refactor Gutenberg's the_content filtering of dynamic blocks to bypass the grammar parser altogether and explicitly seek and replace dynamic block fragments via regular expressions.

The reasons for this change include:

Parsing all of content into a data shape during the front-end display of a post is heavy-handed when the only operation necessary is to replace dynamic blocks.
- It is quite likely for the majority of users' posts that there are no dynamic blocks at all, and this is wasted effort.
It supports nested block content, which is non-supportable with the current implementation because static blocks render with innerHTML, which does not include the parsed inner blocks (see also Parsing: Replace server serialization by parsed outerHTML #3545).

The downsides of this change include:

Relying on regular expressions to match block fragments, including the attributes JSON string which is passed through json_decode.
- The latter note is not a requirement, and it's possible that we could still leverage the grammar parser to extract the attributes array.
- Related to this is the maintenance cost of keeping syntax identification in sync.
We no longer strip the comment demarcations for static blocks
- They are stripped for dynamic blocks, though not necessarily so

To Do: (Non-blocking)

Run a performance benchmark comparison, as I suspect the changes proposed here perform significantly better than the grammar-based parser.

Testing instructions:

Verify PHPUnit tests pass:

phpunit

Try manually creating a post (via Text mode or Classic Editor) including both dynamic blocks and blocks with nested content, ensuring that when viewing source on the front-end, both the dynamic block rendered content and nested content is rendered as expected:

<!-- wp:latest-posts /-->

<!-- wp:columns -->
<div class="wp-block-columns has-2-columns">
    <!-- wp:paragraph {"layout":"column-1"} -->
    <p class="layout-column-1">col 1</p>
    <!-- /wp:paragraph -->

    <!-- wp:paragraph {"layout":"column-2"} -->
    <p class="layout-column-2">col 2</p>
    <!-- /wp:paragraph -->
</div>
<!-- /wp:columns -->

<!-- wp:paragraph -->
<p>end text</p>
<!-- /wp:paragraph -->

jorgefilipecosta · 2018-01-19T10:07:21Z

Another potential downside of this change is that it collides with the objective proposed by @mtias of not saving some inline styles in the post content and render them based on the attributes. If the non-dynamic blocks are not parsed we can not generate the styles for them dynamically in the server.

mtias · 2018-01-19T13:23:22Z

lib/blocks.php

 */
-function gutenberg_render_block( $block ) {


I think we should keep both methods — with the full tree-parsing being only an option. (Say a REST endpoint that wants to serve the tree as JSON, etc.)

I think we should keep both methods — with the full tree-parsing being only an option. (Say a REST endpoint that wants to serve the tree as JSON, etc.)

The parsing class is left untouched, and in fact the gutenberg_parse_blocks helper function itself; this is what seemed to be the most value in what came to mind as needing a JSON tree of blocks. Do you think this specific function is still valuable, to generate the string output of a block object? All of the underlying tooling is still largely available, it just becomes:

$html = $block_type->is_dynamic() ? $block_type->render( $block['attrs'] ) : $block['innerHTML'];

aduth · 2018-01-19T14:18:16Z

Another potential downside of this change is that it collides with the objective proposed by @mtias of not saving some inline styles in the post content and render them based on the attributes. If the non-dynamic blocks are not parsed we can not generate the styles for them dynamically in the server.

As far as I can discern, the inline styles are themselves static content, so is it necessary or desirable for this to happen dynamically at the server (vs. in the client before save)? Or at least that this must occur at display time (vs. at save time, where a full parse might be more reasonable)?

mtias · 2018-01-19T14:19:46Z

Chatting with @jorgefilipecosta we are going to revise the color-styles approach in any case, and it's actually a good constrain to make it statically viable.

dmsnell · 2018-01-19T16:59:29Z

Let me know how I can be helpful here.

I'm not sure entirely what the goal of this is in terms of technically what it's doing to
the post_content but from looking at the RegExp machine I'm guessing it may still
be worth considering a PEG-based approach just because it would probably be much
clearer. That's not to say that we need to perform a full parse, but we could potentially
create another parser just to do specific actions.

One of the challenges here is that when we are talking about nesting we're manually
recreating the stack-context that the parser does for us in a very reliable and established
way. Thankfully though we have one thing working in our favor: the block boundaries
are easy to find and we cannot find in a legitimate post another  and so nesting is somewhat easier to track than if we used, for example,
parentheses.

dmsnell · 2018-01-19T22:43:35Z

@aduth after some talks with @mtias in Slack I wanted to post some doodles/idea I had where we can reuse the PEG parser generator to do some of these kinds of tasks without carrying over all the heft of the full grammar parser. The PEGs are still nice because they do have favorable performance characteristics and handle nesting in reliable and robust ways.

Check out the following fake grammar and document for an illustration of what I'm talking about
https://dmsnell.github.io/peg-parser-explorer/

Grammar

{
  const dynamicTypes = [ 'dynamic' ];

  function isDynamic( name ) {
    return dynamicTypes.indexOf( name ) !== -1
  }
}

Output
  = bs:Block *
  { return bs.join( '' ) }

Block
  = "<!-- wp:" name:$( !" ". ) + " -->" $( !"<!--". ) + "<!-- /wp:" $( !" ". ) + " -->"
  { return isDynamic( name.trim() ) ? "blargynamic" : text() }

Document

<!-- wp:core/paragraph -->
Just some text
<!-- /wp:core/paragraph --><!-- wp:dynamic -->
blarg
<!-- /wp:dynamic --><!-- wp:core/paragraph -->
More text
<!-- /wp:core/paragraph -->

So while the doodles are naïve and incomplete I think they show how we can write make something far more performant while borrowing some of the same rules and matchers from the full grammar. Note that the top function block can actually be PHP code instead of JavaScript but for the interactive grammar explorer I did JS.

pento · 2018-01-20T12:52:21Z

I like the idea of only parsing dynamic blocks, parsing static blocks that aren't going to be touched is probably a waste of time.

That said, there's definitely going to be requests for an ability to filter each block before output. For example, #2871 relies on that existing. While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

The "light parser" idea sounds promising. Split the post into the raw blocks, then loop over each block, fully parsing dynamic blocks, but only parsing static blocks if something is actually hooked into the relevant filter. This would also help with removing the comment demarcations, because outputting them to the frontend will not be a popular change. 🙂

dmsnell · 2018-01-20T15:12:12Z

While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

@pento I'm not quite grasping what you are communicating here. Could you help me understand/reword this?

aduth · 2018-01-20T21:22:15Z

That said, there's definitely going to be requests for an ability to filter each block before output. For example, #2871 relies on that existing. While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

While there was already some conversation there about it, a table of contents doesn't seem like something that necessarily needs dynamic front-end rendering to implement.

pento · 2018-01-21T10:27:32Z

I'm not quite grasping what you are communicating here. Could you help me understand/reword this?

Sorry, that was lazy writing on my part. 🙂

So, #2871 does two things that require iterating over static blocks:

It adds an id attribute to the <h.> tags in heading blocks.
It grabs the text content of heading blocks, to use in the table of contents block.

In order to implement these without server side rendering, we'd need to:

To add the id attribute, the easiest way would probably be to enforce the anchor attribute to always be true in heading blocks. We could kind of hack this in by adding a late filter on blocks.getSaveContent.extraProps, to override the core/anchor/save-props/addSaveProps filter, but that gives a bad UX, because the "HTML Anchor" InspectorControl selected state wouldn't necessarily match what's changed, with no explanation. The better option for this route would be to modify the core/anchor plugin recognise when a ToC block exists, and provide appropriate information to the end user.

To grab the text content for heading blocks, we'd need to register a new selector in the editor store, to expose that data.

A third feature, which hadn't been implemented yet in #2871, would add meta-links to the heading element: one for linking to itself (for easy copy/paste of the link), and one to link back to the ToC. I don't think there's a filter that allows modifying static block content, yet.

As @aduth mentioned, there's an argument for implementing the ToC block in JS. But to get back to my original point, such an implementation requires a bunch of modifications to Gutenberg's core functionality, while implementing it server-side requires adding a filter to the PHP block loop. While I can easily make the required changes, they're specific changes for a particular case, that option isn't available to the vast majority of plugin developers. Having a more generic filter that runs over every block at render-time, however, is a simple solution that makes cool and complex things possible.

It's also worth considering future behaviour, here—when every part of the entire page render is a block, I think that having a standard render-time filter to interact with each block in the page would be super useful.

aduth · 2018-01-25T15:43:04Z

@dmsnell

it may still be worth considering a PEG-based approach just because it would probably be much
clearer. That's not to say that we need to perform a full parse, but we could potentially create another parser just to do specific actions.

Yeah, I could imagine something like this as well. The challenge I saw, which is still unclear to me after your demonstration, is how (at least given the current tooling) we would feed information about known dynamic block names into the parser at runtime.

The PEGs are still nice because they do have favorable performance characteristics

Aside and of my own curiosity: More favorable than RegExp?

aduth · 2018-01-25T15:44:21Z

Might be worth noting that there's nothing that had been applied by the_content filter we used previously which could not still be achieved by a plugin's own filter after the changes proposed here. Which is to say, the full parser still exists, and a plugin can still parse, extract, and manipulate individual blocks from their own the_content filter handlers.

dmsnell · 2018-01-25T15:55:15Z

how we would feed information about known dynamic block names into the parser at runtime.

This would occur through globals for better or worse. I think in the PHP side it's actually not as bad since that's more normative for PHP modules. My random idea was to have a function that the PEG parser calls to handle the blocks.

More favorable than RegExp?

The nice things about PEGs are that they run in linear time with respect to the length of the document being parsed (well, when combined with a Packrat memoizing parser as pegjs pulls in). RegExps are notoriously difficult to reason about for performance because they run in time corresponding to the product of the complexity of the RegExp and the document. That is how we get catastrophic backtracking and other RegExp DOS attacks.

In my personal experience mostly here on Gutenberg too the biggest performance issues with our grammar are mostly obvious. Our memory issue is probably related to the fact that we build a huge list of characters when parsing freeform HTML and then we join that list at the end. Other spots could be potentially improved by sharing structure among related rules (sharing <!-- among block boundaries, for example). The RegExp for me just normally ends up being an opaque wall I have a hard time understanding or confidently modifying.

aduth · 2018-01-30T21:21:05Z

As this pull request is a blocker for #3745 and all concerns have been either addressed or are subject to future backwards-compatible refactoring, I'm inclined to move forward with the changes proposed here.

Re: Grammars: I agree we should explore creating a smaller grammar which is a subset of the current functionality targeting segmenting on dynamic blocks alone. This is only beneficial if it proves to be more performant or if the performance difference is negligible considering the maintainability benefits of keeping the grammars more closely in sync, or simply avoiding RegExps.

Re: Exposing full parse to plugin authors: As noted above, this is still available, though no longer occurring by default with front-end content rendering. As we should want to keep front-end rendering performant in the default case, I think this is a fair compromise.

aduth · 2018-01-30T21:22:12Z

Alternative approaches we might consider, if we really want to keep the current heavy parse for front-end rendering, are:

The parser innerHTML should include nested block content
or, the innerHTML should at least include a marker for where nested content is to be inserted, so the marker can be replaced with innerHTML of its inner blocks

phpbits · 2018-02-01T03:57:40Z

@aduth Could you add filter before adding each blocks? Based on the previous do_block code, something like this :

$content_after_blocks .= apply_filters( 'do_content_block', gutenberg_render_block( $block ), $block );

I hope this is possible for us developers to manage each blocks easily rather that overriding the_content. Thanks!

aduth · 2018-02-01T15:26:25Z

@phpbits As implemented here, we're identifying only dynamic blocks when processing content:

gutenberg/lib/blocks.php

Lines 129 to 142 in a5ca791

    
           $dynamic_block_names   = get_dynamic_block_names(); 
        
           $dynamic_block_pattern = ( 
        
           	'/<!--\s+wp:(' . 
        
           	str_replace( '/', '\/',                 // Escape namespace, not handled by preg_quote. 
        
           		str_replace( 'core/', '(?:core/)?', // Allow implicit core namespace, but don't capture. 
        
           			implode( '|',                   // Join block names into capture group alternation. 
        
           				array_map( 'preg_quote',    // Escape block name for regular expression. 
        
           					$dynamic_block_names 
        
           				) 
        
           			) 
        
           		) 
        
           	) . 
        
           	')(\s+(\{.*?\}))?\s+(\/)?-->/' 
        
           );

It's also important to note that this logic is flawed, specifically on failure to take balanced tags into account. This is a compromise we're okay with for dynamic blocks, assuming that dynamic blocks won't include nested content (or at least not nest itself).

It would fall apart on specific static blocks though:

<!-- wp:columns -->
<div class="wp-block-columns">
	<!-- wp:paragraph -->
	<p>Hello</p>
	<!-- wp:paragraph -->

	<!-- wp:columns -->
	<div class="wp-block-columns"></div>
	<!-- /wp:columns -->
</div>
<!-- /wp:columns -->

In the above case, it would treat the inner block's closing demarcation as the terminator for the wrapper columns block.

This is where a proper parser / grammar would be better suited which, while probably a desirable enhancement, is not currently implemented.

Which is all to say: It could be possible, yes, but not with the current logic.

phpbits · 2018-02-01T15:38:15Z

@aduth Thanks! Are there any particular reasons why you do not want to add filters? Something that developers can use so that we do not need to create another parser? Like on this code :

$rendered_content .= $block_type->render( $attributes );

Would be really helpful aside from overriding the_content. I've been integrating my plugin and in need to define the visibility of the content based on the custom attributes I've added. You can check the preview here : https://widget-options.com/block-options-widget-options-gutenberg-editor/ . Right now I'm removing the do_blocks on the filters and adding it again with modifications. Would be really helpful if you can provide hooks and filters on that function. Thank you very much!

aduth · 2018-02-01T15:53:31Z

Are there any particular reasons why you do not want to add filters?

To be clear, I've never implied this.

phpbits · 2018-02-01T16:38:02Z

@aduth My apology didn't mean to imply that you said that. Just asking if there are any particular reasons since I've asked if you can add filter on my previous reply. Let me know if it's doable. Thanks!

phpbits · 2018-02-06T14:22:16Z

@aduth i'm using the build on this repo now and not the release on .org repository. Seems like the WP_Block_Type_Registry::get_instance()->get_all_registered(); doesn't return all blocks like paragraph, headings etc. Do you mind letting me know how can I make your preg_match to get all blocks?

gutenberg/lib/blocks.php

Lines 129 to 142 in a5ca791

    
           $dynamic_block_names   = get_dynamic_block_names(); 
        
           $dynamic_block_pattern = ( 
        
           	'/<!--\s+wp:(' . 
        
           	str_replace( '/', '\/',                 // Escape namespace, not handled by preg_quote. 
        
           		str_replace( 'core/', '(?:core/)?', // Allow implicit core namespace, but don't capture. 
        
           			implode( '|',                   // Join block names into capture group alternation. 
        
           				array_map( 'preg_quote',    // Escape block name for regular expression. 
        
           					$dynamic_block_names 
        
           				) 
        
           			) 
        
           		) 
        
           	) . 
        
           	')(\s+(\{.*?\}))?\s+(\/)?-->/' 
        
           );

Thank you very much!

aduth · 2018-02-06T20:02:39Z

At the moment core static blocks are not known to the server, but this is planned to change and tracked at #2751.

phpbits · 2018-02-07T01:24:15Z

@aduth That's perfect! So WP_Block_Type_Registry::get_instance()->get_all_registered(); will be returning all block types including the core defaults like heading etc..? Will this be included on the next update? Thanks!

aduth · 2018-02-08T02:14:28Z

I don't expect it will be included in the next release, but it's on my list for the near future.

phpbits · 2018-02-08T02:18:48Z

@aduth Thanks! I've just copy the do_blocks and change the pattern and it's working perfectly now :) I just want to test it with the latest development updates like nested blocks but the build is missing the edit-post/ style.css : #4891 (comment) . Would you mind taking a look on this one too? Thank you very much for your help. Cheers!

gschoppe · 2018-04-06T12:33:26Z

This is a compromise we're okay with for dynamic blocks, assuming that dynamic blocks won't include nested content (or at least not nest itself)
@aduth

This choice limits a lot of options with Gutenberg. There are many features that would require the capability to self-nest dynamic blocks.

For example, imagine a block that allows the user to display different arbitrary block content on various days and times:

<!-- wp:gjs/timerelease -->
	<!-- wp:paragraph {"gjs/timerelease-interval":"day","gjs/timerelease-index":"3"} -->
	<p class="layout-column-1">Happy Thursday!</p>
	<!-- /wp:paragraph -->
	<!-- wp:gjs/timerelease {"gjs/timerelease-interval":"day","gjs/timerelease-index":"4"} -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"0-12"} -->
		<p class="layout-column-1">Happy Friday Morning!</p>
		<!-- /wp:paragraph -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"12-16"} -->
		<p class="layout-column-1">Happy Friday Afternoon!</p>
		<!-- /wp:paragraph -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"16-24"} -->
		<p class="layout-column-1">Happy Friday Evening!</p>
		<!-- /wp:paragraph -->
	<!-- /wp:gjs/timerelease -->
<!-- /wp:gjs/timerelease -->

This could be solved without self-nesting, by giving the blocks very complicated logic, but it would be much easier for users to understand if each level of nesting had a specific context applied.

A very similar situation could exist for a "restricted content" block, where you might want to filter by whether users are logged in or not, then filter by whether they have specific user levels, then filter by exclusive inclusions or exclusions. Once again, the logic could be flattened into a single extremely complicated block, but offering nested binary options would make the interface much more intuitive for users.

There are many more cases that I can think of that require content parsing but not necessarily self-nesting.

I believe arbitrarily limiting dynamic blocks to make them non-nesting, or to make them block self-nesting due to the performance of a specific implementation of block parsing isn't a good enough reason to remove features provided by the formal grammar.

aduth · 2018-04-08T18:06:35Z

Hi @gschoppe . I should clarify that I'm not overly attached to a RegExp based implementation. The objective here was primarily to unblock progress of nested blocks, where the previous full parse behavior was insufficient in being able to reconstruct the contents of a post, particularly in the re-serialization of a block's own inner blocks. A few alternative ideas were discussed here, covered in #4591 (comment) (also discussed in Slack). Ultimately and to your concerns, something like the limited / subset grammar previously discussed could make for a good iteration.

aduth added Framework Issues related to broader framework topics, especially as it relates to javascript [Type] Performance Related to performance efforts labels Jan 19, 2018

aduth requested a review from dmsnell January 19, 2018 02:31

mtias reviewed Jan 19, 2018

View reviewed changes

aduth mentioned this pull request Jan 20, 2018

Add the $content parameter to the block render callback documentation #4507

Closed

3 tasks

This was referenced Jan 24, 2018

Blocks: Support block nesting (Add Columns block, InnerBlocks component) #3745

Merged

Pass the block name as a parameter to the dynamic block render callback #4673

Closed

mtias added this to the Feature Complete milestone Jan 26, 2018

aduth force-pushed the update/server-do-blocks-parse branch from 740f274 to e39030c Compare January 30, 2018 19:18

aduth mentioned this pull request Jan 30, 2018

Refactor WP_Block_Type::render() logic. #4716

Closed

2 tasks

Framework: Skip server grammar parse, seek dynamic blocks explicitly

969b7c9

aduth force-pushed the update/server-do-blocks-parse branch from e39030c to 969b7c9 Compare January 30, 2018 21:20

aduth merged commit 49f05e5 into master Jan 30, 2018

aduth deleted the update/server-do-blocks-parse branch January 30, 2018 21:27

aduth mentioned this pull request Feb 1, 2018

Calling render_callback() without register_block_type? #4723

Closed

aduth mentioned this pull request Feb 14, 2018

Framework: Strip comment demarcations in content filtering #5042

Merged

oandregal mentioned this pull request Feb 21, 2018

Out of memory in parser #3799

Closed

mattheu mentioned this pull request Mar 23, 2018

Pass content to render callback #5760

Closed

mtias mentioned this pull request Apr 4, 2018

Gutenberg still parsing HTML with regular expressions #5967

Closed

dmsnell mentioned this pull request Apr 13, 2018

WIP: Add dynamic block #6170

Closed

mattheu mentioned this pull request Apr 18, 2018

Pass content to block render callback #6239

Closed

4 tasks

aduth mentioned this pull request Jun 19, 2018

Add support for Child Blocks #6753

Merged

mcsf mentioned this pull request Jul 27, 2018

Overview of Short-term Parsing Enhancements #8244

Closed

11 tasks

mcsf mentioned this pull request Aug 30, 2018

Parser: Propose new hand-coded parser #8083

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

aduth commented Jan 19, 2018 •

edited by mtias

Loading

jorgefilipecosta commented Jan 19, 2018

mtias Jan 19, 2018

aduth Jan 19, 2018

aduth commented Jan 19, 2018

mtias commented Jan 19, 2018

dmsnell commented Jan 19, 2018

dmsnell commented Jan 19, 2018 •

edited

Loading

pento commented Jan 20, 2018

dmsnell commented Jan 20, 2018

aduth commented Jan 20, 2018

pento commented Jan 21, 2018

aduth commented Jan 25, 2018

aduth commented Jan 25, 2018

dmsnell commented Jan 25, 2018

aduth commented Jan 30, 2018

aduth commented Jan 30, 2018

phpbits commented Feb 1, 2018

aduth commented Feb 1, 2018

phpbits commented Feb 1, 2018

aduth commented Feb 1, 2018

phpbits commented Feb 1, 2018

phpbits commented Feb 6, 2018 •

edited

Loading

aduth commented Feb 6, 2018

phpbits commented Feb 7, 2018

aduth commented Feb 8, 2018

phpbits commented Feb 8, 2018

gschoppe commented Apr 6, 2018 •

edited

Loading

aduth commented Apr 8, 2018

Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

Conversation

aduth commented Jan 19, 2018 • edited by mtias Loading

jorgefilipecosta commented Jan 19, 2018

mtias Jan 19, 2018

Choose a reason for hiding this comment

aduth Jan 19, 2018

Choose a reason for hiding this comment

aduth commented Jan 19, 2018

mtias commented Jan 19, 2018

dmsnell commented Jan 19, 2018

dmsnell commented Jan 19, 2018 • edited Loading

pento commented Jan 20, 2018

dmsnell commented Jan 20, 2018

aduth commented Jan 20, 2018

pento commented Jan 21, 2018

aduth commented Jan 25, 2018

aduth commented Jan 25, 2018

dmsnell commented Jan 25, 2018

aduth commented Jan 30, 2018

aduth commented Jan 30, 2018

phpbits commented Feb 1, 2018

aduth commented Feb 1, 2018

phpbits commented Feb 1, 2018

aduth commented Feb 1, 2018

phpbits commented Feb 1, 2018

phpbits commented Feb 6, 2018 • edited Loading

aduth commented Feb 6, 2018

phpbits commented Feb 7, 2018

aduth commented Feb 8, 2018

phpbits commented Feb 8, 2018

gschoppe commented Apr 6, 2018 • edited Loading

aduth commented Apr 8, 2018

aduth commented Jan 19, 2018 •

edited by mtias

Loading

dmsnell commented Jan 19, 2018 •

edited

Loading

phpbits commented Feb 6, 2018 •

edited

Loading

gschoppe commented Apr 6, 2018 •

edited

Loading