Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Framework: Skip server grammar parse, seek and replace dynamic blocks #4591

Merged
merged 1 commit into from
Jan 30, 2018

Conversation

aduth
Copy link
Member

@aduth aduth commented Jan 19, 2018

Blocker for #3745
Related: #3545
Possible resolution for #3799 ?

This pull request seeks to refactor Gutenberg's the_content filtering of dynamic blocks to bypass the grammar parser altogether and explicitly seek and replace dynamic block fragments via regular expressions.

The reasons for this change include:

  • Parsing all of content into a data shape during the front-end display of a post is heavy-handed when the only operation necessary is to replace dynamic blocks.
    • It is quite likely for the majority of users' posts that there are no dynamic blocks at all, and this is wasted effort.
  • It supports nested block content, which is non-supportable with the current implementation because static blocks render with innerHTML, which does not include the parsed inner blocks (see also Parsing: Replace server serialization by parsed outerHTML #3545).

The downsides of this change include:

  • Relying on regular expressions to match block fragments, including the attributes JSON string which is passed through json_decode.
    • The latter note is not a requirement, and it's possible that we could still leverage the grammar parser to extract the attributes array.
    • Related to this is the maintenance cost of keeping syntax identification in sync.
  • We no longer strip the comment demarcations for static blocks
    • They are stripped for dynamic blocks, though not necessarily so

To Do: (Non-blocking)

  • Run a performance benchmark comparison, as I suspect the changes proposed here perform significantly better than the grammar-based parser.

Testing instructions:

Verify PHPUnit tests pass:

phpunit

Try manually creating a post (via Text mode or Classic Editor) including both dynamic blocks and blocks with nested content, ensuring that when viewing source on the front-end, both the dynamic block rendered content and nested content is rendered as expected:

<!-- wp:latest-posts /-->

<!-- wp:columns -->
<div class="wp-block-columns has-2-columns">
    <!-- wp:paragraph {"layout":"column-1"} -->
    <p class="layout-column-1">col 1</p>
    <!-- /wp:paragraph -->

    <!-- wp:paragraph {"layout":"column-2"} -->
    <p class="layout-column-2">col 2</p>
    <!-- /wp:paragraph -->
</div>
<!-- /wp:columns -->

<!-- wp:paragraph -->
<p>end text</p>
<!-- /wp:paragraph -->

@aduth aduth added Framework Issues related to broader framework topics, especially as it relates to javascript [Type] Performance Related to performance efforts labels Jan 19, 2018
@aduth aduth requested a review from dmsnell January 19, 2018 02:31
@jorgefilipecosta
Copy link
Member

Another potential downside of this change is that it collides with the objective proposed by @mtias of not saving some inline styles in the post content and render them based on the attributes. If the non-dynamic blocks are not parsed we can not generate the styles for them dynamically in the server.

lib/blocks.php Outdated
*/
function gutenberg_render_block( $block ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep both methods — with the full tree-parsing being only an option. (Say a REST endpoint that wants to serve the tree as JSON, etc.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep both methods — with the full tree-parsing being only an option. (Say a REST endpoint that wants to serve the tree as JSON, etc.)

The parsing class is left untouched, and in fact the gutenberg_parse_blocks helper function itself; this is what seemed to be the most value in what came to mind as needing a JSON tree of blocks. Do you think this specific function is still valuable, to generate the string output of a block object? All of the underlying tooling is still largely available, it just becomes:

$html = $block_type->is_dynamic() ? $block_type->render( $block['attrs'] ) : $block['innerHTML'];

@aduth
Copy link
Member Author

aduth commented Jan 19, 2018

Another potential downside of this change is that it collides with the objective proposed by @mtias of not saving some inline styles in the post content and render them based on the attributes. If the non-dynamic blocks are not parsed we can not generate the styles for them dynamically in the server.

As far as I can discern, the inline styles are themselves static content, so is it necessary or desirable for this to happen dynamically at the server (vs. in the client before save)? Or at least that this must occur at display time (vs. at save time, where a full parse might be more reasonable)?

@mtias
Copy link
Member

mtias commented Jan 19, 2018

Chatting with @jorgefilipecosta we are going to revise the color-styles approach in any case, and it's actually a good constrain to make it statically viable.

@dmsnell
Copy link
Member

dmsnell commented Jan 19, 2018

Let me know how I can be helpful here.

I'm not sure entirely what the goal of this is in terms of technically what it's doing to
the post_content but from looking at the RegExp machine I'm guessing it may still
be worth considering a PEG-based approach just because it would probably be much
clearer. That's not to say that we need to perform a full parse, but we could potentially
create another parser just to do specific actions.

One of the challenges here is that when we are talking about nesting we're manually
recreating the stack-context that the parser does for us in a very reliable and established
way. Thankfully though we have one thing working in our favor: the block boundaries
are easy to find and we cannot find in a legitimate post another <!-- wp: before we
have a --> and so nesting is somewhat easier to track than if we used, for example,
parentheses.

@dmsnell
Copy link
Member

dmsnell commented Jan 19, 2018

@aduth after some talks with @mtias in Slack I wanted to post some doodles/idea I had where we can reuse the PEG parser generator to do some of these kinds of tasks without carrying over all the heft of the full grammar parser. The PEGs are still nice because they do have favorable performance characteristics and handle nesting in reliable and robust ways.

Check out the following fake grammar and document for an illustration of what I'm talking about
https://dmsnell.github.io/peg-parser-explorer/

Grammar

{
  const dynamicTypes = [ 'dynamic' ];

  function isDynamic( name ) {
    return dynamicTypes.indexOf( name ) !== -1
  }
}

Output
  = bs:Block *
  { return bs.join( '' ) }

Block
  = "<!-- wp:" name:$( !" ". ) + " -->" $( !"<!--". ) + "<!-- /wp:" $( !" ". ) + " -->"
  { return isDynamic( name.trim() ) ? "blargynamic" : text() }

Document

<!-- wp:core/paragraph -->
Just some text
<!-- /wp:core/paragraph --><!-- wp:dynamic -->
blarg
<!-- /wp:dynamic --><!-- wp:core/paragraph -->
More text
<!-- /wp:core/paragraph -->

So while the doodles are naïve and incomplete I think they show how we can write make something far more performant while borrowing some of the same rules and matchers from the full grammar. Note that the top function block can actually be PHP code instead of JavaScript but for the interactive grammar explorer I did JS.

@pento
Copy link
Member

pento commented Jan 20, 2018

I like the idea of only parsing dynamic blocks, parsing static blocks that aren't going to be touched is probably a waste of time.

That said, there's definitely going to be requests for an ability to filter each block before output. For example, #2871 relies on that existing. While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

The "light parser" idea sounds promising. Split the post into the raw blocks, then loop over each block, fully parsing dynamic blocks, but only parsing static blocks if something is actually hooked into the relevant filter. This would also help with removing the comment demarcations, because outputting them to the frontend will not be a popular change. 🙂

@dmsnell
Copy link
Member

dmsnell commented Jan 20, 2018

While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

@pento I'm not quite grasping what you are communicating here. Could you help me understand/reword this?

@aduth
Copy link
Member Author

aduth commented Jan 20, 2018

That said, there's definitely going to be requests for an ability to filter each block before output. For example, #2871 relies on that existing. While that PR could be rewritten to still do its thing, it'd probably need some changes to Gutenberg core, which won't necessarily be an option for plugin devs.

While there was already some conversation there about it, a table of contents doesn't seem like something that necessarily needs dynamic front-end rendering to implement.

@pento
Copy link
Member

pento commented Jan 21, 2018

I'm not quite grasping what you are communicating here. Could you help me understand/reword this?

Sorry, that was lazy writing on my part. 🙂

So, #2871 does two things that require iterating over static blocks:

  • It adds an id attribute to the <h.> tags in heading blocks.
  • It grabs the text content of heading blocks, to use in the table of contents block.

In order to implement these without server side rendering, we'd need to:

To add the id attribute, the easiest way would probably be to enforce the anchor attribute to always be true in heading blocks. We could kind of hack this in by adding a late filter on blocks.getSaveContent.extraProps, to override the core/anchor/save-props/addSaveProps filter, but that gives a bad UX, because the "HTML Anchor" InspectorControl selected state wouldn't necessarily match what's changed, with no explanation. The better option for this route would be to modify the core/anchor plugin recognise when a ToC block exists, and provide appropriate information to the end user.

To grab the text content for heading blocks, we'd need to register a new selector in the editor store, to expose that data.

A third feature, which hadn't been implemented yet in #2871, would add meta-links to the heading element: one for linking to itself (for easy copy/paste of the link), and one to link back to the ToC. I don't think there's a filter that allows modifying static block content, yet.

As @aduth mentioned, there's an argument for implementing the ToC block in JS. But to get back to my original point, such an implementation requires a bunch of modifications to Gutenberg's core functionality, while implementing it server-side requires adding a filter to the PHP block loop. While I can easily make the required changes, they're specific changes for a particular case, that option isn't available to the vast majority of plugin developers. Having a more generic filter that runs over every block at render-time, however, is a simple solution that makes cool and complex things possible.

It's also worth considering future behaviour, here—when every part of the entire page render is a block, I think that having a standard render-time filter to interact with each block in the page would be super useful.

@aduth
Copy link
Member Author

aduth commented Jan 25, 2018

@dmsnell

it may still be worth considering a PEG-based approach just because it would probably be much
clearer. That's not to say that we need to perform a full parse, but we could potentially create another parser just to do specific actions.

Yeah, I could imagine something like this as well. The challenge I saw, which is still unclear to me after your demonstration, is how (at least given the current tooling) we would feed information about known dynamic block names into the parser at runtime.

The PEGs are still nice because they do have favorable performance characteristics

Aside and of my own curiosity: More favorable than RegExp?

@aduth
Copy link
Member Author

aduth commented Jan 25, 2018

Might be worth noting that there's nothing that had been applied by the_content filter we used previously which could not still be achieved by a plugin's own filter after the changes proposed here. Which is to say, the full parser still exists, and a plugin can still parse, extract, and manipulate individual blocks from their own the_content filter handlers.

@dmsnell
Copy link
Member

dmsnell commented Jan 25, 2018

how we would feed information about known dynamic block names into the parser at runtime.

This would occur through globals for better or worse. I think in the PHP side it's actually not as bad since that's more normative for PHP modules. My random idea was to have a function that the PEG parser calls to handle the blocks.

More favorable than RegExp?

The nice things about PEGs are that they run in linear time with respect to the length of the document being parsed (well, when combined with a Packrat memoizing parser as pegjs pulls in). RegExps are notoriously difficult to reason about for performance because they run in time corresponding to the product of the complexity of the RegExp and the document. That is how we get catastrophic backtracking and other RegExp DOS attacks.

In my personal experience mostly here on Gutenberg too the biggest performance issues with our grammar are mostly obvious. Our memory issue is probably related to the fact that we build a huge list of characters when parsing freeform HTML and then we join that list at the end. Other spots could be potentially improved by sharing structure among related rules (sharing <!-- among block boundaries, for example). The RegExp for me just normally ends up being an opaque wall I have a hard time understanding or confidently modifying.

@mtias mtias added this to the Feature Complete milestone Jan 26, 2018
@aduth aduth force-pushed the update/server-do-blocks-parse branch from 740f274 to e39030c Compare January 30, 2018 19:18
@aduth aduth force-pushed the update/server-do-blocks-parse branch from e39030c to 969b7c9 Compare January 30, 2018 21:20
@aduth
Copy link
Member Author

aduth commented Jan 30, 2018

As this pull request is a blocker for #3745 and all concerns have been either addressed or are subject to future backwards-compatible refactoring, I'm inclined to move forward with the changes proposed here.

Re: Grammars: I agree we should explore creating a smaller grammar which is a subset of the current functionality targeting segmenting on dynamic blocks alone. This is only beneficial if it proves to be more performant or if the performance difference is negligible considering the maintainability benefits of keeping the grammars more closely in sync, or simply avoiding RegExps.

Re: Exposing full parse to plugin authors: As noted above, this is still available, though no longer occurring by default with front-end content rendering. As we should want to keep front-end rendering performant in the default case, I think this is a fair compromise.

@aduth
Copy link
Member Author

aduth commented Jan 30, 2018

Alternative approaches we might consider, if we really want to keep the current heavy parse for front-end rendering, are:

  • The parser innerHTML should include nested block content
  • or, the innerHTML should at least include a marker for where nested content is to be inserted, so the marker can be replaced with innerHTML of its inner blocks

@aduth aduth merged commit 49f05e5 into master Jan 30, 2018
@aduth aduth deleted the update/server-do-blocks-parse branch January 30, 2018 21:27
@phpbits
Copy link
Contributor

phpbits commented Feb 1, 2018

@aduth Could you add filter before adding each blocks? Based on the previous do_block code, something like this :

$content_after_blocks .= apply_filters( 'do_content_block', gutenberg_render_block( $block ), $block );

I hope this is possible for us developers to manage each blocks easily rather that overriding the_content. Thanks!

@aduth
Copy link
Member Author

aduth commented Feb 1, 2018

@phpbits As implemented here, we're identifying only dynamic blocks when processing content:

gutenberg/lib/blocks.php

Lines 129 to 142 in a5ca791

$dynamic_block_names = get_dynamic_block_names();
$dynamic_block_pattern = (
'/<!--\s+wp:(' .
str_replace( '/', '\/', // Escape namespace, not handled by preg_quote.
str_replace( 'core/', '(?:core/)?', // Allow implicit core namespace, but don't capture.
implode( '|', // Join block names into capture group alternation.
array_map( 'preg_quote', // Escape block name for regular expression.
$dynamic_block_names
)
)
)
) .
')(\s+(\{.*?\}))?\s+(\/)?-->/'
);

It's also important to note that this logic is flawed, specifically on failure to take balanced tags into account. This is a compromise we're okay with for dynamic blocks, assuming that dynamic blocks won't include nested content (or at least not nest itself).

It would fall apart on specific static blocks though:

<!-- wp:columns -->
<div class="wp-block-columns">
	<!-- wp:paragraph -->
	<p>Hello</p>
	<!-- wp:paragraph -->

	<!-- wp:columns -->
	<div class="wp-block-columns"></div>
	<!-- /wp:columns -->
</div>
<!-- /wp:columns -->

In the above case, it would treat the inner block's closing demarcation as the terminator for the wrapper columns block.

This is where a proper parser / grammar would be better suited which, while probably a desirable enhancement, is not currently implemented.

Which is all to say: It could be possible, yes, but not with the current logic.

@phpbits
Copy link
Contributor

phpbits commented Feb 1, 2018

@aduth Thanks! Are there any particular reasons why you do not want to add filters? Something that developers can use so that we do not need to create another parser? Like on this code :

$rendered_content .= $block_type->render( $attributes );

Would be really helpful aside from overriding the_content. I've been integrating my plugin and in need to define the visibility of the content based on the custom attributes I've added. You can check the preview here : https://widget-options.com/block-options-widget-options-gutenberg-editor/ . Right now I'm removing the do_blocks on the filters and adding it again with modifications. Would be really helpful if you can provide hooks and filters on that function. Thank you very much!

@aduth
Copy link
Member Author

aduth commented Feb 1, 2018

Are there any particular reasons why you do not want to add filters?

To be clear, I've never implied this.

@phpbits
Copy link
Contributor

phpbits commented Feb 1, 2018

@aduth My apology didn't mean to imply that you said that. Just asking if there are any particular reasons since I've asked if you can add filter on my previous reply. Let me know if it's doable. Thanks!

@phpbits
Copy link
Contributor

phpbits commented Feb 6, 2018

@aduth i'm using the build on this repo now and not the release on .org repository. Seems like the WP_Block_Type_Registry::get_instance()->get_all_registered(); doesn't return all blocks like paragraph, headings etc. Do you mind letting me know how can I make your preg_match to get all blocks?

gutenberg/lib/blocks.php

Lines 129 to 142 in a5ca791

$dynamic_block_names = get_dynamic_block_names();
$dynamic_block_pattern = (
'/<!--\s+wp:(' .
str_replace( '/', '\/', // Escape namespace, not handled by preg_quote.
str_replace( 'core/', '(?:core/)?', // Allow implicit core namespace, but don't capture.
implode( '|', // Join block names into capture group alternation.
array_map( 'preg_quote', // Escape block name for regular expression.
$dynamic_block_names
)
)
)
) .
')(\s+(\{.*?\}))?\s+(\/)?-->/'
);

Thank you very much!

@aduth
Copy link
Member Author

aduth commented Feb 6, 2018

At the moment core static blocks are not known to the server, but this is planned to change and tracked at #2751.

@phpbits
Copy link
Contributor

phpbits commented Feb 7, 2018

@aduth That's perfect! So WP_Block_Type_Registry::get_instance()->get_all_registered(); will be returning all block types including the core defaults like heading etc..? Will this be included on the next update? Thanks!

@aduth
Copy link
Member Author

aduth commented Feb 8, 2018

I don't expect it will be included in the next release, but it's on my list for the near future.

@phpbits
Copy link
Contributor

phpbits commented Feb 8, 2018

@aduth Thanks! I've just copy the do_blocks and change the pattern and it's working perfectly now :) I just want to test it with the latest development updates like nested blocks but the build is missing the edit-post/ style.css : #4891 (comment) . Would you mind taking a look on this one too? Thank you very much for your help. Cheers!

@gschoppe
Copy link

gschoppe commented Apr 6, 2018

This is a compromise we're okay with for dynamic blocks, assuming that dynamic blocks won't include nested content (or at least not nest itself)
@aduth

This choice limits a lot of options with Gutenberg. There are many features that would require the capability to self-nest dynamic blocks.

For example, imagine a block that allows the user to display different arbitrary block content on various days and times:

<!-- wp:gjs/timerelease -->
	<!-- wp:paragraph {"gjs/timerelease-interval":"day","gjs/timerelease-index":"3"} -->
	<p class="layout-column-1">Happy Thursday!</p>
	<!-- /wp:paragraph -->
	<!-- wp:gjs/timerelease {"gjs/timerelease-interval":"day","gjs/timerelease-index":"4"} -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"0-12"} -->
		<p class="layout-column-1">Happy Friday Morning!</p>
		<!-- /wp:paragraph -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"12-16"} -->
		<p class="layout-column-1">Happy Friday Afternoon!</p>
		<!-- /wp:paragraph -->
		<!-- wp:paragraph {"gjs/timerelease-interval":"hour","gjs/timerelease-index":"16-24"} -->
		<p class="layout-column-1">Happy Friday Evening!</p>
		<!-- /wp:paragraph -->
	<!-- /wp:gjs/timerelease -->
<!-- /wp:gjs/timerelease -->

This could be solved without self-nesting, by giving the blocks very complicated logic, but it would be much easier for users to understand if each level of nesting had a specific context applied.

A very similar situation could exist for a "restricted content" block, where you might want to filter by whether users are logged in or not, then filter by whether they have specific user levels, then filter by exclusive inclusions or exclusions. Once again, the logic could be flattened into a single extremely complicated block, but offering nested binary options would make the interface much more intuitive for users.

There are many more cases that I can think of that require content parsing but not necessarily self-nesting.

I believe arbitrarily limiting dynamic blocks to make them non-nesting, or to make them block self-nesting due to the performance of a specific implementation of block parsing isn't a good enough reason to remove features provided by the formal grammar.

@aduth
Copy link
Member Author

aduth commented Apr 8, 2018

Hi @gschoppe . I should clarify that I'm not overly attached to a RegExp based implementation. The objective here was primarily to unblock progress of nested blocks, where the previous full parse behavior was insufficient in being able to reconstruct the contents of a post, particularly in the re-serialization of a block's own inner blocks. A few alternative ideas were discussed here, covered in #4591 (comment) (also discussed in Slack). Ultimately and to your concerns, something like the limited / subset grammar previously discussed could make for a good iteration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Framework Issues related to broader framework topics, especially as it relates to javascript [Type] Performance Related to performance efforts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants