Plugin: Adding a first version of block parser using a `pegjs` grammar #301

youknowriad · 2017-03-21T12:19:54Z

in this PR, I'm adding a v1 for a block parser using a pegjs grammar.

Parsing a block like this:

<-- wp:myBlockType myAttr:value -->
<p>HTML content</p>
<-- /wp:myBlockType -->

returns the following output

[
  { 
    blockType: 'myBlockType',
    attrs: { myAttr: 'value' },
    rawContent: '<p>HTML content</p>'
  }
]

This approach means we need another step for parsing. This step would be part of the block API to extract attributes from the rawContent depending on the blockType.

aduth · 2017-03-21T15:45:40Z

I know this is pretty blocking, but there's some assumptions in the grammar that will change perhaps significantly depending on the API we arrive at for blocks (#104).

For example:

This approach means we need another step for parsing.

Ideally not, whether this is through treating state values explicitly (including text) or an API from the block's own rendering function to "retrieve" values.

youknowriad · 2017-03-21T15:51:08Z

an API from the block's own rendering function to "retrieve" values.

this is still another parsing step, because a block needs to define the Minimum required attributes to retrieve from the content (could be just a simple array), and fallback to html if not present.

My idea was to get this and #289 merged. That way, we can try the "retrieve" values approach.

aduth · 2017-03-21T15:52:31Z

because a block needs to define the Minimum required attributes to retrieve from the content (could be just a simple array)

Does it? Why can't we infer an empty/unset value at the time of rendering?

youknowriad · 2017-03-21T15:55:03Z

Does it? Why can't we infer an empty/unset value at the time of rendering?

I was assuming some blocks have mandatory attributes. Maybe not after reflexion. But you agree with me that this step (the grammar) is required (or replaced by a dom parser) in either of these approaches

aduth · 2017-03-21T15:57:59Z

Yeah, I'd be inclined to simplify the grammar even further from the start to avoid even concepts of attributes in comments. Not sure the effort involved to do this, but if we can at least try to avoid relying on those assumptions too much in implementation, this could be a good base to start with 👍

nylen · 2017-03-21T20:27:40Z

modules/editor/grammar/post.pegjs

+  } }
+
+WP_Block_End
+  = "<!--" __ "/wp" __ "-->"


I am not sure how to do it but this needs to include the block name.

I'm ok making this change if we reach a consensus. I feel that it hasn't been discussed yet.

I'm personally neutral, I don't think it adds much value, (I see the HTML as a serialisation that should not be updated frequently by hand, it's our job to make the best possible editor to avoid having to use the HTML editor) but I understand that some might find it more readable.

@aduth @mtias thoughts

"needs" is a strong word 😄

Honestly I care very little one way or the other.

I see this as necessary for several reasons:

A fallback so that we can reliably detect more failure conditions. What if someone edits the HTML(+ comment delimiters) and accidentally transposes part of a block into another block? We need to be able to throw a parse error here, and if our ending delimiters are just generic /wp, we can't really do that.

For complicated posts, WP post content is very much "tag soup" already, and eventually as people start using the block-based editor these comment delimiters will become the same way. Especially with very long blocks, it needs to be clear both to humans and computers reading the post content not just that a block is ending, but which block is ending along with the beginning of a new block.

If we ever want to be able to support nested blocks this becomes even more of a necessity. (Conversely, if we never support nested blocks, this doesn't hurt anything.)

As usual I am trying to argue in favor of the most robust approach possible from the very start, because we're building a product which will need to support a lot of complicated, varied, and unexpected use cases. I think that if we don't do this now, we will wish we had, but it'll be too late to change it.

Playing a little bit of devil's advocate: How is this any different than how the JavaScript runtime is smart enough to interpret the closing curly brace } of an if condition. Other languages might make this more explicit e.g. fi, endif but certainly doesn't seem indicative of a poorly constructed syntax.

It doesn't "need" to be there but definitely should be. It will make it easy for human eyeballs rather than computer eyeballs. Imagine trying to read html or xml without a matching closing tag, a parser could technically understand it but it would be a lot less friendly to work with.

I think the points nylen brought up are all very compelling as well.

the closing curly brace } of an if condition

You're right, this is a problem. We should consider writing the plugin code in VBScript to alleviate it.

Trolling aside, I can think of a couple of relevant differences between C-style languages and HTML:

HTML is designed to be deeply nested, whereas in code this is generally considered an antipattern, and readability of nesting is managed by strict indentation rules.

HTML has far more "block types" than JavaScript, and we are introducing the capability to define an arbitrary number of other tags in addition to the standard ones.

I also think of what we're building here as an extension of HTML. Therefore it makes sense to me to have fairly similar tag semantics to our "parent language".

One of the purposes of these is to remain as out of the way as possible. A few people have asked me to not even have closing tags ;)

That said, I'm ok with either way now, and I'd expect us to refine the approach as we test with complex, long, and broken posts. The key is to balance readability of the HTML with the presence of structural hints.

youknowriad · 2017-03-22T08:24:43Z

In d427daf I updated the PR to add the block type to the closing comment. I also created another "example post" to avoid breaking the prototypes that rely on the old format.

nylen · 2017-03-22T13:01:16Z

A couple of questions about this, which may be easily answered due to my lack of knowledge of pegjs:

How do we expect plugins to extend this grammar with their own block structure? Will this parsing layer include the HTML parsing needed to extract information about block content?
Does this functionality need tests? What is the best way to achieve this?

youknowriad · 2017-03-22T13:16:54Z

How do we expect plugins to extend this grammar with their own block structure? Will this parsing layer include the HTML parsing needed to extract information about block content?

This is the second part of the parse I'm talking about above. There are different approaches.

I'm leaning to add a parse function each block API needs to implement to extract the state from the HTML. The grammar can not be extended per block. (See per-block prototype for an example of this)
@aduth and other are proposing to have all the state data stored elsewhere (attributes or post-meta), so no need to parse the HTML which becomes only an output. (serialization of the state)

Does this functionality need tests? What is the best way to achieve this?

We can add unit tests ( parse(string) => JS and check the JS) but I'm not sure this is relevant since the code is generated.

Closing comment updated

aduth · 2017-03-23T02:14:02Z

The manual revisions to allow importing the CommonJS PEG output was bothering me since it seemed a pain to maintain. Tracked this back to #4039 which doesn't have much of a workaround.

Instead, I found a Webpack loader for PEG.js which seems much more convenient because it avoids a separate build for the grammar (import directly from grammar). Only complication is that it was skipping the Babelification step, so I had to update the grammar a bit to downgrade ES6 to ES5. There might be another way to fix this, but it was a reasonable compromise I thought.

In any case, I'm feeling good with this as a step forward. Feel free to merge in your morning 👍

aduth · 2017-03-23T03:14:32Z

Oh, lame, it appears the changes in a5c83e8 broke the build for the exact reason we wanted Webpack involved in Mocha tests in #289, namely since Mocha alone is incapable of understanding loaders.

Unless we can get that working, maybe best to revert back to the original approach for now.

youknowriad · 2017-03-23T08:11:06Z

@aduth What if we avoid loading this file on the tests, by moving the "registration functions" into a separate file and leave all these "webpack" specific loaders to the index.js

youknowriad · 2017-03-23T08:18:13Z

Tests fixed, merging this for now. we can revisit later of we discover any limit to this parsing approach

youknowriad added the Framework Issues related to broader framework topics, especially as it relates to javascript label Mar 21, 2017

youknowriad self-assigned this Mar 21, 2017

youknowriad requested review from mtias and aduth March 21, 2017 12:19

youknowriad force-pushed the add/grammar-parsing branch from e17ad5e to dad8f83 Compare March 21, 2017 12:20

nylen previously requested changes Mar 21, 2017

View reviewed changes

youknowriad force-pushed the add/grammar-parsing branch from 479d175 to ccc6a7b Compare March 22, 2017 14:39

aduth mentioned this pull request Mar 22, 2017

Block discovery via DOM parser #306

Closed

youknowriad and others added 6 commits March 23, 2017 09:15

Plugin: Adding a first version of block parser using a pegjs grammar

f2b46d6

Changes per review: closing comment

aad8eb9

fix lint rules

2f104e3

Moving the grammar to blocks

06cdb5f

Use pegjs-loader for importing post grammar

cee68aa

Fix unit tests

7e045d4

youknowriad force-pushed the add/grammar-parsing branch from 9aabad0 to 7e045d4 Compare March 23, 2017 08:16

youknowriad merged commit c6995cd into master Mar 23, 2017

youknowriad deleted the add/grammar-parsing branch March 23, 2017 08:18

aduth mentioned this pull request Mar 23, 2017

Run Mocha on Webpack-generated bundle #318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plugin: Adding a first version of block parser using a `pegjs` grammar #301

Plugin: Adding a first version of block parser using a `pegjs` grammar #301

youknowriad commented Mar 21, 2017 •

edited

Loading

aduth commented Mar 21, 2017

youknowriad commented Mar 21, 2017 •

edited

Loading

aduth commented Mar 21, 2017

youknowriad commented Mar 21, 2017

aduth commented Mar 21, 2017

nylen Mar 21, 2017

youknowriad Mar 21, 2017

aduth Mar 21, 2017

nylen Mar 21, 2017 •

edited

Loading

aduth Mar 21, 2017

BE-Webdesign Mar 22, 2017 •

edited

Loading

nylen Mar 22, 2017

mtias Mar 22, 2017

youknowriad commented Mar 22, 2017

nylen commented Mar 22, 2017

youknowriad commented Mar 22, 2017

aduth commented Mar 23, 2017

aduth commented Mar 23, 2017

youknowriad commented Mar 23, 2017

youknowriad commented Mar 23, 2017

Plugin: Adding a first version of block parser using a pegjs grammar #301

Plugin: Adding a first version of block parser using a pegjs grammar #301

Conversation

youknowriad commented Mar 21, 2017 • edited Loading

aduth commented Mar 21, 2017

youknowriad commented Mar 21, 2017 • edited Loading

aduth commented Mar 21, 2017

youknowriad commented Mar 21, 2017

aduth commented Mar 21, 2017

nylen Mar 21, 2017

Choose a reason for hiding this comment

youknowriad Mar 21, 2017

Choose a reason for hiding this comment

aduth Mar 21, 2017

Choose a reason for hiding this comment

nylen Mar 21, 2017 • edited Loading

Choose a reason for hiding this comment

aduth Mar 21, 2017

Choose a reason for hiding this comment

BE-Webdesign Mar 22, 2017 • edited Loading

Choose a reason for hiding this comment

nylen Mar 22, 2017

Choose a reason for hiding this comment

mtias Mar 22, 2017

Choose a reason for hiding this comment

youknowriad commented Mar 22, 2017

nylen commented Mar 22, 2017

youknowriad commented Mar 22, 2017

aduth commented Mar 23, 2017

aduth commented Mar 23, 2017

youknowriad commented Mar 23, 2017

youknowriad commented Mar 23, 2017

Plugin: Adding a first version of block parser using a `pegjs` grammar #301

Plugin: Adding a first version of block parser using a `pegjs` grammar #301

youknowriad commented Mar 21, 2017 •

edited

Loading

youknowriad commented Mar 21, 2017 •

edited

Loading

nylen Mar 21, 2017 •

edited

Loading

BE-Webdesign Mar 22, 2017 •

edited

Loading