-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plugin: Adding a first version of block parser using a pegjs
grammar
#301
Conversation
e17ad5e
to
dad8f83
Compare
I know this is pretty blocking, but there's some assumptions in the grammar that will change perhaps significantly depending on the API we arrive at for blocks (#104). For example:
Ideally not, whether this is through treating state values explicitly (including text) or an API from the block's own rendering function to "retrieve" values. |
this is still another parsing step, because a block needs to define the Minimum required attributes to retrieve from the content (could be just a simple array), and fallback to My idea was to get this and #289 merged. That way, we can try the "retrieve" values approach. |
Does it? Why can't we infer an empty/unset value at the time of rendering? |
I was assuming some blocks have mandatory attributes. Maybe not after reflexion. But you agree with me that this step (the grammar) is required (or replaced by a dom parser) in either of these approaches |
Yeah, I'd be inclined to simplify the grammar even further from the start to avoid even concepts of attributes in comments. Not sure the effort involved to do this, but if we can at least try to avoid relying on those assumptions too much in implementation, this could be a good base to start with 👍 |
modules/editor/grammar/post.pegjs
Outdated
} } | ||
|
||
WP_Block_End | ||
= "<!--" __ "/wp" __ "-->" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure how to do it but this needs to include the block name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok making this change if we reach a consensus. I feel that it hasn't been discussed yet.
I'm personally neutral, I don't think it adds much value, (I see the HTML as a serialisation that should not be updated frequently by hand, it's our job to make the best possible editor to avoid having to use the HTML editor) but I understand that some might find it more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"needs" is a strong word 😄
Honestly I care very little one way or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this as necessary for several reasons:
- A fallback so that we can reliably detect more failure conditions. What if someone edits the HTML(+ comment delimiters) and accidentally transposes part of a block into another block? We need to be able to throw a parse error here, and if our ending delimiters are just generic
/wp
, we can't really do that. - For complicated posts, WP post content is very much "tag soup" already, and eventually as people start using the block-based editor these comment delimiters will become the same way. Especially with very long blocks, it needs to be clear both to humans and computers reading the post content not just that a block is ending, but which block is ending along with the beginning of a new block.
- If we ever want to be able to support nested blocks this becomes even more of a necessity. (Conversely, if we never support nested blocks, this doesn't hurt anything.)
As usual I am trying to argue in favor of the most robust approach possible from the very start, because we're building a product which will need to support a lot of complicated, varied, and unexpected use cases. I think that if we don't do this now, we will wish we had, but it'll be too late to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing a little bit of devil's advocate: How is this any different than how the JavaScript runtime is smart enough to interpret the closing curly brace }
of an if
condition. Other languages might make this more explicit e.g. fi
, endif
but certainly doesn't seem indicative of a poorly constructed syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't "need" to be there but definitely should be. It will make it easy for human eyeballs rather than computer eyeballs. Imagine trying to read html or xml without a matching closing tag, a parser could technically understand it but it would be a lot less friendly to work with.
I think the points nylen brought up are all very compelling as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the closing curly brace
}
of anif
condition
You're right, this is a problem. We should consider writing the plugin code in VBScript to alleviate it.
Trolling aside, I can think of a couple of relevant differences between C-style languages and HTML:
- HTML is designed to be deeply nested, whereas in code this is generally considered an antipattern, and readability of nesting is managed by strict indentation rules.
- HTML has far more "block types" than JavaScript, and we are introducing the capability to define an arbitrary number of other tags in addition to the standard ones.
I also think of what we're building here as an extension of HTML. Therefore it makes sense to me to have fairly similar tag semantics to our "parent language".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the purposes of these is to remain as out of the way as possible. A few people have asked me to not even have closing tags ;)
That said, I'm ok with either way now, and I'd expect us to refine the approach as we test with complex, long, and broken posts. The key is to balance readability of the HTML with the presence of structural hints.
In d427daf I updated the PR to add the block type to the closing comment. I also created another "example post" to avoid breaking the prototypes that rely on the old format. |
A couple of questions about this, which may be easily answered due to my lack of knowledge of pegjs:
|
This is the second part of the parse I'm talking about above. There are different approaches.
We can add unit tests ( |
479d175
to
ccc6a7b
Compare
The manual revisions to allow importing the CommonJS PEG output was bothering me since it seemed a pain to maintain. Tracked this back to #4039 which doesn't have much of a workaround. Instead, I found a Webpack loader for PEG.js which seems much more convenient because it avoids a separate build for the grammar (import directly from grammar). Only complication is that it was skipping the Babelification step, so I had to update the grammar a bit to downgrade ES6 to ES5. There might be another way to fix this, but it was a reasonable compromise I thought. In any case, I'm feeling good with this as a step forward. Feel free to merge in your morning 👍 |
Oh, lame, it appears the changes in a5c83e8 broke the build for the exact reason we wanted Webpack involved in Mocha tests in #289, namely since Mocha alone is incapable of understanding loaders. Unless we can get that working, maybe best to revert back to the original approach for now. |
@aduth What if we avoid loading this file on the tests, by moving the "registration functions" into a separate file and leave all these "webpack" specific loaders to the |
9aabad0
to
7e045d4
Compare
Tests fixed, merging this for now. we can revisit later of we discover any limit to this parsing approach |
in this PR, I'm adding a v1 for a block parser using a
pegjs
grammar.Parsing a block like this:
returns the following output
This approach means we need another step for parsing. This step would be part of the block API to extract attributes from the rawContent depending on the blockType.