Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PHP parser based on PEG.js grammar #1152

Merged
merged 18 commits into from
Jun 30, 2017
Merged

Add PHP parser based on PEG.js grammar #1152

merged 18 commits into from
Jun 30, 2017

Conversation

nylen
Copy link
Member

@nylen nylen commented Jun 13, 2017

There is a lot left to be done here, but this PR takes steps towards unifying our parsing logic across JS and PHP by using the same PEG grammar in both places.

There is a library for this that we can use: php-pegjs. It's a PEG.js plugin that generates a parser in PHP instead of JavaScript. However, there are lots of bits of JavaScript code interspersed in our PEG to make it work, and unfortunately, php-pegjs expects those to be written in PHP only.

So, this PR also points to my fork of php-pegjs which allows these code blocks to be specified in both PHP and JavaScript (see Nordth/php-pegjs#5).

Once finished, this will fix #1086 and hopefully also other issues such as #882.

Remaining items:

  • PHP 5.2 support: Remove namespaces in generated php-pegjs output; allow specifying a function prefix instead. Eliminate use of closures.
  • Actually use the generated parser in the plugin
  • Add PHPUnit tests for parser output using our existing HTML fixtures.
  • Require php-pegjs as a normal, stable dependency, by getting the needed changes merged into the main repo or a fork.

It would also be nice to upgrade php-pegjs to support newer versions of pegjs - while downgrading to pegjs 0.8.x causes the pegjs-loader peer dependency not to be met, in the actual code this will work fine.

@nylen nylen added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] In Progress Tracking issues with work in progress labels Jun 13, 2017
@dmsnell
Copy link
Member

dmsnell commented Jun 13, 2017

Thanks @nylen!

This is something I had also planned on looping back around to. I'm thinking it may be possible to end up with a grammar that has little-to-no JS or PHP.

That shouldn't stop this from moving forward, but I think that we can be carefully plan out what the parser should be parsing and ignoring and automatically return the right data structures. For example, I was thinking of ripping out the keyValue function anyway if I can ignore the part of the parse we don't want.

@westonruter
Copy link
Member

Awesome. I'm excited about this.

Another problem with the generated PHP parser in 5.2 is that it using closures.

.travis.yml Outdated
npm install || exit 1
npm run build || exit 1
# Syntax check for php-pegjs parser
node bin/create-php-parser.js || exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be added to the NPM build and dev scripts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not yet - instead, we need to make sure we run our existing parsing tests against the PHP parser as well.

{
/** <?php
return array(
'blockType' => $blockType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: With #974, we'll need to update naming here to blockName

@dmsnell
Copy link
Member

dmsnell commented Jun 14, 2017

I spoke with @westonruter today about this and I'm really wondering if this is the right path. I'm pretty sure that the generated parser for PHP will be very sub-par on performance and the library doesn't appear to have been touched in over two years.

@nylen what do you think about simply having a separate parser file for PHP. Copying this for now and replacing the JS bits with PHP would seem reasonable, but I think that in the end we might be better served by a hand-written parser in PHP that implements the PEG spec simply on account of the different PHP semantics.

@nylen
Copy link
Member Author

nylen commented Jun 20, 2017

@nylen what do you think about simply having a separate parser file for PHP

I missed this conversation, but I'm not a fan of having separate parsing logic, as it guarantees that this critical piece of logic will continue to diverge between client and server. See #1190 and #1213 for recent examples, and note that we still can't or don't test the server-side parser for these cases or to nearly the same extent as we do the client-side parser.

I'm pretty sure that the generated parser for PHP will be very sub-par on performance

What are your specific concerns here? PEG parsers are supposed to be quite performant.

and the library doesn't appear to have been touched in over two years

This is true; it seems likely that we will have to fork it for our needs.

@dmsnell
Copy link
Member

dmsnell commented Jun 20, 2017

What are your specific concerns here? PEG parsers are supposed to be quite performant.

My concern is how the parser generator will generate the PHP and the overhead it brings with it. My fears should not be trusted until we know it's a problem. On the other hand, it's hard to establish a baseline. How many ms should it take to parse a document?

as it guarantees that this critical piece of logic will continue to diverge between client and server

I see your point though it's my hope that we will always end up with more than one. The spec parser which is clear to read and implementations that meet the spec but are faster. Already I've had some decisions to tradeoff readability against speed within the PEG

@nylen
Copy link
Member Author

nylen commented Jun 23, 2017

I've rebased and updated this PR to use phpegjs, my fork of php-pegjs. It has WIP support for PEG.js 0.10.0 (the latest version, which we have been using in this repository).

There were some significant changes required to support the new version of PEG.js, and there are still some bugs, as evidenced by the failing PHP parser tests (newly introduced in this PR).

However, this is good progress all the same, and I still think this task is doable.

@nylen
Copy link
Member Author

nylen commented Jun 27, 2017

One step closer: the generated PHP parser now has a bunch of tests based on our existing fixtures. They pass everywhere except PHP 5.2. 🎉

The logic here has also been rebased to support JSON-style attributes (#1213).

Still left to do: PHP 5.2 support; actually use the new parser for do_blocks server-side parsing and rendering.

@paulwilde
Copy link
Contributor

paulwilde commented Jun 28, 2017

Honest question - Why should Gutenberg support a 11 year old version of PHP? Surely by the time Gutenberg is merged into core (Late 2017/Early 2018) it's time to think about updating the minimum requirements for PHP.

Gutenberg is a nice forward step in making WordPress a modern piece of software, so why should the PHP side of things suffer?

If Gutenberg is going to break backwards-compatibility, surely it's a good time to also force people to move away from a highly outdated and insecure version of PHP.

@nylen
Copy link
Member Author

nylen commented Jun 28, 2017

Honest question - Why should Gutenberg support a 11 year old version of PHP?

It would be nice if we didn't have to support PHP 5.2. However, I think this decision should be made along with the rest of core, and there are ongoing discussions about this.

Quoting from recent discussion in the #core-php Slack channel:

There will be a summary posted from the summit discussion within the next week or so (notes are being collected from the note takers currently). The consensus was to do as much as we can to encourage upgrading to see if we can get the percentage of users on old PHP versions down, and revisit in a little while. #41191 and #40934 were the first steps. Would be great to get further feedback on that post once it happens.

As far as supporting PHP 5.2 in this PR, the work is already mostly done, via this commit.

@nylen nylen force-pushed the add/php-pegjs branch 5 times, most recently from 57389f4 to 83fb025 Compare June 29, 2017 17:11
@nylen
Copy link
Member Author

nylen commented Jun 29, 2017

This is ready for review/merge.

@nylen nylen removed the [Status] In Progress Tracking issues with work in progress label Jun 29, 2017
Add `php-pegjs` and downgrade `pegjs` version accordingly

Add script to generate PHP parser

Update PEG grammar with PHP and JS code

Run syntax check against php-pegjs parser

Exclude generated PHP parser from phpcs checks

Fix lint errors
nylen added 17 commits June 30, 2017 04:57
Or using the following commands (TODO - add a npm script for this?):

```
bin/create-php-parser.js
RUN_PARSER_TESTS=1 phpunit
```
Fix all the things (except PHP 5.2 support).
Otherwise PHPUnit 4.5.0 (used with PHP 5.6 Travis build) complains
about a suite with no tests in it.
Adds support for generating a parser compatible with PHP 5.2.
This will simplify the process of loading the plugin.  Otherwise we'd
have to add this step to `npm run dev` and a couple of other places.
This reverts commit 57389f49de1bce57bf97012df4d207bf2fffdea3.
@nylen
Copy link
Member Author

nylen commented Jun 30, 2017

Merging to get this and the breaking change layered on top of it (#1593) into a release as soon as possible.

@aduth
Copy link
Member

aduth commented Jul 3, 2017

These changes define pegjs as a duplicate dependency between dependencies and devDependencies, with a warning on npm install:

npm WARN The package pegjs is included as both a dev and production dependency.

@nylen
Copy link
Member Author

nylen commented Jul 5, 2017

These changes define pegjs as a duplicate dependency between dependencies and devDependencies, with a warning on npm install

Fixed in #1708.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

De-duplicate parsing logic across PHP and JS
5 participants