Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #220 : Properly parse expression lists passed to <?= #245

Merged
merged 1 commit into from
Jun 11, 2018

Conversation

TysonAndre
Copy link
Contributor

Also, treat the literal echo as a statement instead of an expression,
while preserving the generated ASTs.

  • Contrast with print expr, (PrintIntrinsicExpression),
    which is correctly treated as an expression.

Before, tolerant-php-parser parsed the tokens after <?= and <?php
the same way.
This PR adds TokenKind::ScriptSectionStartWithEchoTag and handles the
subsequent tokens differently (as an expressionList) if that token kind is seen instead of
ScriptSectionStartTag.

  • This reuses EchoExpression to make it easier for applications to reuse code handling print
    statements. The resulting expression will have no echoKeyword
    because <?= was part of the preceding InlineHTML Node.

    • In the majority of cases,
      the EchoExpression will be moved into the outer statement list.
    • Haven't yet investigated how this will affect php-language-server or other projects.

Virtually all of the AST trees were changed due to the
addition of a new property to InlineHTML.

  • I'm not familiar enough with the language spec to be sure that I
    will be able to catch and normalize everywhere that InlineHTML <?= can occur,
    so I gave up and made it permissible to be part of the final AST.

    The addition of the new property to CHILD_NODES can be undone if you're confident.
    (loops? other statements? (etc.).)

@roblourens
Copy link
Member

This looks good at first look, but I will take a closer look soon, merge it, and cut a new release.

@TysonAndre
Copy link
Contributor Author

I'm not familiar enough with the language spec to be sure that I
will be able to catch and normalize everywhere that InlineHTML <?= can occur,
so I gave up and made it permissible to be part of the final AST.

I think this can only be a top-level statement or placed within {} in the same places as an echo, so it should work even if it isn't allowed as part of the final AST (and not involve test updates).

E.g. if (cond) ?>echoed text<? will unconditionally echo "echoed text".

Are you fine with taking it back out of CHILD_NODES?

@roblourens
Copy link
Member

I think you're right about the places where this can occur. I'm ok with removing the new property from the AST.

But I'm not clear what this will look like exactly, will you just have parseInlineHtml return either an InlineHtml or an EchoExpression?

@jens1o
Copy link
Contributor

jens1o commented Jun 7, 2018

E.g. if (cond) ?>echoed text<? will unconditionally echo "echoed text".

It will only if cond is true. This has been made so you do not need to care about escaping everything.
Living example: https://github.com/WoltLab/WCF/blob/44c55faa3da6c041fe12bf307995dff2c239dc60/wcfsetup/test.php

@TysonAndre
Copy link
Contributor Author

TysonAndre commented Jun 7, 2018

But I'm not clear what this will look like exactly, will you just have parseInlineHtml return either an InlineHtml or an EchoExpression?

I was planning to continue creating that property, but just to exclude it from CHILD_NODES.

Other approaches that could also be used:

  • Return a temporary subclass of the InlineHTML node, force the caller to create a new instance of the original class with both inline HTML and restore the original class later.
  • Return a pair with the inline HTML and echo expression

  • E.g. if (cond) ?>echoed text<?php will unconditionally echo "echoed text".
  • E.g. if (cond) { ?>echoed text<?php } with the braces will have the output depend on cond.

See https://3v4l.org/ELbqZ for an example

// FIXME the tolerant-php-parser has a bug, this code always echoes if executed.
// (i.e. same as `if (false); echo "hello world"` with an implicit semicolon)
// NOTE: If the inline HTML is surrounded by brackets, then this would never echo.
if (false)?>hello world<?php
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug wasn't introduced by my PR, so I consider it out of scope. I filed #246 for followup work.

  • The tree should change once the bug is fixed

See https://3v4l.org/ELbqZ

@mattacosta
Copy link

mattacosta commented Jun 8, 2018

I think this can only be a top-level statement or placed within {} in the same places as an echo, so it should work even if it isn't allowed as part of the final AST (and not involve test updates).

@TysonAndre I can confirm that T_INLINE_HTML and T_ECHO both occur in the same rule, so you are correct. Additionally, when in parsing mode, PHP will convert T_OPEN_TAG_WITH_ECHO tokens to T_ECHO. This means that whenever <?= is encountered, the parser should call the same function as it would if echo were encountered.


I also wanted to note that internally PHP treats T_INLINE_HTML tokens as echo statements with a single "argument". This technically means that InlineHtml nodes should not exist, but this can be left up to the parser.

@TysonAndre
Copy link
Contributor Author

This technically means that InlineHtml nodes should not exist.

That isn't what this change does. It's also completely normal for me to see InlineHtml (tolerant-php-parser is an accurate representation of the tokens in the PHP code)

What this change does:

Instead of representing the fragment ?>some text<?= 'expr or expr list' as InlineHtml followed by an expression, the parser now represents that fragment (in the final returned AST) as InlineHtml followed by an ExpressionStatement with an EchoExpression (containing a list of comma separated expressions)

@mattacosta
Copy link

mattacosta commented Jun 8, 2018

That isn't what this change does. It's also completely normal for me to see InlineHtml (tolerant-php-parser is an accurate representation of the tokens in the PHP code)

I am aware of what this change does, and yes the parser can do whatever it wants with regard to the nodes it creates. I was simply noting that reusing an EchoExpression object would result in the exact same representation while also conforming with what PHP actually does.

However, my original point was that <?= statements should be equivalent to echo statements, so at the most there should be some minor refactoring instead of a different implementation.

Just glancing at the code, I believe that you may have also introduced a new bug:

<?php echo $a  // ERROR: ';' expected
<?= $a  // NO ERROR

Edit: Formatting. Bug confirmed.

@TysonAndre
Copy link
Contributor Author

It definitely introduces a new bug. To fix this, I'll need to do the same thing as the parser would when parsing the rest of an ExpressionStatement (check for ?> or ;)

Before, tolerant-php-parser parsed the tokens after `<?=` and `<?php`
the same way.

Fixes microsoft#220 (This commit and subsequent commits in this PR)

This adds `TokenKind::ScriptSectionStartWithEchoTag` and handles the
subsequent tokens differently if that token kind is seen instead of
`ScriptSectionStartTag`.

This reuses `ExpressionStatement` with an `EchoExpression` for simplicity.
The resulting expression will have no `echoKeyword` because
`<?=` was part of the preceding `InlineHTML` Node.

- In all cases that I'm aware of,
  the ExpressionStatement with the EchoExpression will be moved
  into the outer statement list.

NOTE: echo is a statement, not an expression.
It must be followed by `?>` or `;`

- `echo exprList` can't be used as an expression.

- Contrast with `print expr`, (`PrintIntrinsicExpression`),
  which is correctly treated as an expression.

Update documentation

Add a test that the Parser will warn about `<?= 'expr'<EOF>`.

- Fixing that bug got rid of an incorrect EmptyExpression in another test
  (programStructure5.php.tree)
@@ -0,0 +1 @@
<?= "expression" // properly warns about missing ';'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests the regression that was mentioned

@TysonAndre
Copy link
Contributor Author

This is ready for review. Tests pass, and the bug was fixed

I changed InlineHtml->echoExpression to InlineHtml->echoStatement. It didn't make sense to add an EchoExpression to the list of statements (The new behavior matches what a regular echo statement would do).

Copy link
Member

@roblourens roblourens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks!

@roblourens roblourens merged commit bce97ae into microsoft:master Jun 11, 2018
@TysonAndre TysonAndre deleted the fix-short-echo branch March 27, 2021 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants