Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent removal of closing table and td tags in script[template="amp-mustache"] #4333

Merged
merged 13 commits into from
Mar 10, 2020

Conversation

kienstra
Copy link
Contributor

@kienstra kienstra commented Feb 25, 2020

Summary

Before, <script template="amp-mustache" ...> had the closing tags of child <table> and <td> stripped.

Now, given a Custom HTML block with the snippet from #4254 (comment):

<script type="text/plain" template="amp-mustache"><table><tr>{{#foo}}<td></td>{{/foo}}</tr></table></script>

...that snippet now appears the same on the front-end:

must

Fixes #4254
Fixes #4276

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

…tput

Now, it doesn't have </table> stripped,
or the closing </td>.
@googlebot googlebot added the cla: yes Signed the Google CLA label Feb 25, 2020
*/
private function replace_mustache_templates( $html ) {
return preg_replace(
'#<script(\s[^>]*template="amp-mustache"[^>]*)>(.*?)</script>#s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made a bit more robust. Consider attribute values using single quotes or not quotes at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add handling for cases like template='amp-mustache' or template=amp-mustache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about 83d3dd6 ?

],
'amp_mustache_template_script_with_table' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script id="baz" type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for a script that had more than one child node? This will help ensure no bugs get introduced in regards to the order at restoration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 5ef39b7

As Weston suggested,
to prevent regressions.
Weston mentioned that this could have
single or no quotes.
I hadn't thought of that.
@kienstra
Copy link
Contributor Author

kienstra commented Feb 25, 2020

Test_AMP_Validation_Manager::test_has_parameters_passed_by_reference has been consistently failing here. Though it's in the allowed failures in Travis.

*/
private function replace_mustache_templates( $html ) {
return preg_replace(
'#<script(\s[^>]*template=["\']?amp-mustache["\']?[^>]*)>(.*?)</script>#s',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend the following changes to the regex to make it less brittle:

  • Make it case-insensitive by adding the i flag
  • Support whitespace at the end of the closing tag (</script >), which is valid HTML
  • Only match a pair of matching quotes, instead of a mismatch, which would be parsed differently
  • Make the initial non-closing-bracket group non-greedy to avoid some backtracking (<script(\s[^>]*?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, applied in 52d49e9

@schlessera
Copy link
Collaborator

Updated description to note that this PR also fixes #4276.

Allow spaces in the closing </script>,
like </script >
Also, make this case-insensitive
and ensure the quotes types are the same.
Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll merge this after the mammoth PR #4019 is merged (which will likely require merge conflict resolutions on this PR).

@westonruter westonruter mentioned this pull request Mar 4, 2020
9 tasks
@westonruter
Copy link
Member

Merge conflicts need to be resolved here. Take special care that this code needs to be included/adapted into the new Document:

amp-wp/src/Dom/Document.php

Lines 357 to 366 in 65a16ef

// Add the required utf-8 meta charset tag.
$charset = $this->createElement( 'meta' );
$charset->setAttribute( 'charset', self::AMP_ENCODING );
$this->head->insertBefore( $charset, $this->head->firstChild );
// Do some further clean-up.
$this->deduplicate_tag( self::TAG_HEAD );
$this->deduplicate_tag( self::TAG_BODY );
$this->move_invalid_head_nodes_to_body();
$this->convert_head_profile_to_link();

That includes code from #4193 and #4107 at least.

@westonruter
Copy link
Member

This PR is extra ready for merge conflict resolution now that #4297 is merged.

The conflicts were deleted files:
src/Dom/Document.php
tests/php/test-class-amp-dom-document.php

This simply moves the changes in those
files to the relevant new files.
@kienstra
Copy link
Contributor Author

kienstra commented Mar 9, 2020

Hm, merging in develop didn't go very well.

The conflicts were deleted files:
src/Dom/Document.php
tests/php/test-class-amp-dom-document.php

This simply moves the changes in those
files to the relevant new files.
Also, correct a function name
in a DocBlock tag.
Maybe <script async shouldn't be changed to
<script async=""
But with how the workaround works,
it sets the script attributes to their previous attributes.
So <script async
will eventually end up as <script async="".
@kienstra
Copy link
Contributor Author

kienstra commented Mar 9, 2020

Hi @westonruter,
Hope you had a great weekend.

Merge conflicts need to be resolved here. Take special care that this code needs to be included/adapted into the new Document:

Good point to look at loadHTML().

But do you think $this->insertMissingCharset() takes care of that?

if ($success) {
$this->insertMissingCharset();
// Do some further clean-up.
$this->deduplicateTag(Tag::HEAD);
$this->deduplicateTag(Tag::BODY);
$this->moveInvalidHeadNodesToBody();
$this->movePostBodyNodesToBody();
$this->convertHeadProfileToLink();

So far, it looks like it does, but I definitely could be missing something.

Thanks, Weston!

@westonruter
Copy link
Member

But do you think $this->insertMissingCharset() takes care of that?

I don't understand. I was just saying that there was a merge conflict with PRs merged from #4193 and #4107 (among others).

@westonruter
Copy link
Member

@schlessera Please approve if the changes are 👍 from your end as well.

@kienstra
Copy link
Contributor Author

I don't understand. I was just saying that there was a merge conflict with PRs merged from #4193 and #4107 (among others).

Ah, OK. I misunderstood, I think all is well 😄

lib/common/src/Dom/Document.php Outdated Show resolved Hide resolved
lib/common/src/Dom/Document.php Outdated Show resolved Hide resolved
Co-Authored-By: Alain Schlesser <alain.schlesser@gmail.com>
@westonruter westonruter added this to the v1.5 milestone Mar 10, 2020
@westonruter westonruter merged commit 62634ed into develop Mar 10, 2020
@westonruter westonruter deleted the fix/mustache-template-workaround branch March 10, 2020 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Signed the Google CLA
Projects
None yet
4 participants