Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent removal of closing table and td tags in script[template="amp-mustache"] #4333

Merged
merged 13 commits into from
Mar 10, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions src/Dom/Document.php
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,7 @@ public function loadHTML( $source, $options = 0 ) {
$source = $this->replace_self_closing_tags( $source );
$source = $this->normalize_document_structure( $source );
$source = $this->maybe_replace_noscript_elements( $source );
$source = $this->replace_mustache_templates( $source );
$source = $this->secure_doctype_node( $source );

list( $source, $this->original_encoding ) = $this->detect_and_strip_encoding( $source );
Expand All @@ -341,6 +342,7 @@ public function loadHTML( $source, $options = 0 ) {

if ( $success ) {
$this->normalize_html_attributes();
$this->restore_mustache_templates();

// Remove http-equiv charset again.
$meta = $this->head->firstChild;
Expand Down Expand Up @@ -666,6 +668,45 @@ private function maybe_restore_noscript_elements( $html ) {
);
}

/**
* Replaces the tag name of script[template="amp-mustache"], as a workaround to a parsing issue.
*
* This script can have closing tags of its children table and td stripped.
* So this changes its name from script to tmp-script to avoid this.
*
* @link https://github.com/ampproject/amp-wp/issues/4254
* @see restore_mustache_templates() Reciprocal function.
*
* @param string $html To replace the tag name of the mustache templates in.
* @return string The HTML, with the tag name of the mustache templates replaced.
*/
private function replace_mustache_templates( $html ) {
return preg_replace(
'#<script(\s[^>]*template=["\']?amp-mustache["\']?[^>]*)>(.*?)</script>#s',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend the following changes to the regex to make it less brittle:

  • Make it case-insensitive by adding the i flag
  • Support whitespace at the end of the closing tag (</script >), which is valid HTML
  • Only match a pair of matching quotes, instead of a mismatch, which would be parsed differently
  • Make the initial non-closing-bracket group non-greedy to avoid some backtracking (<script(\s[^>]*?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, applied in 52d49e9

'<tmp-script$1>$2</tmp-script>',
$html
);
}

/**
* Restores the tag names of script[template="amp-mustache"] elements that were replaced earlier.
*
* @see replace_mustache_templates() Reciprocal function.
*/
private function restore_mustache_templates() {
$tmp_script_elements = iterator_to_array( $this->getElementsByTagName( 'tmp-script' ) );
foreach ( $tmp_script_elements as $tmp_script_element ) {
$script = $this->createElement( 'script' );
foreach ( $tmp_script_element->attributes as $attr ) {
$script->setAttribute( $attr->nodeName, $attr->nodeValue );
}
while ( $tmp_script_element->firstChild ) {
$script->appendChild( $tmp_script_element->firstChild );
}
$tmp_script_element->parentNode->replaceChild( $script, $tmp_script_element );
}
}

/**
* Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes).
*
Expand Down
25 changes: 25 additions & 0 deletions tests/php/test-class-amp-dom-document.php
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,31 @@ public function data_dom_document() {
'<!DOCTYPE html><html><head profile=""></head><body></body></html>',
'<!DOCTYPE html><html><head><meta charset="utf-8"></head><body></body></html>',
],
'amp_mustache_template_with_table' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><template type="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></template></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><template type="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></template></body></html>',
],
'amp_mustache_template_script_with_table' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script id="baz" type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for a script that had more than one child node? This will help ensure no bugs get introduced in regards to the order at restoration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 5ef39b7

'<!DOCTYPE html><html>' . $head . '<body><script id="baz" type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
],
'amp_mustache_template_single_quotes' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script type=\'text/plain\' template=\'amp-mustache\'><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><script type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
],
'amp_mustache_script_multiple_children' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script type="text/plain" template="amp-mustache"><h1>{{heading}}</h1><p>{{content}}</p><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><script type="text/plain" template="amp-mustache"><h1>{{heading}}</h1><p>{{content}}</p><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
],
'multiple_mustache_templates_still_appear' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script template="amp-mustache" type="text/plain" id="foo"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script><script type="text/plain" template="amp-mustache" id="example"><p>{{#baz}}This is inside a template{{/baz}}</p></script></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><script template="amp-mustache" type="text/plain" id="foo"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script><script type="text/plain" template="amp-mustache" id="example"><p>{{#baz}}This is inside a template{{/baz}}</p></script></body></html>',
],
];
}

Expand Down