Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent removal of closing table and td tags in script[template="amp-mustache"] #4333

Merged
merged 13 commits into from
Mar 10, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions src/Dom/Document.php
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,7 @@ public function loadHTML( $source, $options = 0 ) {
$source = $this->replace_self_closing_tags( $source );
$source = $this->normalize_document_structure( $source );
$source = $this->maybe_replace_noscript_elements( $source );
$source = $this->replace_mustache_templates( $source );
$source = $this->secure_doctype_node( $source );

list( $source, $this->original_encoding ) = $this->detect_and_strip_encoding( $source );
Expand All @@ -341,6 +342,7 @@ public function loadHTML( $source, $options = 0 ) {

if ( $success ) {
$this->normalize_html_attributes();
$this->restore_mustache_templates();

// Remove http-equiv charset again.
$meta = $this->head->firstChild;
Expand Down Expand Up @@ -666,6 +668,45 @@ private function maybe_restore_noscript_elements( $html ) {
);
}

/**
* Replaces the tag name of script[template="amp-mustache"], as a workaround to a parsing issue.
*
* This script can have closing tags of its children table and td stripped.
* So this changes its name from script to tmp-script to avoid this.
*
* @link https://github.com/ampproject/amp-wp/issues/4254
* @see restore_mustache_templates() Reciprocal function.
*
* @param string $html To replace the tag name of the mustache templates in.
* @return string The HTML, with the tag name of the mustache templates replaced.
*/
private function replace_mustache_templates( $html ) {
return preg_replace(
'#<script(\s[^>]*template="amp-mustache"[^>]*)>(.*?)</script>#s',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made a bit more robust. Consider attribute values using single quotes or not quotes at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add handling for cases like template='amp-mustache' or template=amp-mustache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about 83d3dd6 ?

'<tmp-script$1>$2</tmp-script>',
$html
);
}

/**
* Restores the tag names of script[template="amp-mustache"] elements that were replaced earlier.
*
* @see replace_mustache_templates() Reciprocal function.
*/
private function restore_mustache_templates() {
$tmp_script_elements = iterator_to_array( $this->getElementsByTagName( 'tmp-script' ) );
foreach ( $tmp_script_elements as $tmp_script_element ) {
$script = $this->createElement( 'script' );
foreach ( $tmp_script_element->attributes as $attr ) {
$script->setAttribute( $attr->nodeName, $attr->nodeValue );
}
while ( $tmp_script_element->firstChild ) {
$script->appendChild( $tmp_script_element->firstChild );
}
$tmp_script_element->parentNode->replaceChild( $script, $tmp_script_element );
}
}

/**
* Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes).
*
Expand Down
15 changes: 15 additions & 0 deletions tests/php/test-class-amp-dom-document.php
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,21 @@ public function data_dom_document() {
'<!DOCTYPE html><html><head profile=""></head><body></body></html>',
'<!DOCTYPE html><html><head><meta charset="utf-8"></head><body></body></html>',
],
'amp_mustache_template_with_table' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><template type="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></template></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><template type="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></template></body></html>',
],
'amp_mustache_template_script_with_table' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script id="baz" type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for a script that had more than one child node? This will help ensure no bugs get introduced in regards to the order at restoration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied in 5ef39b7

'<!DOCTYPE html><html>' . $head . '<body><script id="baz" type="text/plain" template="amp-mustache"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script></body></html>',
],
'multiple_mustache_templates_still_appear' => [
'utf-8',
'<!DOCTYPE html><html>' . $head . '<body><script template="amp-mustache" type="text/plain" id="foo"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script><script type="text/plain" template="amp-mustache" id="example"><p>{{#baz}}This is inside a template{{/baz}}</p></script></body></html>',
'<!DOCTYPE html><html>' . $head . '<body><script template="amp-mustache" type="text/plain" id="foo"><table><tr>{{#example}}<td></td>{{/example}}</tr></table></script><script type="text/plain" template="amp-mustache" id="example"><p>{{#baz}}This is inside a template{{/baz}}</p></script></body></html>',
],
];
}

Expand Down