Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for comments #871

Closed
wants to merge 39 commits into from
Closed
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
7ec60e4
Make add `get_dom` method to AMP_DOM_Utils
DavidCramer Jan 17, 2018
eed214f
add amp_form_sanitizer
DavidCramer Jan 17, 2018
4067b59
add form sanitisation on output buuffer
DavidCramer Jan 17, 2018
a8d2ad3
merge develop in
DavidCramer Jan 18, 2018
98874ba
Merge develop
DavidCramer Jan 19, 2018
4f6dcd6
add comments REST endpoints
DavidCramer Jan 19, 2018
317f90f
Add AMP Comments Walker
DavidCramer Jan 19, 2018
451db91
Add comments template overrides and methods
DavidCramer Jan 19, 2018
0541686
add comment sanitizer
DavidCramer Jan 19, 2018
2c8575e
Make walker add the HTML to reduce load in the sanitiser.
DavidCramer Jan 19, 2018
ac0cc53
its a filter, not an action.
DavidCramer Jan 19, 2018
2741fdf
Merge develop
DavidCramer Jan 22, 2018
ebffaae
add form sanitisation changes
DavidCramer Jan 22, 2018
9790be3
make action setting more sane
DavidCramer Jan 22, 2018
b425b19
Add sanitisation and component_type method
DavidCramer Jan 22, 2018
b7c5963
optimize comments endpoint
DavidCramer Jan 23, 2018
94aee02
wp_unslash request_uri for action attribute.
DavidCramer Jan 23, 2018
d0fc6c5
cleaner comments template generation
DavidCramer Jan 23, 2018
6b660b3
merge develop
DavidCramer Jan 23, 2018
69d1342
cleanup formatting
DavidCramer Jan 24, 2018
4200d9e
output comment template on same comments pass
DavidCramer Jan 24, 2018
366bef3
cleanup comments sanitiser and add url template string filtering
DavidCramer Jan 24, 2018
7b63836
add comment template url string filters
DavidCramer Jan 24, 2018
8b289c1
remove unnecessary duplicated `wp_list_comments` pass
DavidCramer Jan 24, 2018
15d327d
Hook into AMP xhr-submissions to handle headers and JSON responses.
DavidCramer Jan 24, 2018
561189b
add amp-list fallback for comments.
DavidCramer Jan 24, 2018
a1bf8d3
add comment form success and error handlers and wrapper code.
DavidCramer Jan 24, 2018
b40e932
lowercase `post` k, thanks.
DavidCramer Jan 24, 2018
dc1f9c3
Merge develop
DavidCramer Jan 24, 2018
187e136
fix formatting fail.
DavidCramer Jan 24, 2018
82e8021
if no template, move on.
DavidCramer Jan 25, 2018
d4fd8bc
No need to check for the `amp_comments` tag anymore.
DavidCramer Jan 25, 2018
bbabbd7
add tests for form and comment sanitisation.
DavidCramer Jan 25, 2018
0fd1ab5
add docs to tests
DavidCramer Jan 25, 2018
9f99abe
No need to get images again.
DavidCramer Jan 25, 2018
9216e4e
check has images before converting urls
DavidCramer Jan 25, 2018
fafa679
fix phpdoc
DavidCramer Jan 25, 2018
9536794
Merge branch 'develop' of https://github.com/Automattic/amp-wp into a…
westonruter Jan 25, 2018
74ae3b5
Prevent saveHTML from munging amp-mustache curly braces via URL-encoding
westonruter Jan 26, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions includes/class-amp-autoloader.php
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ class AMP_Autoloader {
'AMP_Blacklist_Sanitizer' => 'includes/sanitizers/class-amp-blacklist-sanitizer',
'AMP_Iframe_Sanitizer' => 'includes/sanitizers/class-amp-iframe-sanitizer',
'AMP_Img_Sanitizer' => 'includes/sanitizers/class-amp-img-sanitizer',
'AMP_Form_Sanitizer' => 'includes/sanitizers/class-amp-form-sanitizer',
'AMP_Playbuzz_Sanitizer' => 'includes/sanitizers/class-amp-playbuzz-sanitizer',
'AMP_Style_Sanitizer' => 'includes/sanitizers/class-amp-style-sanitizer',
'AMP_Tag_And_Attribute_Sanitizer' => 'includes/sanitizers/class-amp-tag-and-attribute-sanitizer',
Expand Down
7 changes: 7 additions & 0 deletions includes/class-amp-theme-support.php
Original file line number Diff line number Diff line change
Expand Up @@ -347,7 +347,14 @@ public static function finish_output_buffering( $output ) {
1
);

$dom = AMP_DOM_Utils::get_dom( $output );
// Sanitize forms in the document.
$sanitizer = new AMP_Form_Sanitizer( $dom );
$sanitizer->sanitize();

// @todo Add more validation checking and potentially the whitelist sanitizer.
$output = $dom->saveHTML();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidCramer I chatted about this with @ThierryA, but I think we should avoid having to load the entire response into the DOM for sanitization. I think that we should limit sanitization to just the post content, 3rd-party widgets, and other “leaf nodes” which we know are liable to be invalid AMP.

Is this right? Or am I off track? My worry is that loading the HTML into a DOMDocument for every request will be too heavy. But at the same time, if we're already spinning up a DOMDocument for every instance of the_content() maybe it is actually better to just do one for the entire HTML. This would certainly simplify somethings for sanitizing widgets (cc @kienstra). It could also mean that we wouldn't have to have this ad hoc logic to search for elements requiring components: https://github.com/Automattic/amp-wp/blob/e60cb152a50e2ffbf5504d41aee859ad1d0e0baa/includes/class-amp-theme-support.php#L448-L455

Maybe we should decide to not prematurely optimize and instead keep things simple at first and just go ahead and sanitize the entire response. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See findings at #875 (comment)

It seems we should definitely not sanitize the entire output buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter whats the consensus now on this? It looks like your findings indicated it may be a good idea to sanitize the whole buffer output. I personally think this is the most efficient. I spoke with @ThierryA about this yesterday and he was going to do some tests, like you did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning toward sanitizing the entire output buffer now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidCramer do you want to build off of #875 (comment) in a new PR to serve as a basis for what you're introducing here with form sanitization?


return $output;
}
}
74 changes: 74 additions & 0 deletions includes/sanitizers/class-amp-form-sanitizer.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
<?php
/**
* Class AMP_Form_Sanitizer.
*
* @package AMP
*/

/**
* Class AMP_Form_Sanitizer
*
* Strips and corrects attributes in forms.
*/
class AMP_Form_Sanitizer extends AMP_Base_Sanitizer {

/**
* Tag.
*
* @var string HTML <form> tag to identify and process.
*
* @since 0.2
*/
public static $tag = 'form';

/**
* Sanitize the <img> elements from the HTML contained in this instance's DOMDocument.
*
* @since 0.2
*/
public function sanitize() {

/**
* Node list.
*
* @var DOMNodeList $node
*/
$nodes = $this->dom->getElementsByTagName( self::$tag );
$num_nodes = $nodes->length;

if ( 0 === $num_nodes ) {
return;
}

for ( $i = $num_nodes - 1; $i >= 0; $i-- ) {
$node = $nodes->item( $i );
if ( ! $node instanceof DOMElement ) {
continue;
}

if ( ! $node->hasAttribute( 'action' ) || '' === $node->getAttribute( 'action' ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the spec: https://www.ampproject.org/docs/reference/components/amp-form#target

This attribute is required for method=GET. For method=POST, the action attribute is invalid, use action-xhr instead.

Is the second part of this conditional intended to check if 'get' === $node->getAttribute( 'method' )? Since the default method is get then I think there should be something above that does:

$method = 'get';
if ( $node->hasAttribute( 'method' ) ) {
    $method = strtolower( $node->getAttribute( 'method' ) );
}

And then use the $method variable going forward.

$node->parentNode->removeChild( $node );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right. If method is get and there is no action, then what this really should do I think is just set the action to be esc_url_raw( 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] ), because the default action for a form is the URL that the form appears on.

continue;
}

// Correct action.
if ( $node->hasAttribute( 'action' ) && ! $node->hasAttribute( 'action-xhr' ) ) {
$action_url = $node->getAttribute( 'action' );
$action_url = str_replace( 'http:', '', $action_url );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that http: appears somewhere else in the URL. So then this should instead use a regex like: preg_replace( '#^http:(?=//)#', '', $action_url ) to ensure it only replaces at the beginning of the string.

$node->setAttribute( 'action', $action_url );

if ( 'post' === $node->getAttribute( 'method' ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, this could compare with $method.

$node->setAttribute( 'action-xhr', $action_url );
$node->removeAttribute( 'action' );
}
}

// Set a target if needed.
if ( ! $node->hasAttribute( 'target' ) ) {
$node->setAttribute( 'target', '_blank' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the spec: https://www.ampproject.org/docs/reference/components/amp-form#target

Indicates where to display the form response after submitting the form. The value must be _blank or _top.

The default I think should be _top as that is closer to the default behavior for a form. The actual default is _self but that's not valid AMP. So I think this should be doing:

if ( ! in_array( $node->hasAttribute( 'target' ), array( '_top', '_blank' ) ) ) {
    $node->setAttribute( 'target', '_top' );

}
}

}

}
37 changes: 30 additions & 7 deletions includes/utils/class-amp-dom-utils.php
Original file line number Diff line number Diff line change
Expand Up @@ -13,17 +13,15 @@
class AMP_DOM_Utils {

/**
* Return a valid DOMDocument representing arbitrary HTML content passed as a parameter.
*
* @see Reciprocal function get_content_from_dom()
* Return a valid DOMDocument representing HTML document passed as a parameter.
*
* @since 0.2
* @since 0.7
*
* @param string $content Valid HTML content to be represented by a DOMDocument.
* @param string $document Valid HTML document to be represented by a DOMDocument.
*
* @return DOMDocument|false Returns DOMDocument, or false if conversion failed.
*/
public static function get_dom_from_content( $content ) {
public static function get_dom( $document ) {
$libxml_previous_state = libxml_use_internal_errors( true );

$dom = new DOMDocument();
Expand All @@ -35,7 +33,7 @@ public static function get_dom_from_content( $content ) {
* Add utf-8 charset so loadHTML does not have problems parsing it.
* See: http://php.net/manual/en/domdocument.loadhtml.php#78243
*/
$result = $dom->loadHTML( '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>' . $content . '</body></html>' );
$result = $dom->loadHTML( $document );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here are conflicting in develop.


libxml_clear_errors();
libxml_use_internal_errors( $libxml_previous_state );
Expand All @@ -47,6 +45,31 @@ public static function get_dom_from_content( $content ) {
return $dom;
}

/**
* Return a valid DOMDocument representing arbitrary HTML content passed as a parameter.
*
* @see Reciprocal function get_content_from_dom()
*
* @since 0.2
*
* @param string $content Valid HTML content to be represented by a DOMDocument.
*
* @return DOMDocument|false Returns DOMDocument, or false if conversion failed.
*/
public static function get_dom_from_content( $content ) {
/*
* Wrap in dummy tags, since XML needs one parent node.
* It also makes it easier to loop through nodes.
* We can later use this to extract our nodes.
* Add utf-8 charset so loadHTML does not have problems parsing it.
* See: http://php.net/manual/en/domdocument.loadhtml.php#78243
*/
$document = '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>' . $content . '</body></html>';

return self::get_dom( $document );

}

/**
* Return valid HTML content extracted from the DOMDocument passed as a parameter.
*
Expand Down