Sanitize invalid children of amp-story and amp-story-page elements to prevent white story of death #3336

Merged: 2 commits, Sep 24, 2019

@westonruter (Member) commented Sep 24, 2019

A compatibility issue with the Reading Time WP plugin was discovered in #3321, and it will likely occur with other plugins as well. The Reading Time WP plugin filters the_content to inject this at the beginning:

<span class="rt-reading-time" style="display: block;"><span class="rt-label">Reading Time: </span> <span class="rt-time">1</span> <span class="rt-label rt-postfix">minute</span></span>

This results in an invalid amp-story, because amp-story restricts its children to elements like amp-story-page. When this span is a direct child, the child_tag_name_oneof constraint is violated; the entire amp-story is then invalid and gets removed, producing a white story of death (where the body has no children). The validation error is not helpful at all:

image

This problem was actually “prophesied” in #2926:

👉 A side-effect of the change here is that the sanitization of AMP components with invalid children will be more draconian. For example, if an amp-story has an invalid child element, then the entire amp-story element will be removed, as opposed to the invalid child alone being removed. Nevertheless, since only AMP components have such validation constraints for children, it should not be so common for this to occur in WordPress sites that are just outputting normal HTML. It will only become an issue when starting to use AMP components, and this makes #1420 much more important as it will be needed to explain why the element was removed.

So this PR fixes the problem by extending the AMP_Story_Sanitizer to preemptively remove invalid child elements under amp-story and amp-story-page. These are the two elements which have the child_tag_name_oneof constraint. This special-case sanitizer is especially important for AMP Stories since all of the markup for a story is in post_content and is prone to be mutated by the_content filters that add elements like word counts, sharing buttons, and related posts. This PR prevents such elements from being seen by the tag-and-attribute sanitizer, thus preventing the amp-story or amp-story-page as a whole from being removed.
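
For illustration, the idea of removing only the offending direct children can be sketched roughly as below. This is a minimal sketch, not the plugin's actual code: the function name `remove_disallowed_children()` is hypothetical, and the allowed-tag list passed in is an illustrative assumption rather than the real validator spec.

```php
<?php
/**
 * Remove direct element children whose tag names are not in the allowed list.
 *
 * Hypothetical helper for illustration only; not the AMP_Story_Sanitizer method.
 *
 * @param DOMElement $parent            Element whose children are checked.
 * @param string[]   $allowed_tag_names Allowed child tag names (assumed list).
 * @return string[] Tag names of the removed children.
 */
function remove_disallowed_children( DOMElement $parent, array $allowed_tag_names ) {
	$removed = [];
	$node    = $parent->firstChild;
	while ( $node ) {
		// Capture the sibling first, since $node may be removed below.
		$next_node = $node->nextSibling;
		if ( $node instanceof DOMElement && ! in_array( $node->nodeName, $allowed_tag_names, true ) ) {
			$removed[] = $node->nodeName;
			$parent->removeChild( $node );
		}
		$node = $next_node;
	}
	return $removed;
}

// Markup resembling what a the_content filter might prepend to a story.
$html = '<html><body><amp-story>'
	. '<span class="rt-reading-time">Reading Time: 1 minute</span>'
	. '<amp-story-page id="a"></amp-story-page>'
	. '</amp-story></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors( true ); // The HTML parser warns about unknown amp-* tags.
$dom->loadHTML( $html );
libxml_clear_errors();

$story   = $dom->getElementsByTagName( 'amp-story' )->item( 0 );
$removed = remove_disallowed_children( $story, [ 'amp-story-page', 'amp-story-bookend' ] );
```

Only the span is dropped here; the amp-story element and its page survive, which is the behavior the PR description is after.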

In the case of the span which the Reading Time WP plugin adds to the_content, the validation error now becomes much more helpful:

image

And no white story of death occurs.

Fixes #3321.

@schlessera (Collaborator) left a comment

Minor nitpicks only.

/**
 * Sanitize the AMP elements contained by <amp-story-page> element where necessary.
 *
 * @since 0.2
 */
public function sanitize() {
	$nodes = $this->dom->getElementsByTagName( self::$tag );
	$num_nodes = $nodes->length;
	$this->amp_story_tag_spec = AMP_Allowed_Tags_Generated::get_allowed_tag( 'amp-story' )[0];

Collaborator

get_allowed_tag() can potentially return null and this will then throw a notice on PHP 7.4+: https://3v4l.org/pnjnl

However, I assume we fully control the allowed tags here and those we check for here can't be filtered away?

Member Author

That's correct. If we update the Validator spec and it results in these tag specs being null then we'd catch it in the unit test.

Member Author

@schlessera I hardened this in 685cfa1.
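
As a rough illustration of the kind of hardening discussed above (not the contents of 685cfa1), the allowed child names could be gathered defensively so that a null spec never gets indexed. The helper name `get_allowed_child_names()` is hypothetical, and `$rule_specs` stands in for whatever `get_allowed_tag()` returns; the nested keys follow the `tag_spec` / `child_tags` / `child_tag_name_oneof` structure seen elsewhere in this thread.

```php
<?php
/**
 * Gather allowed child tag names from a list of rule specs, tolerating a missing spec.
 *
 * Hypothetical helper for illustration; not the plugin's actual implementation.
 *
 * @param array|null $rule_specs Rule specs for a tag, or null when the tag is unknown.
 * @return string[] Unique allowed child tag names (empty when nothing is defined).
 */
function get_allowed_child_names( $rule_specs ) {
	if ( ! is_array( $rule_specs ) ) {
		// Avoids "Trying to access array offset on value of type null" on PHP 7.4+.
		return [];
	}
	$names = [];
	foreach ( $rule_specs as $rule_spec ) {
		if ( isset( $rule_spec['tag_spec']['child_tags']['child_tag_name_oneof'] ) ) {
			$names = array_merge( $names, $rule_spec['tag_spec']['child_tags']['child_tag_name_oneof'] );
		}
	}
	return array_values( array_unique( $names ) );
}

// Sample data shaped like the spec structure referenced in this thread (values assumed).
$sample_specs = [
	[
		'tag_spec' => [
			'child_tags' => [
				'child_tag_name_oneof' => [ 'amp-story-page', 'amp-story-bookend' ],
			],
		],
	],
];

$from_null   = get_allowed_child_names( null );
$from_sample = get_allowed_child_names( $sample_specs );
```

Whether an empty list should mean "remove nothing" or "remove everything" is a separate design decision for the caller; this sketch only shows the null guard.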

return;
$amp_story_element = $this->dom->getElementsByTagName( 'amp-story' )->item( 0 );
if ( $amp_story_element instanceof DOMElement ) {
$this->sanitize_story_element( $amp_story_element );

Collaborator

The way this flows seems counterintuitive to me; it makes sanitizing the story element look like an edge case.

I would prefer inverting the condition and adding an early return, with sanitize_story_element() as the default next step.

$node = $element->firstChild;
while ( $node ) {
	$next_node = $node->nextSibling;
	if ( $node instanceof DOMElement ) {

Collaborator

Same logic inversion here: I would prefer an early return (continue in this case) instead of making the main logic look like an edge case.

Member Author

True, but I did it this way because $node = $next_node needs to run at the end of every iteration. Otherwise, I'd have added:

if ( ! $node instanceof DOMElement ) {
    $node = $next_node;
    continue;
}

But that seems worse because the logic is duplicated.

Collaborator

How about something like this:

$node = $element->firstChild;
do {
	$next_node = $node->nextSibling;
	if ( ! $node instanceof DOMElement ) {
		continue;
	}
	if ( 'amp-story-page' === $node->nodeName ) {
		$page_number++;
		$this->sanitize_story_page_element( $node, $page_number );
	} elseif ( ! in_array( $node->nodeName, $this->amp_story_tag_spec['tag_spec']['child_tags']['child_tag_name_oneof'], true ) ) {
		$this->remove_invalid_child( $node );
	}
} while ( $node = $next_node );

Collaborator

Note: This is mostly just preference here. I'll approve the changes and let you decide whether you want to make changes or not.

Member Author

Thanks, I like that. However, I tried it and there is a PHPCS complaint: WordPress.CodeAnalysis.AssignmentInCondition.FoundInWhileCondition. We can revisit later.
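
One possible way to revisit this, sketched under assumptions: a `for` loop keeps the early `continue` without duplicating `$node = $next_node`, because `continue` in a `for` loop still runs the increment expression. It also handles an element with no children, where `firstChild` is null. The assignment sits in the increment expression rather than in the condition, so it should not match the FoundInWhileCondition case, though that has not been checked against the sniff here. The helper name `get_child_element_names()` is hypothetical.

```php
<?php
/**
 * Collect the tag names of direct element children, skipping text and comment nodes.
 *
 * Hypothetical helper that only demonstrates the loop shape.
 *
 * @param DOMElement $element Parent element.
 * @return string[] Tag names of element children, in document order.
 */
function get_child_element_names( DOMElement $element ) {
	$names = [];
	for ( $node = $element->firstChild; $node; $node = $next_node ) {
		// Saved up front so the loop would survive removal of $node.
		$next_node = $node->nextSibling;
		if ( ! $node instanceof DOMElement ) {
			continue; // The for-loop increment still advances $node.
		}
		$names[] = $node->nodeName;
	}
	return $names;
}

$dom = new DOMDocument();
$dom->loadXML( '<root>text<a/><!-- comment --><b/></root>' );
$names = get_child_element_names( $dom->documentElement );

$empty_dom = new DOMDocument();
$empty_dom->loadXML( '<root/>' );
$empty_names = get_child_element_names( $empty_dom->documentElement );
```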

$node = $element->firstChild;
while ( $node ) {
	$next_node = $node->nextSibling;
	if ( $node instanceof DOMElement ) {

Collaborator

I would also prefer an early return/continue here instead.

Member Author

See reasoning above.

	'story_with_invalid_layer_siblings' => [
		'<amp-story-page><p>Before layer</p><amp-story-grid-layer><p>Lorem Ipsum Demet Delorit.</p></amp-story-grid-layer><p>After layer</p></amp-story-page>',
		'<amp-story-page><amp-story-grid-layer><p>Lorem Ipsum Demet Delorit.</p></amp-story-grid-layer></amp-story-page>',
	],
];

Collaborator

Test for CTA removal is missing...?

Member Author

That should be covered above by story_with_cta_on_first_page and story_with_multiple_cta_on_second_page.

if ( ! isset( $rule_specs ) ) {
	continue;
}
foreach ( $rule_specs as $rule_spec ) {

Member Author

Note that the $rule_specs array only has one item in it.

@westonruter westonruter merged commit 559e9a8 into develop Sep 24, 2019
@westonruter westonruter deleted the fix/story-sanitization branch September 24, 2019 18:01
westonruter added a commit that referenced this pull request Sep 24, 2019
… prevent white story of death (#3336)

* Sanitize invalid children of amp-story and amp-story-page elements

* Harden logic for gathering allowed children for AMP Stories
westonruter added a commit that referenced this pull request Oct 1, 2019
* tag '1.3.0': (318 commits)
  Bump 1.3.0
  Add inline styles for custom fonts (#3345)
  Limit deeply-nesting test to 200 to fix Xdebug error (#3341)
  Bump 1.3-RC2 (#3335)
  Sanitize invalid children of amp-story and amp-story-page elements to prevent white story of death (#3336)
  Remove unused Travis deploy stage (#3340)
  Implement automated accessibility testing using Axe (#3294)
  Only add all Google Font style rules in editor context
  Prevent adding AMP query var to Story URLs in Compatibility Tool
  Prevent attempting to redirect Stories with rejected validation errors
  Ensure all AMP scripts (including v0.js) get moved to the head
  Make sure that media picker is background types are filter correctly.
  Normalize style[type] attribute quote style after r46164 in WP core
  Fix phpunit covers tags
  Bump version to 1.3-RC1
  Strip 100% width/height from layout=fill elements
  Fix issue with cut (#3246)
  Remove unused Google Fonts SVGs (#3289)
  Fix resize for non-fit text box (#3259)
  Use template_dir consistently as signal for transitional mode
  ...
Labels
cla: yes Signed the Google CLA Sanitizers
Development

Successfully merging this pull request may close these issues.

Markup prepended via the_content filter causes amp-story to be removed from page
4 participants