Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle additional validation errors #310

Merged
merged 28 commits into from
Mar 28, 2016
Merged

Handle additional validation errors #310

merged 28 commits into from
Mar 28, 2016

Conversation

bcampeau
Copy link
Contributor

@bcampeau bcampeau commented Mar 4, 2016

  • Removes <a> tags that have invalid href attributes, but preserves child elements.
  • Removes <font> tags, which aren't allowed, but preserves child elements.
  • Slightly improved handling of target attributes which must be lowercase for validation.
  • Removes embeds during sanitization that weren't converted during embed handling.

@@ -102,6 +107,26 @@ public function get_data() {
'<a href="http://example.com" target="boom">Link</a>',
'<a href="http://example.com">Link</a>',
),

'a_with_href_mailto' => array(
'<a href="mailto:email@domain.com">Link</a>',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was throwing validation errors in Webmaster Tools as recently as last week. It looks like they've just updated the spec to support it and a few others:

  allowed_protocol: "http"
  allowed_protocol: "https"
  allowed_protocol: "mailto"
  # Whitelisting additional commonly observed third party
  # protocols which should be safe
  allowed_protocol: "sms"
  allowed_protocol: "tel"
  allowed_protocol: "viber"
  allowed_protocol: "whatsapp"

Will update accordingly.

@mjangda mjangda changed the title Handle additional validation errors [WIP] Handle additional validation errors Mar 4, 2016
@bcampeau
Copy link
Contributor Author

bcampeau commented Mar 8, 2016

With regards to the comment about moving the new filters amp_blacklisted_protocols, amp_blacklisted_tags and amp_blacklisted_attributes to use args like add_placeholder on the iframe class, I dug deep into this logic.

The initial problem is that the base class is using array_merge to replace default args with the supplied args for the class. Following the same pattern, you'd end up needing to supply the entire list of current blacklisted protocols/tags/attributes to add to or modify the list if they were directly customizable. A filter works better in this use case since it doesn't require you to keep up with changes to the plugin just to add a few additional tags/protocols/attributes required by your theme's content.

However, I can't think of a scenario where you'd ever want to allow someone to modify the built in since they will always be invalid. In essence, you'd be letting someone intentionally break validation which seems pointless. So, the args I've added are only for additional tags/protocols/attributes and do not allow any modification of the built-in list. I think this addresses following the pattern established with the iframe class and also protects anyone using this plugin against doing something intentionally or accidentally invalid.

@bcampeau bcampeau changed the title [WIP] Handle additional validation errors Handle additional validation errors Mar 8, 2016
@bcampeau
Copy link
Contributor Author

bcampeau commented Mar 8, 2016

This is ready for final consideration for merging and is no longer a work in progress. I decided to add a new function to AMP_Base_Sanitizer called validate_src_attribute that allows the implementing function (iframe, video or audio) to decide whether or not the src should be forced to https (iframes do this currently, audio and video don't) or allow the theme to decide via the amp_require_https_src filter whether https should be required. If this filter returns true, the src is removed altogether.

If an iframe, audio or video node is found to have no valid src, it is then altogether removed. This only happens if the amp_require_https_src is used so without that, the functionality is identical to how it is currently. We could consider making the removal more automated once the determination is made on the proper fallback methodology. However, I advocate for merging this before then as this filter will help many sites resolve a large batch of validation errors.

I've also added a bunch of new unit tests for these scenarios.

$protocol = strtok( $src, ':' );
if ( 'https' !== $protocol ) {
// Check if https is required
$https_required = apply_filters( 'amp_require_https_src', false );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this to an arg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Rewrote this and the corresponding unit tests.

$this->sanitize_a_attribute( $node, $attribute );
// Sanitize the tag, but remove it entirely if the href is invalid.
// Children will be preserved as part of the parent.
if ( false === $this->sanitize_a_attribute( $node, $attribute ) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should separate out the validation and sanitization. No point in sanitizing the attributes if we're just going to drop the link. May have to do the validation before we process any attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point indeed. I've handled this and also added the scenario of <a name="section2"></a> both to the validation and unit tests.

@bcampeau
Copy link
Contributor Author

This is ready for another review.


// If no href is set and this isn't an anchor, it's invalid
if ( empty( $href ) ) {
if ( ! empty( $node->getAttribute( 'name' ) ) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaks in PHP 5.2:

Fatal error: Can't use method return value in write context in /home/travis/build/Automattic/amp-wp/includes/sanitizers/class-amp-blacklist-sanitizer.php on line 124

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@bcampeau
Copy link
Contributor Author

This is ready for another review. Just got back from a few days off and saw/fixed the failing test.

// TODO: `source` does not have closing tag, and DOMDocument doesn't handle it well.
foreach ( $node->childNodes as $child_node ) {
$new_child_node = $child_node->cloneNode( true );
$new_node->appendChild( $new_child_node );
$old_child_attributes = AMP_DOM_Utils::get_node_attributes_as_assoc_array( $new_child_node );
$new_child_attributes = $this->filter_attributes( $old_child_attributes );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to block merging (I will open a new issue for this) but we should probably filter attributes for the source elements separately (since the general attributes don't apply for it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point and agreed on a new issue

@mjangda
Copy link
Contributor

mjangda commented Mar 26, 2016

Spotted two things but otherwise this is good to go. Thanks again for working on this!

@bcampeau
Copy link
Contributor Author

Fixed or commented on those items, when you're ready to take another look.

@mjangda mjangda merged commit 572df89 into ampproject:master Mar 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants