Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid validation error and AMP-to-AMP linking failure for mailto: links #4182

Closed
westonruter opened this issue Jan 26, 2020 · 6 comments · Fixed by #4251
Closed

Invalid validation error and AMP-to-AMP linking failure for mailto: links #4182

westonruter opened this issue Jan 26, 2020 · 6 comments · Fixed by #4251
Labels
Bug Something isn't working QA passed Has passed QA and is done
Milestone

Comments

@westonruter
Copy link
Member

westonruter commented Jan 26, 2020

Bug Description

Add a Custom HTML block that contains the following markup:

<a href="mailto:?&#038;subject=Example&#038;body=https://example.com/">Email this</a>

This results in the AMP plugin detecting an INVALID_URL_PROTOCOL validation error:

image

This is in spite of the markup being entirely valid in AMP:

image

The issue also happens when an email address is provided:

<a href="mailto:foo@example.com?&#038;subject=Example&#038;body=https://example.com/">Email this</a>

Please note: the issue only occurs in Standard mode. In Transitional mode, it turns out that the AMP_Link_Sanitizer is mutating the mailto: link in a way that causes it to no longer be invalid. This turns out to also be an error, as \AMP_Link_Sanitizer::is_frontend_url() needs to only apply to URLs with the HTTP and HTTPS scheme. (See #4183 which fixes this so you can test in Transitional mode if checking out that branch.)

The reason for why AMP_Link_Sanitizer prevents the links from being invalid is that it mutates the URL via add_query_arg() to append ?amp. So it is making this change:

- mailto:?&subject=Example&body=https://example.com/
+ mailto:?subject=Example&body=https%3A%2F%2Fexample.com%2F&amp

The tag-and-attribute sanitizer likes the latter but not the former. But the AMP validator is OK with both, so we need to update our behavior to match. Our validator appears to be failing when the URL contains an unencoded colon (:), even though that is perfectly fine.

Expected Behaviour

Links with mailto: protocol that contain a query parameter that have a URL value get flagged as a validation error.

Steps to reproduce

  1. Enable Standard mode
  2. Create a Custom HTML block with the above mailto: link.
  3. Notice validation error.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

QA testing instructions

Demo

Changelog entry

@westonruter westonruter added the Bug Something isn't working label Jan 26, 2020
@westonruter westonruter added this to the v1.5 milestone Jan 26, 2020
@kienstra kienstra self-assigned this Feb 6, 2020
@kienstra
Copy link
Contributor

kienstra commented Feb 6, 2020

Implementation

Hi @westonruter,
Hope you had a great day.

What do you think about:

diff --git a/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php b/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php
index 6786a213d..9227c6a62 100644
--- a/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php
+++ b/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php
@@ -1666,7 +1666,7 @@ class AMP_Tag_And_Attribute_Sanitizer extends AMP_Base_Sanitizer {
         * @return string|null Protocol without colon if matched. Otherwise null.
         */
        private function parse_protocol( $url ) {
-               if ( preg_match( '#^[^/]+(?=:)#', $url, $matches ) ) {
+               if ( preg_match( '#^[^/]+?(?=:)#', $url, $matches ) ) {
                        return $matches[0];
                }
                return null;

...and of course I'll add some test cases for this.

That changes to the pattern to 'lazily' match [^/], so if there are two colons, it only matches up to the first colon.

For example, with mailto:foo:bar, it'll only match mailto instead of mailto:foo.

This looks to prevent validation errors in Standard mode for the 2 snippets you added:

<a href="mailto:?&#038;subject=Example&#038;body=https://example.com/">Email this</a>
<a href="mailto:foo@example.com?&#038;subject=Example&#038;body=https://example.com/">Email this</a>

And it prevents validation errors in Paired mode on the #4183 branch.

Thanks, Weston!

@westonruter
Copy link
Member Author

Yes, that seems like it will work well.

Alternatively, instead of [^/]+? a more restrictive set of characters. Per this SO answer and RFC 2396

3.1. Scheme Component

   Just as there are many different methods of access to resources,
   there are a variety of schemes for identifying such resources.  The
   URI syntax consists of a sequence of components separated by reserved
   characters, with the first component defining the semantics for the
   remainder of the URI string.

   Scheme names consist of a sequence of characters beginning with a
   lower case letter and followed by any combination of lower case
   letters, digits, plus ("+"), period ("."), or hyphen ("-").  For
   resiliency, programs interpreting URI should treat upper case letters
   as equivalent to lower case in scheme names (e.g., allow "HTTP" as
   well as "http").

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

   Relative URI references are distinguished from absolute URI in that
   they do not begin with a scheme name.  Instead, the scheme is
   inherited from the base URI, as described in Section 5.2.

So the hardened regex pattern could be ^[a-z][a-z0-9+.-]*(?=:). This would match all the currently-allowed protocols in AMP:

'ftp',
'geo',
'http',
'https',
'mailto',
'maps',
'bip',
'bbmi',
'itms-services',
'fb-me',
'fb-messenger',
'intent',
'line',
'skype',
'sms',
'snapchat',
'tel',
'tg',
'threema',
'twitter',
'viber',
'webcal',
'web+mastodon',
'wh',
'whatsapp',

@kienstra
Copy link
Contributor

kienstra commented Feb 6, 2020

Thanks, Weston! The new regex looks good, in line with actual protocols in class-amp-allowed-tags-generated.php.

@csossi
Copy link

csossi commented Feb 13, 2020

Able to add QA instructions, @kienstra ?

@kienstra
Copy link
Contributor

kienstra commented Feb 13, 2020

Sure, the instructions here should be fine: #4251 (comment)

@csossi
Copy link

csossi commented Feb 13, 2020

Verified in QA

@csossi csossi added the QA passed Has passed QA and is done label Feb 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working QA passed Has passed QA and is done
Projects
None yet
4 participants