Skip to content
This repository was archived by the owner on Apr 12, 2024. It is now read-only.

Linky filter breaks with certain input #8945

Closed
kevinrenskers opened this issue Sep 5, 2014 · 17 comments
Closed

Linky filter breaks with certain input #8945

kevinrenskers opened this issue Sep 5, 2014 · 17 comments

Comments

@kevinrenskers
Copy link

For example, when the input is DLog("%@",url) then you'll get this error: [$sanitize:badparse] The sanitizer was unable to parse the following block of html: <a href="mailto:%@",url">%@&#34;,url</a>

Example here: http://plnkr.co/edit/AnEumWvCjXRQ4GROh578?p=preview

@kevinrenskers
Copy link
Author

Problem exists in 1.2.21 and 1.3.0-rc.0.

@lgalfaso
Copy link
Contributor

lgalfaso commented Sep 5, 2014

It looks like linky is not handling the double quote character as expected

lgalfaso added a commit to lgalfaso/angular.js that referenced this issue Sep 5, 2014
Handle the double quote character as part of the domain
in email addresses

Closes angular#8945
@lgalfaso lgalfaso removed their assignment Sep 5, 2014
@kevinrenskers
Copy link
Author

That commit doesn't really solve the problem, it will still make this into a link:

DLog("<a target="_blank" href="mailto:%@&quot;,url">%@",url</a>)

It should simply not even see this as a valid email address. A double quote shouldn't be part of an email address right? Shouldn't it simply be removed from the LINKY_URL_REGEXP?

@kevinrenskers
Copy link
Author

Hm the Syntax part of http://en.wikipedia.org/wiki/Email_address just told me that a double quote can indeed be part of the local part of an email address. It seems that the LINKY_URL_REGEXP is much too simple, and needs to be split into 2 different regexes, one for links and one for email addresses. Clearly DLog("%@",url) is not an url.

@kevinrenskers
Copy link
Author

Another issue: "http://www.apple.com/". If the link is contained within double quotes you'll get this error:

[$sanitize:badparse] The sanitizer was unable to parse the following block of html: <a href="http://www.apple.com/"">http://www.apple.com/&#34;</a>

@caitp
Copy link
Contributor

caitp commented Sep 9, 2014

@kevinrenskers have you tried since 1.3.0-beta.19? I believe that should be fixed by a9d2271

@kevinrenskers
Copy link
Author

Ah, the link surrounded by quotes in indeed solved in rc-0, but DLog("%@",url) problem is not.

@caitp
Copy link
Contributor

caitp commented Sep 9, 2014

I'm not sure what DLog is --- but I agree with you that the regexp is very simple and not really adequate for some cases. However, I think it's probably good enough for most cases, and it would be much more complicated to allow in an html context (how do we decide whether it's supposed to be "<a href="mailto:address@host.com"></a>" or <a href="mailto:\"address@host.com\""></a>?)

@kevinrenskers
Copy link
Author

I'm using linky in a chat application and DLog("%@",url) is simply one of the messages that broke it. Yes, the simple regex is good enough for most cases, but the real problem is those other cases. Once you hit that JavaScript error the entire JS app is broken, and someone has to go into the database to change the message, or it'll be broken for all users of that chat.

At the very least have a try/catch that returns an empty string or something? And a better regex that catches only valid links and emailaddresses would be most welcome. Of course it's supposed to be "<a href="mailto:address@host.com"></a>" because "address@host.com" is not a valid email address.

@kevinrenskers
Copy link
Author

I would say that for our chat application the severity is critical, since the entire chat is broken is someone types something that linky sees as a link but then the sanitizer breaks on.

@caitp
Copy link
Contributor

caitp commented Sep 9, 2014

yes, it's not ideal --- are you using linky on its own without the rest of sanitize?

@kevinrenskers
Copy link
Author

With the rest.

@caitp
Copy link
Contributor

caitp commented Sep 9, 2014

what I mean is basically, are you rendering the output of the page with ngBindHtml + $sanitize, or just filtering the output through linky

@kevinrenskers
Copy link
Author

ng-bind-html + linky (plus some other for using emoticons and stuff).

@lgalfaso
Copy link
Contributor

lgalfaso commented Sep 9, 2014

@caitp #8964 only handles the fact that whatever there is a double quote and this will be rendered within an attribute, then it is escaped. For sure that there are other ways to make it better and add some extra smartness on what is linkified as an email or url, but that is another issue

@lorenzhs
Copy link

@kevinrenskers I'm pretty much in the same boat as you are (IRC web frontend). However, the case of quote-enlosed URLs is not entirely fixed -- it still breaks for URLs enclosed in [" and "], for which I filed #9256

@kevinrenskers
Copy link
Author

I now have to remember to put the fixes back in after updating AngularJS and the Sanitize module. This is the current version I'm using:

angular.module('ngSanitize').filter('linky', ['$sanitize', function($sanitize) {
  var LINKY_URL_REGEXP =
        /((ftp|https?):\/\/|(mailto:)?[A-Za-z0-9._+-]+@)\S*[^\s.;,(){}<>"]/,
      MAILTO_REGEXP = /^mailto:/;

  return function(text, target) {
    if (!text) return text;
    var match;
    var raw = text;
    var html = [];
    var url;
    var i;
    while ((match = raw.match(LINKY_URL_REGEXP))) {
      // We can not end in these as they are sometimes found at the end of the sentence
      url = match[0];
      // if we did not match ftp/http/mailto then assume mailto
      if (match[2] == match[3]) url = 'mailto:' + url;
      i = match.index;
      addText(raw.substr(0, i));
      addLink(url, match[0].replace(MAILTO_REGEXP, ''));
      raw = raw.substring(i + match[0].length);
    }
    addText(raw);
    return $sanitize(html.join(''));

    function addText(text) {
      if (!text) {
        return;
      }
      html.push(sanitizeText(text));
    }

    function addLink(url, text) {
      html.push('<a ');
      if (angular.isDefined(target)) {
        html.push('target="');
        html.push(target);
        html.push('" ');
      }
      html.push('href="');
      html.push(url.replace('"', '&quot;'));
      html.push('">');
      addText(text);
      html.push('</a>');
    }
  };
}]);

lgalfaso added a commit that referenced this issue Nov 23, 2014
Email addresses can (under certain restrictions) include double quote
characters. See http://tools.ietf.org/html/rfc3696#section-3.

For example, `"Jo Bloggs"@abc.com` is a valid email address.

When serializing emails to the `href` attribute of an anchor element,
we must HTML encode these double quote characters. See
http://www.w3.org/TR/html-markup/syntax.html#syntax-attr-double-quoted

This commit does not attempt to improve the functionality (i.e. regex)
that attempts to identify email addresses in a general string.

Closes #8945
Closes #8964
Closes #5946
Closes #10090
Closes #9256
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants