Improve validation for URL field. #306

miketaylr · 2014-10-20T15:33:50Z

See https://bugzilla.mozilla.org/show_bug.cgi?id=1024807#c42. Currently just a single letter will pass domain validation, which isn't very useful.

We might want to perform actual domain validation.

karlcow · 2014-10-28T07:33:34Z

Let's think about it:

There is a difference between domain name, link (href content for example) and URL validation.
"a" doesn't seem to be a domain name but could be a link in an href
愛.com on the other hand is a valid domain name
mailto:foobar+nospam@example.org is a valid URL.
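
To make the distinction above concrete, here is a minimal sketch using the standard URL constructor. The helper name classifyInput and the example.org base are assumptions made for illustration; nothing here is project code.

```javascript
// Illustrative only: classifyInput is a hypothetical helper, not project code.
// It relies on the standard WHATWG URL constructor (browsers and Node.js).
function classifyInput(input) {
  try {
    // This parses only if the input is an absolute URL with a scheme.
    var parsed = new URL(input);
    return 'absolute URL with scheme ' + parsed.protocol;
  } catch (e) {
    // Not absolute; fall through and try it as a relative reference.
  }
  try {
    // Placeholder base, as if the value were an href on this page.
    new URL(input, 'https://example.org/dir/page');
    return 'relative link';
  } catch (e) {
    return 'not parseable';
  }
}

console.log(classifyInput('a'));
console.log(classifyInput('愛.com'));
console.log(classifyInput('mailto:foobar+nospam@example.org'));
```

Both "a" and the bare domain 愛.com come back as relative links, while the mailto: value is an absolute URL. A generic URL parser alone therefore cannot tell a domain name from a relative link; that would need its own rule.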

Will we block some URIs input with stricter validation rules? When I was cleaning up old bugs in Bugzilla, I saw a load of bugs related to desktop apps that have a Web UI (à la FileMaker Pro). They certainly have URLs, but not necessarily known ones.

But maybe we can still make it more useful by guiding the user to add a useful link. Maybe it's a matter of encouraging them with an appropriate message based on the patterns they have entered.

Maybe we can check whether the entered link calls home, i.e. does not return a 4** or 5** HTTP response. If the answer is 4** or 5**, we could say "Are you sure about the link? It doesn't seem to work." That might introduce issues if the test or the bug report is about a 4** page 💫

miketaylr · 2014-10-28T15:13:58Z

> Will we block some URIs input with stricter validation rules?

I would prefer to be pretty liberal in what we accept. But "a" is probably too liberal. ^_^

I like the idea of doing a quick XHR request looking for non 4XX or 5XX responses. But indeed a report could be "foo.com serving 500 to bar browser".
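
A minimal sketch of how that check could stay advisory rather than blocking, so a report such as "foo.com serving 500 to bar browser" can still be filed. The function name urlStatusWarning is hypothetical, and how the status code is obtained is left out; a browser XHR to another origin is subject to CORS, so the request would probably have to be made server-side.

```javascript
// Hypothetical helper, not project code: map an HTTP status code to a soft
// warning. It never rejects the URL; the caller only displays the message.
function urlStatusWarning(status) {
  if (status >= 400 && status <= 599) {
    return "Are you sure about the link? It doesn't seem to work (HTTP " + status + ").";
  }
  // Anything outside 4XX and 5XX produces no warning.
  return null;
}

console.log(urlStatusWarning(200));
console.log(urlStatusWarning(500));
```

Returning a message instead of a boolean keeps the form submittable either way.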

miketaylr · 2015-02-06T18:31:34Z

https://mathiasbynens.be/demo/url-regex is interesting, perhaps of use.

miketaylr · 2015-02-06T18:37:07Z

From that list, Diego's satisfies all the constraints, but perhaps it isn't that useful: it is supposed to fail on foo.com, which we want to accept. Edit: actually, looking at the regex, we can make the protocol part optional.

miketaylr · 2015-02-06T18:51:33Z

Just exploring this option. Here's the modified regex:

var re_weburl = new RegExp(
  "^" +
    // protocol identifier
    "(?:(?:https?|ftp)://)?" +
    // user:pass authentication
    "(?:\\S+(?::\\S*)?@)?" +
    "(?:" +
      // IP address exclusion
      // private & local networks
      "(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
      "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
      "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
      // IP address dotted notation octets
      // excludes loopback network 0.0.0.0
      // excludes reserved space >= 224.0.0.0
      // excludes network & broadcast addresses
      // (first & last IP address of each class)
      "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
      "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
      "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
    "|" +
      // host name
      "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
      // domain name
      "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
      // TLD identifier
      "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))" +
    ")" +
    // port number
    "(?::\\d{2,5})?" +
    // resource path
    "(?:/\\S*)?" +
  "$", "i"
);

Some tests (you can paste that into your console and test via re_weburl.test('blah')):

Good results:
re_weburl.test('a') // false
re_weburl.test('a.com') // true
re_weburl.test('愛.com') // true
re_weburl.test('😃.com') // true
re_weburl.test('😃.expert') // true

Less good results?:
re_weburl.test('😃.totallydoesntexist') // true
re_weburl.test('mailto:foobar+nospam@example.org') // true

mailto: would be easy to exclude by tweaking the regex.
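
One way that exclusion could look, as a sketch: keep the pattern untouched and reject mailto: in a small wrapper. The names makeWebUrlTest and standIn are hypothetical, and standIn is a deliberately simplified pattern so the snippet runs on its own; in practice the wrapper would be given re_weburl.

```javascript
// Hypothetical wrapper: reject mailto: before delegating to a validator regex.
function makeWebUrlTest(baseRegex) {
  var mailtoPrefix = /^\s*mailto:/i;
  return function (input) {
    return !mailtoPrefix.test(input) && baseRegex.test(input);
  };
}

// Stand-in pattern for demonstration only; it is NOT the regex from this thread.
var standIn = /^\S+\.[a-z]{2,}$/i;
var isWebUrl = makeWebUrlTest(standIn);

console.log(isWebUrl('mailto:foobar+nospam@example.org')); // false
console.log(isWebUrl('a.com')); // true
```

The same effect should be achievable inside the pattern itself by adding a negative lookahead, "(?!mailto:)", directly after the leading "^".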

One reason I'm leaning towards this versus the XHR test is that it will be slightly more liberal. I know the 5XX or 4XX case that @karlcow mentions is not a very common kind of bug report for us, but it would be very frustrating if you were trying to report just that and the form prevented you from doing so. This regex wouldn't have that problem. But, as is, it could potentially let in invalid URLs.

Just a point of data, I haven't really seen many bogus URLs be reported, apart from the few anonymous spam/test reports.

miketaylr · 2015-08-03T19:49:03Z

Setting "help-wanted" and "good-first-patch" on this bug. It's fairly low priority (IMO), but would be a good way to get to know some of the code-base.

miketaylr · 2015-09-30T17:00:44Z

Let's close this and revisit if we find we're getting lots of bad URLs.

miketaylr added type: site-development type: bug labels Oct 20, 2014

miketaylr mentioned this issue Nov 15, 2014

Fixes #391 - trim wysiwyg:// from URL field if it's there. #392

Merged

miketaylr added this to the Better mobile bug reporting milestone Dec 3, 2014

miketaylr added help-wanted prio: good first bug labels Aug 3, 2015

miketaylr removed this from the Better mobile bug reporting milestone Aug 3, 2015

miketaylr closed this as completed Sep 30, 2015
