Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve validation for URL field. #306

Closed
miketaylr opened this issue Oct 20, 2014 · 7 comments
Closed

Improve validation for URL field. #306

miketaylr opened this issue Oct 20, 2014 · 7 comments

Comments

@miketaylr
Copy link
Member

See https://bugzilla.mozilla.org/show_bug.cgi?id=1024807#c42. Currently just a single letter will pass domain validation, which isn't very useful.

Possibly we might want to perform actual domain validation.

@karlcow
Copy link
Member

karlcow commented Oct 28, 2014

Let's think about it:

  • There is a difference between domain name, link (href content for example) and URL validation.
  • a doesn't seem to be a domain name but could be a link in a href
  • 愛.com on the other hand is a valid domain name
  • mailto:foobar+nospam@example.org is a valid URL.

Will we block some URIs input with stricter validation rules? When I was cleaning bugzilla for old bugs, I have seen a load of bugs related to desktop apps but having Web UI (ala FilemakerPro). They certainly have URLs but not necessary known.

But maybe we can still make it more useful by guiding the user to add a useful link. Maybe it's a matter of encouraging with an appropriate message based on patterns s/he has input.

Maybe we can check if the entered link calls home aka not 4** or 5** HTTP Responses. If the answer is 4** or 5**, we could say "Are you sure about the link, it doesn't seem to work." Which might introduce issues if the test or the bug report is about a 4** page 💫

@miketaylr
Copy link
Member Author

Will we block some URIs input with stricter validation rules?

I would prefer to be pretty liberal in what we accept. But "a" is probably too liberal. ^_^

I like the idea of doing a quick XHR request looking for non 4XX or 5XX responses. But indeed a report could be "foo.com serving 500 to bar browser".

@miketaylr
Copy link
Member Author

https://mathiasbynens.be/demo/url-regex is interesting, perhaps of use.

@miketaylr
Copy link
Member Author

From that list, Diego's satisfies all the constraints but perhaps it isn't that useful--it is supposed to fail on foo.com, which we want. edit: Actually, looking at the Regex--we can make the protocol part optional.

@miketaylr
Copy link
Member Author

Just exploring this option. Here's the modified regex:

var re_weburl = new RegExp(
  "^" +
    // protocol identifier
    "(?:(?:https?|ftp)://)?" +
    // user:pass authentication
    "(?:\\S+(?::\\S*)?@)?" +
    "(?:" +
      // IP address exclusion
      // private & local networks
      "(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
      "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
      "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
      // IP address dotted notation octets
      // excludes loopback network 0.0.0.0
      // excludes reserved space >= 224.0.0.0
      // excludes network & broacast addresses
      // (first & last IP address of each class)
      "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
      "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
      "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
    "|" +
      // host name
      "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
      // domain name
      "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
      // TLD identifier
      "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))" +
    ")" +
    // port number
    "(?::\\d{2,5})?" +
    // resource path
    "(?:/\\S*)?" +
  "$", "i"
);

Some tests (you can paste that into your console and test via re_weburl.test('blah'):

Good results:
re_weburl.test('a') // false
re_weburl.test('a.com') // true
re_weburl.test('愛.com') //true
re_weburl.test('😃.com') //true
re_weburl.test('😃.expert') //true

Less good results?:
re_weburl.test('😃.totallydoesntexist') //true
re_weburl.test('mailto:foobar+nospam@example.org') //true

mailto: would be easy to exclude by tweaking the regex.

One reason I'm leaning towards this versus the XHR test is that it will be slightly more liberal. I know the 5XX or 4XX point that @karlcow mentions is not a very common bug report we get, but it would be very frustrating if you were trying to report just that and the form prevented you from doing so. This regex wouldn't have that problem. But, as is, it could potentially let in invalid URLs.

Just a point of data, I haven't really seen many bogus URLs be reported, apart from the few anonymous spam/test reports.

@miketaylr
Copy link
Member Author

Setting "help-wanted" and "good-first-patch" on this bug. It's fairly low priority (IMO), but would be a good way to get to know some of the code-base.

@miketaylr miketaylr removed this from the Better mobile bug reporting milestone Aug 3, 2015
@miketaylr
Copy link
Member Author

Let's close this and revisit if we find we're getting lots of bad URLs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants