This repository was archived by the owner on Apr 12, 2024. It is now read-only.

URL validation does not recognize URLs with no protocol #6634

Closed
jplaut opened this issue Mar 10, 2014 · 19 comments

Comments

@jplaut

jplaut commented Mar 10, 2014

http://w is accepted as a valid URL, but www.google.com is not. This seems to be a bug.

Working fiddle here: http://jsfiddle.net/HB7LU/2390/

A better regex that recognizes both types of urls can be found here: http://stackoverflow.com/questions/833469/regular-expression-for-url

@tbosch tbosch self-assigned this Mar 10, 2014
@tbosch tbosch added this to the 1.3.x milestone Mar 10, 2014
@tbosch tbosch removed their assignment Mar 10, 2014
SekibOmazic added a commit to SekibOmazic/angular.js that referenced this issue Mar 11, 2014
change the url regexp to recognize urls without protocol

closes angular#6634
@auser
Contributor

auser commented Mar 11, 2014

In discussion with @IgorMinar:

Does it make sense that the url is valid without a protocol? For instance, currently http://google.com is a valid url, while google.com is not. Similarly, http://localhost is a valid url, while localhost is not a valid url.

I believe that if localhost were a valid URL, it would follow that foo is also a valid URL, and URL validation would be kind of useless.

I think http://w is a valid url as it is in the same form as http://localhost.

@jplaut
Author

jplaut commented Mar 12, 2014

I don't think that localhost on its own is a valid URL. localhost with a port number (localhost:3000, for example), which is how localhost is used 99% of the time, would be valid. I also think that google.com and www.google.com should be valid. Here's a regex that matches google.com, www.google.com, http://google.com, and localhost:3000, but not just a word like localhost:

`^(http|https|ftp)?(://)?(www|ftp)?\.?[a-z0-9-]+(\.|:)([a-z0-9-]+)+([/?].*)?$`
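A quick way to check these claims is to run the regex against the examples mentioned. This sketch assumes the dots in the pattern were meant to be escaped (backslashes are easily lost when a comment is rendered); with unescaped dots, `.` matches any character and `localhost` would match as well.

```javascript
// jplaut's proposed pattern, assuming the dots were meant to be escaped.
// Note: there is no `i` flag, so only lowercase input is accepted.
const PROPOSED_URL_REGEXP =
  /^(http|https|ftp)?(:\/\/)?(www|ftp)?\.?[a-z0-9-]+(\.|:)([a-z0-9-]+)+([\/?].*)?$/;

const samples = [
  'google.com',
  'www.google.com',
  'http://google.com',
  'localhost:3000',
  'localhost',        // a bare word should be rejected
  'www.google.co.uk', // more than one dot after the first label
];

for (const s of samples) {
  console.log(s, PROPOSED_URL_REGEXP.test(s));
}
```

The first four samples match and `localhost` does not, as described. `www.google.co.uk` is also rejected, because apart from an optional leading `www.` or `ftp.` the pattern allows only a single `.` or `:` separator in the host part. The nested `([a-z0-9-]+)+` can also backtrack heavily on long non-matching input.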

@blaise-io
Contributor

I'd prefer TLDs to be required, because in the real world all websites have a TLD, except for localhost and IP addresses. So maybe adopt the Django URL validator RegExp, which does just that and is already battle-tested?
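The TLD requirement could be approximated as follows. This is a hypothetical helper (`hasAcceptableHost` is neither Django's nor AngularJS's validator) that leans on the WHATWG `URL` parser available in browsers and as a global in Node.js 10+. The TLD test is deliberately rough: it only checks that the last label is alphabetic.

```javascript
// Hypothetical helper: accept an absolute http(s)/ftp URL only if its host
// has a TLD, is localhost, or is an IP address.
function hasAcceptableHost(input) {
  let url;
  try {
    url = new URL(input); // throws on input without a scheme, e.g. "google.com"
  } catch (e) {
    return false;
  }
  if (!['http:', 'https:', 'ftp:'].includes(url.protocol)) return false;

  const host = url.hostname;
  if (host === 'localhost') return true;
  // IPv4: the parser has already normalized and range-checked it.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) return true;
  // IPv6 literals keep their brackets in `hostname`, e.g. "[::1]".
  if (host.startsWith('[')) return true;
  // Rough TLD check: at least two labels, the last one alphabetic.
  return /^([a-z0-9-]+\.)+[a-z]{2,}$/i.test(host);
}

console.log(hasAcceptableHost('http://www.google.com')); // true
console.log(hasAcceptableHost('http://w'));              // false
```

`www.google.com` on its own is still rejected here, because the `URL` parser needs a scheme; this sketch only addresses the TLD question, not the missing protocol. Internationalized TLDs, which the parser converts to `xn--` form, would also fail the rough alphabetic check.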

@caitp
Contributor

caitp commented Mar 17, 2014

No matter what you do, you're always going to find people who have a problem with it. This is why I prefer to go with the bare-bones validation provided by the RFC and recommended by the W3C. It can be extended with pattern validators if necessary, but otherwise works very well.

@blaise-io
Contributor

There is currently no easy way of extending the patterns provided by AngularJS.
(If you know a way, please answer this Stack Overflow question :)

@caitp
Copy link
Contributor

caitp commented Mar 17, 2014

There certainly is: you can use the ng-pattern attribute to extend validation with your custom requirements. (I don't do Stack Overflow, though.)

@auser
Contributor

auser commented Mar 17, 2014

I think we should have the user use ng-pattern, even if it feels less than ideal. We're safest if we stick to the RFC.

@blaise-io
Contributor

@caitp I know that solution, and it's currently the "Angularest" way of solving the problem, but it's not ideal because it means I have to pollute every one of my URL inputs with a big ng-pattern. And ng-pattern is additional: it does not replace the default URL validation, so I would need hacks or big workarounds if I wanted to disable the default Angular URL validation AND keep my HTML semantic.

@auser I understand Angular wants to stick with the RFC, but it would be grrrreat if it could be extended or configured in an easier way than it is now.
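The "additional, not replacing" point can be illustrated with a simplified model of how validators on one `ngModel` combine: the control is valid only when every validator passes, so an `ng-pattern` can narrow what `type="url"` accepts but never widen it. `BUILT_IN_URL` is a made-up stand-in, not AngularJS's actual `URL_REGEXP`, and `isControlValid` is hypothetical.

```javascript
// Hypothetical stand-in for the built-in url check (scheme and "//" required).
const BUILT_IN_URL = /^[a-z][a-z\d.+-]*:\/\/\S+$/i;

// Simplified model: a control is valid only if all of its validators pass.
function isControlValid(value, validators) {
  return Object.keys(validators).every((name) => validators[name](value));
}

const validators = {
  url: (v) => BUILT_IN_URL.test(v),               // from type="url"
  pattern: (v) => /^(https?:\/\/)?www\./.test(v), // from a hypothetical ng-pattern
};

console.log(isControlValid('http://www.google.com', validators)); // true
console.log(isControlValid('www.google.com', validators));        // false: url check still fails
```

Even though the pattern explicitly allows a missing scheme, `www.google.com` stays invalid because the built-in check runs as well.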

@auser auser self-assigned this Mar 17, 2014
@auser auser modified the milestones: Ice Box, 1.3.0 Mar 17, 2014
@auser
Contributor

auser commented Mar 17, 2014

@blaise-io What are you thinking that would look like?

@caitp +1

@blaise-io
Contributor

@auser Ideally this is 1) configurable per module, with the defaults being as they are now, and 2) configurable per input directive.

I think a sensible approach is to allow configuration to replace the defaults in the `*InputType` functions, so that I could replace (for example) `urlInputType` with my own function. Something like:

```javascript
// Proposed API, not an existing AngularJS service.
someInjectedObject.setInputType('url', function(scope, element, attr, ctrl, $sniffer, $browser) {
    // My custom input type handling
});
```

someInjectedObject could be the current formDirectiveFactory, converted to, or wrapped in, a provider that is configurable in myApp.config() like other configuration. (Alternatively, NgModelController could inject it from a configurable provider.) This would implement configuration per module (1).

someInjectedObject could also be configured per URL input, using another directive on the input field that requires ngModel and calls setInputType. This would implement configuration of a single input directive (2).

@IgorMinar
Contributor

Why not just create your custom validator and use it as `<input type="text" my-url>`?

That way you get all the flexibility you need without any verbosity. I think we should stick to the RFC because of the danger of false positives.

As we make modularization a core part of Angular (in v2), there will be no difference in effort between using ngUrl and myUrl.

It's important to realize that Angular can't solve everyone's issues by default, but it must be extensible so that all use cases can be covered if needed.
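The custom-validator suggestion could look roughly like this. It assumes AngularJS 1.3 or later, where `NgModelController` exposes `$validators` and `$isEmpty`; the names `myUrl` and `myUrlDirective` and the `MY_URL_REGEXP` policy (optional http(s) scheme, a dotted host ending in an alphabetic TLD, optional port and path) are illustrative choices, not part of AngularJS. The definition object is built by a plain function so it can be exercised without loading Angular.

```javascript
// Hypothetical policy for the custom my-url directive.
const MY_URL_REGEXP = /^(https?:\/\/)?([a-z0-9-]+\.)+[a-z]{2,}(:\d+)?(\/\S*)?$/i;

// Directive definition for <input type="text" ng-model="site" my-url>.
function myUrlDirective() {
  return {
    require: 'ngModel',
    link: function (scope, element, attrs, ngModel) {
      ngModel.$validators.myUrl = function (modelValue, viewValue) {
        const value = modelValue || viewValue;
        // Leave empty values to `required`, as the built-in validators do.
        return ngModel.$isEmpty(value) || MY_URL_REGEXP.test(value);
      };
    },
  };
}

// Registration in an app (requires AngularJS to be loaded):
// angular.module('myApp').directive('myUrl', myUrlDirective);
```

Because the element stays `type="text"`, this gives up the mobile keyboard and autofill behaviour of `type="url"` that blaise-io mentions below.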

@blaise-io
Contributor

@IgorMinar Custom directives are not ideal and promote bad semantics.

my-url, as opposed to type=url, does not benefit from browsers implementing useful features on top of HTML5 input types. For example, iPhone tries to capitalize the first character in type=text, but not in type=url, and when using type=email, some browsers suggest values from your address book and won't suggest entries from your history that don't match the input type.

But AngularJS already hijacked these types as directives, without offering a way to extend or replace that default behavior.

@caitp
Contributor

caitp commented Mar 19, 2014

That's not really true, Angular lets you decorate directives, so you could decorate it to provide a custom handler for the url or email types.

@blaise-io
Contributor

Thanks caitp, I'll look into that.

@SekibOmazic
Contributor

This should be closed as "won't fix".

@kosso

kosso commented Nov 24, 2015

The URL validation appears to still be broken.

It thinks that http:google.com is valid without the //.

Try it on the demo at the bottom of the documentation page:
https://docs.angularjs.org/api/ng/input/input%5Burl%5D

(Running AngularJS v1.4.7 btw)

@gkalpak
Member

gkalpak commented Nov 24, 2015

The relevant URL_REGEXP has been recently updated (see ffb6b2f).

Although surely not perfect, it tries to mimic the way browsers (especially Chromium) do things.
See #11381 for more context.

That said, I couldn't find anything related to the number of slashes in the spec (with a quick look), but if it is appropriate for Chromium and Mozilla, I guess it is more than appropriate for our needs 😃

If you can provide some source of info according to which `http:google.com` is **not** a valid URL, I'd be happy to look into it.
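The browser behaviour referred to here can be observed directly with the WHATWG URL parser, which browsers implement and Node.js exposes as the global `URL` (this is the standard parser, not AngularJS's `URL_REGEXP`). For special schemes such as `http`, missing slashes after the colon are tolerated and what follows is parsed as the host:

```javascript
// Special scheme: the parser supplies the missing "//" and reads a host.
const lenient = new URL('http:google.com');
console.log(lenient.href);     // http://google.com/
console.log(lenient.hostname); // google.com

// Non-special scheme: no such leniency, the rest is an opaque path.
const opaque = new URL('foo:google.com');
console.log(opaque.hostname);  // (empty string)
console.log(opaque.pathname);  // google.com
```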

@zipper01

zipper01 commented Mar 31, 2019

Hi, I got a URL validation error with 'www.xyz.com' and Google led me here. I roughly checked the lengthy discussion and couldn't find a good answer, just the arrogant words "This should be closed as 'won't fix'". WTF, is this even a technical question? How many of us do you think type "https://www.google.com" instead of "www.google.com" when browsing? Have you ever heard someone say 'visit us at HTTPS://xxxx' instead of 'visit us at www.xxx' on TV or radio? Frustrating...

@whereisaaron

It is an old discussion, with lots of opinions but lacking in rampant RFC-referencing, so here you go:

https://tools.ietf.org/html/rfc3986#section-3

@gkalpak http:google.com is not a valid http URI: the authority component must be prefixed by //, so without it google.com is parsed as a path rather than a host. http://google.com, by contrast, is a valid URI.

The scheme component of a URI is required, so www.example.com is a valid domain name, but not a URI.
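RFC 3986 Appendix B gives a regular expression for splitting any URI reference into its components, which makes these distinctions easy to see. It parses but does not validate; `splitUriReference` is a hypothetical wrapper around it.

```javascript
// Component-splitting regex from RFC 3986, Appendix B (slashes escaped for a JS literal).
const RFC3986_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitUriReference(ref) {
  const m = RFC3986_PARTS.exec(ref); // always matches; groups may be undefined
  return {
    scheme: m[2],    // undefined when there is no "scheme:" prefix
    authority: m[4], // undefined when there is no "//" prefix
    path: m[5],      // always present, possibly ""
    query: m[7],
    fragment: m[9],
  };
}

console.log(splitUriReference('http://google.com')); // scheme "http", authority "google.com", path ""
console.log(splitUriReference('http:google.com'));   // scheme "http", no authority, path "google.com"
console.log(splitUriReference('www.example.com'));   // no scheme, no authority, path "www.example.com"
```

Parsed this way, `localhost:3000` comes out with scheme `localhost` and path `3000`, so neither it nor `www.example.com` identifies a host without extra heuristics.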

URI-references are either URIs or relative-refs. E.g. you can use relative-refs in the context of a browser where default schemes and authorities are available (i.e. the URI the page was loaded from).

https://tools.ietf.org/html/rfc3986#section-4

E.g. //foo.example.com/bar is a network-path reference: the browser will default to the scheme used to load the page. This is used for sites that support both HTTP and HTTPS (hopefully trending to none at this point :-).

E.g. `/bar` is an absolute-path reference: the browser will default to the scheme and authority used to load the page.
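Both kinds of relative reference can be resolved the way a browser resolves links on a page by passing a base URL to the WHATWG `URL` constructor. The base below is a made-up example page.

```javascript
// Hypothetical page the references appear on.
const base = 'https://page.example/dir/index.html';

// Network-path reference: only the scheme comes from the base.
console.log(new URL('//foo.example.com/bar', base).href); // https://foo.example.com/bar

// Absolute-path reference: scheme and authority come from the base.
console.log(new URL('/bar', base).href); // https://page.example/bar

// No "//" prefix: a bare host name is resolved as a relative path.
console.log(new URL('www.example.com', base).href); // https://page.example/dir/www.example.com
```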

However, www.example.com, localhost, and localhost:3000 are not URIs or relative-refs that identify a host: they lack the prefix/suffix required to tell whether a part is a scheme (:), authority (//), path (/), or fragment (#), so you can't assume it is an authority. Parsed strictly, www.example.com is a relative path, and localhost:3000 is a URI whose scheme is localhost. Treating them as host names is a heuristic behavior you need to build in.

The RFC does allow for this sort of heuristic resolution in the context of human interfaces (like a dialogue box) with 'URI suffix references', which are basically ambiguous relative-refs from which you can infer real relative-refs, and from there real URIs. This is to cover the real-world syntax used by humans.

https://tools.ietf.org/html/rfc3986#section-4.5

URI suffix references would include www.example.com, www.example.com/bar, localhost, and localhost:3000.

So I think the discussion is split between people who want isUriSuffixReference() semantics (aka isHumanUri()) and those who think it should implement isUri() semantics. But they are two different things.
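The two semantics can be contrasted in a short sketch. Both predicates are hypothetical (the names follow the comment; neither is an AngularJS API). `isUri` is stricter than the RFC's URI grammar because it also demands an authority, which is what a web address needs; `isUriSuffixReference` applies an RFC 3986 section 4.5 style heuristic by supplying a default `http://` prefix and letting the WHATWG `URL` parser decide whether a host results.

```javascript
// Strict: an absolute URI with an authority, i.e. "scheme://host...".
function isUri(input) {
  return /^[a-z][a-z0-9+.-]*:\/\/[^\/?#]+/i.test(input);
}

// Heuristic: if the input is not already "scheme://...", treat it as a
// suffix reference and assume a default scheme (http unless told otherwise).
function isUriSuffixReference(input, defaultScheme = 'http') {
  const candidate = isUri(input) ? input : `${defaultScheme}://${input}`;
  try {
    return new URL(candidate).hostname !== '';
  } catch (e) {
    return false; // e.g. a space in the host makes the parser fail
  }
}

for (const s of ['https://www.example.com', 'www.example.com', 'localhost:3000']) {
  console.log(s, isUri(s), isUriSuffixReference(s));
}
```

Like any such heuristic, `isUriSuffixReference('foo')` is also true, which is exactly the concern auser raised earlier in the thread about foo.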
