Valid URL with hashtag in fragment does not pass validation #403
Comments
Seems like a bad query field. after
The issue is that there is a hashtag in the fragment, which, according to RFC 3986, isn't valid. But the fragment isn't actually sent to the web server, so it's up to the browser to handle it. I would add another flag, "validate_fragment" or "strict_fragment", to handle it. WDYT?
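To make the proposed flag concrete, here is a minimal sketch. It is not this library's API: the function name `fragment_ok` is hypothetical, and `strict_fragment` is only the flag name suggested above. It assumes the only rule being toggled is whether a literal "#" may appear inside the fragment.

```python
from urllib.parse import urlsplit


def fragment_ok(url: str, strict_fragment: bool = True) -> bool:
    """Hypothetical helper, not part of any real library.

    In strict mode a literal '#' inside the fragment is rejected,
    following RFC 3986. In lenient mode the fragment is accepted as-is,
    since it is handled by the user agent and never sent to the server.
    """
    # urlsplit cuts at the first '#', so any further '#' stays in the fragment.
    fragment = urlsplit(url).fragment
    if strict_fragment:
        return "#" not in fragment
    return True
```

With this sketch, a caller who wants browser-like tolerance would pass `strict_fragment=False`, while the default keeps the RFC behaviour.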
RFC 3986, Section 3.5. Fragment
The fragment identifier component of a URI allows indirect
identification of a secondary resource by reference to a primary
resource and additional identifying information. The identified
secondary resource may be some portion or subset of the primary
resource, some view on representations of the primary resource, or
some other resource defined or described by those representations. A
fragment identifier component is indicated by the presence of a
number sign ("#") character and terminated by the end of the URI.
fragment = *( pchar / "/" / "?" )
The semantics of a fragment identifier are defined by the set of
representations that might result from a retrieval action on the
primary resource. The fragment's format and resolution is therefore
dependent on the media type [RFC2046] of a potentially retrieved
representation, even though such a retrieval is only performed if the
URI is dereferenced. If no such representation exists, then the
semantics of the fragment are considered unknown and are effectively
unconstrained. Fragment identifier semantics are independent of the
URI scheme and thus cannot be redefined by scheme specifications.
Individual media types may define their own restrictions on or
structures within the fragment identifier syntax for specifying
different types of subsets, views, or external references that are
identifiable as secondary resources by that media type. If the
primary resource has multiple representations, as is often the case
for resources whose representation is selected based on attributes of
the retrieval request (a.k.a., content negotiation), then whatever is
identified by the fragment should be consistent across all of those
representations. Each representation should either define the
fragment so that it corresponds to the same secondary resource,
regardless of how it is represented, or should leave the fragment
undefined (i.e., not found).
As with any URI, use of a fragment identifier component does not
imply that a retrieval action will take place. A URI with a fragment
identifier may be used to refer to the secondary resource without any
implication that the primary resource is accessible or will ever be
accessed.
Fragment identifiers have a special role in information retrieval
systems as the primary form of client-side indirect referencing,
allowing an author to specifically identify aspects of an existing
resource that are only indirectly provided by the resource owner. As
such, the fragment identifier is not used in the scheme-specific
processing of a URI; instead, the fragment identifier is separated
from the rest of the URI prior to a dereference, and thus the
identifying information within the fragment itself is dereferenced
solely by the user agent, regardless of the URI scheme. Although
this separate handling is often perceived to be a loss of
information, particularly for accurate redirection of references as
resources move over time, it also serves to prevent information
providers from denying reference authors the right to refer to
information within a resource selectively. Indirect referencing also
provides additional flexibility and extensibility to systems that use
URIs, as new media types are easier to define and deploy than new
schemes of identification.
The characters slash ("/") and question mark ("?") are allowed to
represent data within the fragment identifier. Beware that some
older, erroneous implementations may not handle this data correctly
when it is used as the base URI for relative references (Section 5.1).
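The grammar rule quoted above, `fragment = *( pchar / "/" / "?" )`, can be checked directly. The sketch below expands `pchar` using the definitions from RFC 3986 (unreserved, pct-encoded, sub-delims, ":" and "@"); the function name `is_rfc3986_fragment` is an assumption for illustration and does not come from this library.

```python
import re

# RFC 3986: fragment = *( pchar / "/" / "?" )
#           pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
_FRAGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~]"   # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|[:@/?])*"             # ":" and "@" from pchar, plus "/" and "?"
)


def is_rfc3986_fragment(fragment: str) -> bool:
    """Return True if the text after the first '#' matches the RFC grammar."""
    return _FRAGMENT_RE.fullmatch(fragment) is not None
```

Because "#" appears in none of these character sets, a fragment such as `foo#bar` fails the check, which is the strict reading discussed in this thread.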
It does not explicitly state that
Well actually, https://stackoverflow.com/questions/26088849/url-fragment-allowed-characters/26119120#26119120 explains how
It makes sense, but it means we are deviating from the standard.
A similar idea of modifiable validation was raised here: #396 (comment). I think it defeats the purpose of this library.
True, but aren't de-facto standards compelling in a practical sense?
I think the library does more than just apply a regex when working with URLs: it parses the URL correctly and performs various checks.
Hi,
The following URL is not passing validation:
It seems that the code below is failing due to the presence of a hashtag in the fragment:
Thanks!