Relax parsing restrictions around host and hostname #1606
This approach allows us to match and process any potential input and still meet the contractual requirements of `Request#split_authority` _without_ requiring the input to be well-formed or RFC-compliant. In turn, this eliminates unexpected failures on invalid-but-functional inputs.

If we deemed it necessary to do validation (particularly on IP addresses), the `AUTHORITY` regex still provides enough context on which validations should apply, though doing so would require some consideration on what should be returned upon validation failure.

If validations can be ruled unnecessary – I suspect this should be the case – the implementation could be replaced by the semantically identical:

``` ruby
def split_authority(authority)
  /\A(?<host>\[\g<addr>\]|(?<addr>.*?))(:(?<port>\d+))?\Z/ =~ authority
  return host, addr, port&.to_i
end
```

This gives up differentiation between IPv6, IPv4, and DNS addresses, but is arguably simpler.
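As a rough illustration of what the simplified version would return, here is a minimal sketch that reuses the method from the description as a standalone function. The sample inputs and the result variable names are made up for this example and do not come from the project's test suite.

``` ruby
# Standalone copy of the simplified implementation from the description.
def split_authority(authority)
  # A regex literal on the left of =~ assigns its named captures
  # (host, addr, port) to local variables.
  /\A(?<host>\[\g<addr>\]|(?<addr>.*?))(:(?<port>\d+))?\Z/ =~ authority
  return host, addr, port&.to_i
end

# Hypothetical inputs, chosen only to exercise each branch of the regex.
dns_only   = split_authority("example.com")      # no port
dns_port   = split_authority("example.com:8080") # host plus port
ipv6_port  = split_authority("[::1]:80")         # bracketed address plus port
odd_input  = split_authority("my_host:3000")     # not a valid hostname, still split

p dns_only, dns_port, ipv6_port, odd_input
```

For the bracketed input, the first element keeps the brackets while the second holds the bare address; for the other inputs the two are the same string. The port is `nil` when absent.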
This looks good to me, you took what I did and made it way better. Thanks!
I’ll backport to 2-2-stable as I consider this a bug fix.
AUTHORITY = /^
# The host:
AUTHORITY = /
\A
@pvande what is the difference between `\A` and `\Z` vs `^` and `$`?
`\A` matches the beginning of the string. `^` matches the beginning of any line in the string. `\Z` matches the end of the string, allowing a possible newline at the end of the string. `$` matches the end of any line in the string.

In general, you are much more likely to want `\A` and `\Z` (or `\z`, which matches only at the very end of the string) than `^` and `$`. Using `^` and `$` usually leads to bugs, in my experience.
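A small sketch of the difference, using made-up input strings (the hostnames and variable names here are illustrative only):

``` ruby
# A multi-line string whose second line looks like an acceptable host.
multiline = "evil.example\nexample.com"

# ^ and $ anchor to any line, so the second line alone is enough to match.
line_match = /^example\.com$/.match?(multiline)

# \A and \z anchor to the whole string, so the extra first line prevents a match.
string_match = /\Aexample\.com\z/.match?(multiline)

# \Z tolerates a single trailing newline; \z does not.
with_newline = "example.com\n"
upper_z = /\Aexample\.com\Z/.match?(with_newline)
lower_z = /\Aexample\.com\z/.match?(with_newline)

p line_match, string_match, upper_z, lower_z
```

The line-anchored pattern accepts the multi-line input while the string-anchored one rejects it, which is the kind of surprise that tends to cause bugs when validating a single value.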
(Resolves #1604)