Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax parsing restrictions around host and hostname #1606

Merged
merged 2 commits into from
Feb 16, 2020

Conversation

pvande
Copy link
Contributor

@pvande pvande commented Feb 16, 2020

This approach allows us to match and process any potential input and still meet the contractual requirements of Request#split_authority without requiring the input to be well-formed or RFC-compliant. In turn, this eliminates unexpected failures on invalid-but-functional inputs.

If we deemed it necessary to do validation (particularly on IP addresses), the AUTHORITY regex still provides enough context on which validations should apply, though doing so would require some consideration on what should be returned upon validation failure.

If validations can be ruled unnecessary – I suspect this should be the case – the implementation could be replaced by the semantically identical:

def split_authority(authority)
  /\A(?<host>\[\g<addr>\]|(?<addr>.*?))(:(?<port>\d+))?\Z/ =~ authority
  return host, addr, port&.to_i
end

This gives up differentiation between IPv6, IPv4, and DNS addresses, but is arguably simpler.

(Resolves #1604)

This approach allows us to match and process any potential input and still meet the contractual requirements of `Request#split_authority` _without_ requiring the input to be well-formed or RFC-compliant.  In turn, this eliminates unexpected failures on invalid-but-functional inputs.

If we deemed it necessary to do validation (particularly on IP addresses), the `AUTHORITY` regex still provides enough context on which validations should apply, though doing so would require some consideration on what should be returned upon validation failure.

If validations can be ruled unnecessary – I suspect this should be the case – the implementation could be replaced by the semantically identical:

``` ruby
def split_authority(authority)
  /\A(?<host>\[\g<addr>\]|(?<addr>.*?))(:(?<port>\d+))?\Z/ =~ authority
  return host, addr, port&.to_i
end
```

This gives up differentiation between IPv6, IPv4, and DNS addresses, but is arguably simpler.
@pvande pvande requested a review from ioquatix February 16, 2020 08:23
@ioquatix
Copy link
Member

This looks good to me, you took what I did and made it way better. Thanks!

@ioquatix ioquatix merged commit 766761b into rack:master Feb 16, 2020
@ioquatix
Copy link
Member

I’ll backport to 2-2-stable as I consider this a bug fix.

@ioquatix ioquatix added this to the v2.2.3 milestone Feb 16, 2020
AUTHORITY = /^
# The host:
AUTHORITY = /
\A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pvande what is the difference between \A and \Z vs ^ and $?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\A matches the beginning of the string. ^ matches any beginning of line in the string. \Z matches the end of string, allowing possible newline at the end of the string. $ matches the end of any line in the string.

In general, you are much more likely to want \A and \Z (or \z, which matches the end of string) than ^ and $. Using ^ and $ usually leads to bugs, in my experience.

@pvande pvande deleted the relax-hostname-validations branch February 16, 2020 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rack::Request#host returns truncated data for domain names containing underscores, other characters
3 participants