Are hostnames with underscores / ampersands legit? #141

Closed
voxik opened this issue Dec 10, 2024 · 7 comments


@voxik

voxik commented Dec 10, 2024

Testing globalid with Ruby 3.4, there are some failures such as:

Failure:
GlobalIDTest#test_invalid_app_name [test/cases/global_id_test.rb:13]:
ArgumentError expected but nothing was raised.

rails test test/cases/global_id_test.rb:8

F

Digging deeper into this test failure, I spotted this difference:

$ ruby -ruri -ruri/version -e 'puts URI::VERSION; URI::Generic.new("gid", nil, "blog_app", nil, nil, "/Model/1", nil, nil, nil, nil, true)'
0.13.1
/usr/share/ruby/uri/generic.rb:601:in `check_host': bad component(expected host component): blog_app (URI::InvalidComponentError)
	from /usr/share/ruby/uri/generic.rb:640:in `host='
	from /usr/share/ruby/uri/generic.rb:673:in `hostname='
	from /usr/share/ruby/uri/generic.rb:190:in `initialize'
	from -e:1:in `new'
	from -e:1:in `<main>'

vs

$ ruby -ruri -ruri/version -e 'puts URI::VERSION; URI::Generic.new("gid", nil, "blog_app", nil, nil, "/Model/1", nil, nil, nil, nil, true)'
1.0.2

As you can see, URI 0.13.1 rejects the underscore in the host, while URI 1.0.2 accepts it just fine. Is that expected?

@voxik voxik changed the title Are hostnames with underscores legible? Are hostnames with underscores / ampersands legit? Dec 11, 2024
@voxik
Author

voxik commented Dec 11, 2024

The ampersand is a similar case:

$ ruby -ruri -ruri/version -e 'puts URI::VERSION; URI::Generic.new("gid", nil, "blog&app", nil, nil, "/Model/1", nil, nil, nil, nil, true)'

@mtasaka

mtasaka commented Dec 13, 2024

git bisect shows that the following commit introduced this change:

d7dc19a
which is part of:
#107

So URI 0.13.0 uses RFC2396 by default, while 1.0.0 and above switched to RFC3986 as the default.

Actually, the following code behaves the same on both URI 0.13.0 and 1.0.2:

#!/usr/bin/ruby

require "uri"
require "uri/version"

$stdout.sync = true

rfc3986_parser = URI::RFC3986_PARSER
begin
  rfc2396_parser = URI::RFC2396_PARSER
rescue NameError
  rfc2396_parser = URI::DEFAULT_PARSER
end

puts URI::VERSION
puts "#{ URI::Generic.new("gid", nil, "blog_app", nil, nil, "/Model/1", nil, nil, nil, rfc3986_parser, true) }"
puts "#{ URI::Generic.new("gid", nil, "blog_app", nil, nil, "/Model/1", nil, nil, nil, rfc2396_parser, true) }"

With URI 1.0.2:

1.0.2
gid://blog_app/Model/1
/builddir/build/GIT/uri/lib/uri/generic.rb:601:in `check_host': bad component(expected host component): blog_app (URI::InvalidComponentError)
	from /builddir/build/GIT/uri/lib/uri/generic.rb:640:in `host='
	from /builddir/build/GIT/uri/lib/uri/generic.rb:673:in `hostname='
	from /builddir/build/GIT/uri/lib/uri/generic.rb:190:in `initialize'
	from ./uri_undersore.rb:17:in `new'
	from ./uri_undersore.rb:17:in `<main>'
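
The script above stops at the first exception, so a variant that rescues the error can report both parsers side by side. This is only a sketch: `host_accepted?` is a hypothetical helper (not part of the uri gem), and it assumes that `URI::RFC2396_Parser` and `URI::RFC3986_PARSER` are both defined in the uri release under test.

```ruby
require "uri"

# Hypothetical helper (not part of the uri gem): returns true when `parser`
# accepts `host` as the host component of a gid:// URI, false when
# URI::Generic.new rejects it with InvalidComponentError.
def host_accepted?(host, parser)
  URI::Generic.new("gid", nil, host, nil, nil, "/Model/1", nil, nil, nil, parser, true)
  true
rescue URI::InvalidComponentError
  false
end

# Explicit parser instances, so the result does not depend on which parser
# the installed uri version uses by default.
rfc2396 = URI::RFC2396_Parser.new
rfc3986 = URI::RFC3986_PARSER

["blog_app", "blog&app"].each do |host|
  puts "#{host}: RFC2396=#{host_accepted?(host, rfc2396)} RFC3986=#{host_accepted?(host, rfc3986)}"
end
```

Passing the parser explicitly is what makes the result independent of the library default, which is the only thing that differs between the two uri versions here.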

@mtasaka

mtasaka commented Dec 13, 2024

Maybe this line should be updated?

URI is a module providing classes to handle Uniform Resource Identifiers [RFC2396](http://tools.ietf.org/html/rfc2396).

@voxik
Author

voxik commented Dec 13, 2024

I have found this one, which seems to confirm the change is intentional.

However, the ampersand is not covered.

@mtasaka

mtasaka commented Dec 13, 2024

So I think the actual change is that URI 0.13.0 follows (or is expected to follow) RFC2396, while 1.0.2 follows RFC3986, so for the ampersand (or other characters) we have to refer to the RFCs themselves.

And:
https://www.ietf.org/rfc/rfc2396.txt
https://www.ietf.org/rfc/rfc3986.txt

Then (I may be misreading these) RFC2396 seems to say that the ampersand is in reg_name and that host does not contain reg_name chars.
On the other hand, RFC3986 says the ampersand is in sub-delims, host uses reg-name, and reg-name contains sub-delims, so the ampersand is actually allowed in the host in RFC3986, if I am not mistaken.
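
To make the grammar comparison concrete, here is a rough sketch that transcribes the relevant ABNF rules into Ruby regular expressions. The patterns are written by hand from the two RFCs for illustration; they are not the regexps the uri gem uses internally, and the constant and method names (`RFC2396_HOSTNAME`, `classify`, and so on) are made up for this example.

```ruby
# RFC 2396: hostname = *( domainlabel "." ) toplabel [ "." ]
DOMAINLABEL = /[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
TOPLABEL    = /[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?/
RFC2396_HOSTNAME = /\A(?:#{DOMAINLABEL}\.)*#{TOPLABEL}\.?\z/

# RFC 2396: reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
RFC2396_REG_NAME = /\A(?:[A-Za-z0-9\-_.!~*'()$,;:@&=+]|%\h\h)+\z/

# RFC 3986: reg-name = *( unreserved / pct-encoded / sub-delims )
RFC3986_REG_NAME = /\A(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%\h\h)*\z/

# Report which of the three productions a candidate string matches.
def classify(name)
  {
    rfc2396_hostname: RFC2396_HOSTNAME.match?(name),
    rfc2396_reg_name: RFC2396_REG_NAME.match?(name),
    rfc3986_reg_name: RFC3986_REG_NAME.match?(name),
  }
end

["blog-app", "blog_app", "blog&app"].each do |name|
  puts "#{name}: #{classify(name)}"
end
```

Under these transcriptions, the underscore and the ampersand both fall outside the RFC 2396 hostname rule but inside the RFC 3986 reg-name rule, which RFC 3986 allows as a host.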

@voxik
Author

voxik commented Dec 13, 2024

So there is this commit introducing the RFC3986 parser and also flipping the "underscore" test case.

This behaves the same on Ruby 3.4 and on older Rubies:

$ ruby -v -ruri -e "p URI::RFC3986_PARSER.parse('http://a_b:80/')"
ruby 3.4.0dev (2024-12-06 master 3901df708d) +PRISM [x86_64-linux]
#<URI::HTTP http://a_b/>

$ ruby -v -ruri -e "p URI::RFC2396_PARSER.parse('http://a_b:80/')"
ruby 3.4.0dev (2024-12-06 master 3901df708d) +PRISM [x86_64-linux]
/usr/share/ruby/uri/generic.rb:207:in 'URI::Generic#initialize': the scheme http does not accept registry part: a_b:80 (or bad hostname?) (URI::InvalidURIError)
        from /usr/share/ruby/uri/common.rb:155:in 'Class#new'
        from /usr/share/ruby/uri/common.rb:155:in 'URI.for'
        from /usr/share/ruby/uri/rfc2396_parser.rb:210:in 'URI::RFC2396_Parser#parse'
        from -e:1:in '<main>'

$ ruby -v -ruri -e "p URI::RFC3986_PARSER.parse('http://a_b:80/')"
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
#<URI::HTTP http://a_b/>

$ ruby -v -ruri -e "p URI::RFC2396_PARSER.parse('http://a_b:80/')"
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
/usr/share/ruby/uri/generic.rb:207:in `initialize': the scheme http does not accept registry part: a_b:80 (or bad hostname?) (URI::InvalidURIError)
	from /usr/share/ruby/uri/common.rb:134:in `new'
	from /usr/share/ruby/uri/common.rb:134:in `for'
	from /usr/share/ruby/uri/rfc2396_parser.rb:210:in `parse'
	from -e:1:in `<main>'

What changes is this:

$ ruby -v -ruri -e "p URI::Parser"
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
URI::RFC2396_Parser

$ ruby -v -ruri -e "p URI::Parser"
ruby 3.4.0dev (2024-12-06 master 3901df708d) +PRISM [x86_64-linux]
URI::RFC3986_Parser

Therefore I'm going back to globalid. One possible solution for them would be to explicitly keep using URI::RFC2396_Parser.
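
As a rough illustration of what "explicitly keep using URI::RFC2396_Parser" could look like, the sketch below pins the parser instead of relying on the library default. It is not globalid's actual code: `LEGACY_PARSER` and `validate_app_name` are hypothetical names, and raising `ArgumentError` is only modeled on what the failing test above expects.

```ruby
require "uri"

# Pin the RFC 2396 parser so validation does not change with the
# default parser of the installed uri version.
LEGACY_PARSER = URI::RFC2396_Parser.new

# Hypothetical helper (not globalid's implementation): returns the app name
# when it is a valid RFC 2396 host, otherwise raises ArgumentError.
def validate_app_name(app)
  URI::Generic.new("gid", nil, app, nil, nil, "/Model/1", nil, nil, nil, LEGACY_PARSER, true)
  app
rescue URI::Error
  raise ArgumentError, "Invalid app name: #{app.inspect}"
end

begin
  validate_app_name("blog_app")
rescue ArgumentError => e
  puts e.message
end
```

The trade-off is that app names stay restricted to what RFC 2396 allows as a hostname, even though RFC 3986 permits more characters in a host.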

@voxik voxik closed this as completed Dec 13, 2024
@duerst
Member

duerst commented Dec 14, 2024

@voxik wrote:

Then (I may be misreading these) RFC2396 seems to say that the ampersand is in reg_name and that host does not contain reg_name chars. On the other hand, RFC3986 says the ampersand is in sub-delims, host uses reg-name, and reg-name contains sub-delims, so the ampersand is actually allowed in the host in RFC3986, if I am not mistaken.

That's correct. The reason is that URIs (in theory even for schemes such as http(s)) may be used with systems that use something other than the DNS.
