-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Smartly join URLs #101
Smartly join URLs #101
Conversation
Thanks for this. Some of what is in here I get, others I do not. I will comment/question inline. |
lib/registry/registry.rb
Outdated
url = parse_link_header(link_header)[:next] | ||
link_header = response.headers[:link] or break | ||
next_url = parse_link_header(link_header)[:next] or break | ||
url = URI.join(response.request.url, next_url) # Interpret the next link relative to the current URL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, based on your description. It would be worth adding a larger comment here, something like, "the next
url from the link header could be relative to the current URL, or absolute. URI.join handles both cases cleanly." Or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you go. I have made sure in the unit tests that both cases are checked.
@base_uri = "#{@uri.scheme}://#{@uri.host}:#{@uri.port}#{@uri.path}" | ||
@base_uri = +"#{@uri.scheme}://#{@uri.host}:#{@uri.port}#{@uri.path}" | ||
# `URI.join("https://example.com/foo/bar", "v2")` drops `bar` in the base URL. A trailing slash prevents that. | ||
@base_uri << '/' unless @base_uri.end_with? '/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the only change here is ensuring it ends in a /
. Is this because beforehand we always did @base_uri + path
, and so it always treated it as appending?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clearer, when an URL does not end with a trailing slash, the last part is assumed to be a file, so joining another path to it will make the path relative to the directory and not the file.
To give a more natural example, /foo/index.html joined with style.css would become /foo/style.css, white foo/index/ joined with style.css becomes /foo/index/style.css.
end | ||
end | ||
|
||
def search(query = '') | ||
all_repos = [] | ||
paginate_doget('/v2/_catalog') do |response| | ||
paginate_doget('v2/_catalog') do |response| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove the leading /
here (and in all other cases)? Is this because it used to just @base_uri + path
so it always added it? And thus if the base was http://foo.com/a/b
then this would give http://foo.com/a/b/v2/_catalog
. But with URI.join
, it will treat the leading /
as an absolute path, and so unless you remove the leading /
, you would end up with http://foo.com/v2/_catalog
instead of http://foo.com/a/b/v2/_catalog
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. In convential URL semantics, paths starting with a /
are absolute.
Do you know how to fix it? |
Yes, but that would deserve its own pull request. Rubocop tells what to add in the configuration, and then we will have to fix things in the code that Rubocop used to accept but now doesn’t. |
OK, then it all looks good, except for Rubocop. It has been a while since I was involved with it, I don't remember quite how to fix it. |
Thank you for the review ! |
And that's in. Rebase on that and we can get this in. |
The .each_key that Rubocop suggested is not supported on older Ruby versions, and that breaks the CI again. Rubocop needs to be configured not to suggest changes that would not be supported by the target Ruby version (in our case, 2.7 and 3.1): https://docs.rubocop.org/rubocop/configuration.html#setting-the-target-ruby-version. @fwininger Could you please fix that? |
When paginating, some registries return an absolute URL in the Link HTTP header. This happened on Amazon ECR, and docker_registry2 generated a bad URI exception when trying to request `https://….com:443https://…`. The absolute URL was naively appended to the base URL. URI.join provides a smarter way to concatenate URLs, and behaves pretty much like `<a href="…">` would in an HTML document. To preserve the path of the base URL, I forced a trailing slash and made the API paths relative. Otherwise, the semantic of `/v2` is to go back to the root.
This is the oldest Ruby version the CI is configured to test.
The CI is now all green, except a “Build (2.7)” which never seems to start. I suspect this is a vestige somewhere in the CI’s configuration. The code base does not mention 2.7 anywhere anymore. |
Yeah, that is because you removed 2.7 from the build matrix for testing. I removed it as a required test, but that only applies to future PRs. I can override it for this one. |
And in! Thank you @fmang |
@deitch can you release a minor version of the gem with this fix ? thanks |
Pushed out a patch v1.18.1, should be out shortly. |
Thanks :) |
When paginating, some registries return an absolute URL in the Link HTTP header. This happened on Amazon ECR, and docker_registry2 generated a bad URI exception when trying to request
https://….com:443https://…
. The absolute URL was naively appended to the base URL.URI.join provides a smarter way to concatenate URLs, and behaves pretty much like
<a href="…">
would in an HTML document. To preserve the path of the base URL, I forced a trailing slash and made the API paths relative. Otherwise, the semantic of/v2
is to go back to the root.It looks like the Rubocop configuration in the CI is outdated, causing a failure.