Smartly join URLs #101

fmang · 2024-04-26T05:17:48Z

When paginating, some registries return an absolute URL in the Link HTTP header. This happened on Amazon ECR, and docker_registry2 raised a bad URI exception when trying to request `https://….com:443https://…`. The absolute URL was naively appended to the base URL.

URI.join provides a smarter way to concatenate URLs, and behaves pretty much like `<a href="…">` would in an HTML document. To preserve the path of the base URL, I forced a trailing slash and made the API paths relative. Otherwise, `/v2` would mean going back to the root.
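To illustrate the failure and the fix, here is a minimal sketch using Ruby's standard `uri` library. The registry host and query strings are hypothetical placeholders, not the actual ECR URLs from the report. It compares naive string concatenation with `URI.join` for both an absolute and a relative next link.

```ruby
require 'uri'

# Hypothetical registry URLs, for illustration only.
base = 'https://registry.example.com:443'
current = URI.parse("#{base}/v2/_catalog?n=100")

absolute_next = 'https://registry.example.com/v2/_catalog?n=100&last=foo'
relative_next = '/v2/_catalog?n=100&last=foo'

# Naive concatenation: an absolute link yields "https://...:443https://...",
# which URI.parse rejects with URI::InvalidURIError.
naive = base + absolute_next

# URI.join resolves the link against the current URL, like an HTML href:
# an absolute link replaces the base, a relative one is resolved against it.
joined_absolute = URI.join(current, absolute_next)
joined_relative = URI.join(current, relative_next)

puts joined_absolute # https://registry.example.com/v2/_catalog?n=100&last=foo
puts joined_relative # https://registry.example.com/v2/_catalog?n=100&last=foo
```

Both forms of the link end up at the same resource, which is why no special-casing of absolute links is needed.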

It looks like the Rubocop configuration in the CI is outdated, causing a failure.

deitch · 2024-04-26T08:36:46Z

Thanks for this.

Some of what is in here I get, some I do not. I will comment/question inline.

deitch · 2024-04-26T08:37:51Z

lib/registry/registry.rb

-        url = parse_link_header(link_header)[:next]
+        link_header = response.headers[:link] or break
+        next_url = parse_link_header(link_header)[:next] or break
+        url = URI.join(response.request.url, next_url) # Interpret the next link relative to the current URL.


This makes sense, based on your description. It would be worth adding a larger comment here, something like, "the next url from the link header could be relative to the current URL, or absolute. URI.join handles both cases cleanly." Or something like that.

Here you go. I have made sure in the unit tests that both cases are checked.
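The pagination loop from the diff can be sketched in isolation as follows. This is not the gem's code: `next_link` is a simplified, hypothetical stand-in for `parse_link_header` (it assumes no commas inside the URLs), and `fetch` is a hypothetical callable that replaces the real HTTP request so the example runs offline. The key step is the same as in the diff: each next link is resolved against the URL that returned it.

```ruby
require 'uri'

# Hypothetical helper: extract the rel="next" target from a Link header value.
# Simplified stand-in for the gem's parse_link_header; assumes no commas in URLs.
def next_link(link_header)
  return nil if link_header.nil?

  link_header.split(',').each do |part|
    target, *params = part.split(';').map(&:strip)
    next unless params.any? { |param| param.match?(/\Arel="?next"?\z/) }

    return target[/\A<(.*)>\z/, 1]
  end
  nil
end

# Follow rel="next" links until none is left.
# `fetch` is a hypothetical callable returning [body, link_header] for a URI.
def paginate(start_url, fetch)
  url = URI(start_url)
  pages = []
  loop do
    body, link_header = fetch.call(url)
    pages << body
    next_url = next_link(link_header) or break
    # The next link may be relative ("/v2/...") or absolute ("https://...").
    url = URI.join(url, next_url)
  end
  pages
end

# Fake responses: the first page returns a relative link, the second an absolute one.
responses = {
  'https://registry.example.com/v2/_catalog' =>
    ['page1', '</v2/_catalog?last=a&n=1>; rel="next"'],
  'https://registry.example.com/v2/_catalog?last=a&n=1' =>
    ['page2', '<https://registry.example.com/v2/_catalog?last=b&n=1>; rel="next"'],
  'https://registry.example.com/v2/_catalog?last=b&n=1' => ['page3', nil]
}
fetch = ->(uri) { responses.fetch(uri.to_s) }

pages = paginate('https://registry.example.com/v2/_catalog', fetch)
p pages # ["page1", "page2", "page3"]
```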

deitch · 2024-04-26T08:43:26Z

lib/registry/registry.rb

-      @base_uri = "#{@uri.scheme}://#{@uri.host}:#{@uri.port}#{@uri.path}"
+      @base_uri = +"#{@uri.scheme}://#{@uri.host}:#{@uri.port}#{@uri.path}"
+      # `URI.join("https://example.com/foo/bar", "v2")` drops `bar` in the base URL. A trailing slash prevents that.
+      @base_uri << '/' unless @base_uri.end_with? '/'


If I understand correctly, the only change here is ensuring it ends in a /. Is this because beforehand we always did @base_uri + path, and so it always treated it as appending?

To be clearer: when a URL does not end with a trailing slash, its last segment is assumed to be a file, so a path joined to it is resolved relative to the containing directory, not to the file.

To give a more natural example, /foo/index.html joined with style.css becomes /foo/style.css, while /foo/index/ joined with style.css becomes /foo/index/style.css.
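The file-versus-directory behavior can be checked directly with `URI.join`. The sketch below reuses the example paths from this comment on a placeholder `example.com` host. `with_trailing_slash` is a hypothetical helper that mirrors the `@base_uri << '/'` change in the diff; it is not a method of the gem.

```ruby
require 'uri'

# No trailing slash: "index.html" is treated as a file and replaced.
file_style = URI.join('https://example.com/foo/index.html', 'style.css').to_s
# => "https://example.com/foo/style.css"

# Trailing slash: "index/" is treated as a directory and kept.
dir_style = URI.join('https://example.com/foo/index/', 'style.css').to_s
# => "https://example.com/foo/index/style.css"

# Hypothetical helper mirroring the @base_uri change: append '/' only when missing.
def with_trailing_slash(url)
  url.end_with?('/') ? url : "#{url}/"
end

base = with_trailing_slash('https://example.com/foo/bar')
catalog = URI.join(base, 'v2/_catalog').to_s
# => "https://example.com/foo/bar/v2/_catalog" (bar is preserved)
```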

deitch · 2024-04-26T08:45:31Z

lib/registry/registry.rb

      end
    end

    def search(query = '')
      all_repos = []
-      paginate_doget('/v2/_catalog') do |response|
+      paginate_doget('v2/_catalog') do |response|


Why remove the leading / here (and in all other cases)? Is this because it used to just @base_uri + path so it always added it? And thus if the base was http://foo.com/a/b then this would give http://foo.com/a/b/v2/_catalog. But with URI.join, it will treat the leading / as an absolute path, and so unless you remove the leading /, you would end up with http://foo.com/v2/_catalog instead of http://foo.com/a/b/v2/_catalog?

Exactly. In conventional URL semantics, paths starting with a / are absolute.
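The difference deitch describes can be reproduced with the `http://foo.com/a/b` base from his question. This is a small sketch with the standard `uri` library; the string concatenation line only approximates the old `@base_uri + path` behavior and is not the gem's exact code.

```ruby
require 'uri'

base = 'http://foo.com/a/b/' # base path normalized with a trailing slash

# Leading slash: an absolute path, so the base path /a/b/ is discarded.
with_leading_slash = URI.join(base, '/v2/_catalog').to_s
# => "http://foo.com/v2/_catalog"

# No leading slash: resolved inside the base path.
without_leading_slash = URI.join(base, 'v2/_catalog').to_s
# => "http://foo.com/a/b/v2/_catalog"

# Approximation of the old approach: plain concatenation always kept /a/b.
old_style = 'http://foo.com/a/b' + '/v2/_catalog'
# => "http://foo.com/a/b/v2/_catalog"
```

This is why the API paths lost their leading `/` once `URI.join` was introduced: with the trailing slash on the base, the relative form gives the same result the old concatenation did.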

deitch · 2024-04-26T08:45:51Z

It looks like the Rubocop configuration in the CI is outdated, causing a failure

Do you know how to fix it?

fmang · 2024-04-26T09:03:54Z

It looks like the Rubocop configuration in the CI is outdated, causing a failure

Do you know how to fix it?

Yes, but that deserves its own pull request. Rubocop tells us what to add to the configuration, and then we will have to fix the places in the code that Rubocop used to accept but now doesn’t.

deitch · 2024-04-26T09:10:01Z

OK, then it all looks good, except for Rubocop. It has been a while since I was involved with it, I don't remember quite how to fix it.

fmang · 2024-04-26T09:10:31Z

Thank you for the review!

fwininger · 2024-04-26T14:42:30Z

@deitch we now have #102 to fix Rubocop on the master branch.

deitch · 2024-04-26T14:48:08Z

And that's in. Rebase on that and we can get this in.

fmang · 2024-04-27T02:13:15Z

The .each_key that Rubocop suggested is not supported on older Ruby versions, and that breaks the CI again.

Rubocop needs to be configured not to suggest changes that would not be supported by the target Ruby version (in our case, 2.7 and 3.1): https://docs.rubocop.org/rubocop/configuration.html#setting-the-target-ruby-version. @fwininger Could you please fix that?

fmang · 2024-04-29T03:16:20Z

The CI is now all green, except a “Build (2.7)” which never seems to start. I suspect this is a vestige somewhere in the CI’s configuration. The code base does not mention 2.7 anywhere anymore.

deitch · 2024-04-30T11:58:34Z

The code base does not mention 2.7 anywhere anymore.

Yeah, that is because you removed 2.7 from the build matrix for testing. I removed it as a required test, but that only applies to future PRs. I can override it for this one.

deitch · 2024-04-30T11:59:10Z

And in! Thank you @fmang

fwininger · 2024-05-13T07:51:08Z

@deitch can you release a minor version of the gem with this fix?

thanks

deitch · 2024-05-13T11:00:56Z

Pushed out a patch v1.18.1, should be out shortly.

fwininger · 2024-05-13T12:25:28Z

Thanks :)

fmang force-pushed the url_join branch from 9116f8e to 46ba37d April 26, 2024 05:22

deitch reviewed Apr 26, 2024

fmang force-pushed the url_join branch from 46ba37d to b7f6542 April 26, 2024 09:08

fmang marked this pull request as ready for review April 26, 2024 09:10

fwininger mentioned this pull request Apr 26, 2024

Fix rubocop on master #102

Merged

fmang force-pushed the url_join branch from b7f6542 to 5061335 April 27, 2024 02:07

fmang force-pushed the url_join branch from 5061335 to 846e8b3 April 29, 2024 02:34

Set Rubocop’s target Ruby version to 3.0

4faefe3

This is the oldest Ruby version the CI is configured to test.

fmang force-pushed the url_join branch from 494cc52 to 4faefe3 April 29, 2024 03:06

deitch approved these changes Apr 30, 2024

deitch merged commit 4d4b968 into deitch:master Apr 30, 2024
22 checks passed
