Remove "non-canonical downloads" feature #7341

Closed
Turbo87 opened this issue Oct 23, 2023 · 8 comments
Labels
A-backend ⚙️ C-tracking-issue Category: A tracking issue for an RFC, an unstable feature, or an issue made of many parts

Comments

@Turbo87
Member

Turbo87 commented Oct 23, 2023

tl;dr

  • we want to improve the reliability and performance of crate downloads
  • "non-canonical downloads" are blocking these plans
  • cargo users are unaffected and only very few custom scripts are currently relying on this

What are "non-canonical downloads"?

The "non-canonical downloads" feature allows people to download the serde_derive crate from https://crates.io/api/v1/crates/serde_derive/1.0.189/download, but also from https://crates.io/api/v1/crates/serde-derive/1.0.189/download, where the underscore was replaced with a hyphen. The same also works vice versa, if the crate name uses hyphens and the download URL uses underscores instead, and it even works with any other combination of the two.
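As a rough illustration of the distinction, here is a minimal Python sketch. It assumes that hyphen/underscore substitution is the only difference that matters, as described above; the helper names `normalize`, `is_non_canonical`, and `download_url` are hypothetical and not part of crates.io.

```python
def normalize(name: str) -> str:
    """Fold hyphens into underscores so equivalent spellings compare equal."""
    return name.replace("-", "_")


def is_non_canonical(requested: str, canonical: str) -> bool:
    """True if `requested` refers to the same crate but is spelled differently."""
    return requested != canonical and normalize(requested) == normalize(canonical)


def download_url(name: str, version: str) -> str:
    """Build a download URL in the shape quoted in this issue."""
    return f"https://crates.io/api/v1/crates/{name}/{version}/download"


if __name__ == "__main__":
    # The canonical spelling of this crate uses an underscore.
    print(is_non_canonical("serde-derive", "serde_derive"))
    print(download_url("serde_derive", "1.0.189"))
```

A script that builds download URLs itself would want to pass the crate name exactly as it is published, which is what cargo already does.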

Why remove it?

Such non-canonical download requests require our backend to perform a database lookup to figure out the canonical crate name. The canonical crate name is then used to construct a download URL and the client is HTTP-redirected to the URL.
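The flow described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the in-memory `CANONICAL_NAMES` dict stands in for the database, `CDN_BASE` and the file path layout are placeholders, and the 302 status is just one possible redirect code. It is not the actual crates.io implementation.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the database: normalized name -> canonical name.
CANONICAL_NAMES = {
    "serde_derive": "serde_derive",
    "proc_macro2": "proc-macro2",
}

# Placeholder host and path layout, not the real download location.
CDN_BASE = "https://cdn.example.invalid/crates"


def lookup_canonical(requested: str) -> Optional[str]:
    """Simulate the database lookup that resolves any spelling to the canonical name."""
    return CANONICAL_NAMES.get(requested.replace("-", "_"))


def handle_download(requested: str, version: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect target) for a download request."""
    canonical = lookup_canonical(requested)
    if canonical is None:
        return 404, None
    # The redirect target is always built from the canonical name.
    return 302, f"{CDN_BASE}/{canonical}/{canonical}-{version}.crate"
```

The point of the sketch is that the lookup step needs per-crate state, which is why a plain CDN cannot serve these requests on its own.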

While we are using caching to address some of the performance concerns, having all download requests go through our backend servers has started to become problematic and at the current rate of growth will not become any easier in the future.

Having to support "non-canonical downloads" however prevents us from using CDNs for all of the download requests, so if we can remove support for these requests, we can significantly improve the performance and reliability of crate downloads.

Who is using "non-canonical downloads"?

cargo always uses the canonical crate name to construct such download URLs, so if support for non-canonical downloads were removed on the crates.io side, cargo should still work exactly the same as before.

Looking at the crates.io request logs, the following user-agents are currently relying on non-canonical downloads working:

  • cargo-binstall/1.1.2
  • Faraday v0.17.6
  • Go-http-client/2.0
  • GNU Guile
  • python-requests/2.31.0

Three of these are just generic HTTP client libraries. GNU Guile is apparently a programming language, so most likely this is also a generic user-agent from a custom user program.

cargo-binstall refers to https://github.com/cargo-bins/cargo-binstall. From the low number of non-canonical download requests it is unclear at this point how and why they might be affected, but we will let the maintainers know about this issue and our plans.

What is the plan?

  1. Announce the removal of support for non-canonical downloads on the main Rust blog.
  2. Wait one month.
  3. Disable support for non-canonical downloads and return a migration error message instead.
  4. Wait one month.
  5. Return a regular 404 error instead of the migration error message, allowing us to get rid of (parts of) the database query.

Note that we will still need the database query for download counting purposes for now. We have plans to remove this requirement as well, but those efforts are blocked by us needing to support non-canonical downloads.
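The rollout steps above can be sketched as a single handler whose behavior depends on the current phase. This is a simplified model only: the `Phase` names, the `respond` function, the result tags, and the wording of the migration message are all hypothetical, and the sketch takes the canonical name as an input rather than modelling the database query.

```python
from enum import Enum
from typing import Optional, Tuple


class Phase(Enum):
    REDIRECT = 1         # before step 3: non-canonical names still redirect
    MIGRATION_ERROR = 2  # step 3: respond with a migration error message
    NOT_FOUND = 3        # step 5: respond with a regular 404


def respond(requested: str, canonical: str, phase: Phase) -> Tuple[str, Optional[str]]:
    """Return (result tag, detail) for a download request in the given phase."""
    if requested == canonical:
        # Canonical requests behave the same in every phase.
        return ("redirect", canonical)
    if phase is Phase.REDIRECT:
        return ("redirect", canonical)
    if phase is Phase.MIGRATION_ERROR:
        # Hypothetical message text; the real wording is not given in this issue.
        return ("migration_error", f"use '{canonical}' instead of '{requested}'")
    # Final phase: no hint is given, so the canonical name is no longer needed here.
    return ("not_found", None)
```

In the final phase the handler no longer has to resolve the canonical name for a mismatched request, which corresponds to dropping (parts of) the database query.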

@Turbo87
Member Author

Turbo87 commented Oct 24, 2023

Turbo87 added the C-tracking-issue label Oct 24, 2023
@8573

8573 commented Oct 30, 2023

> GNU Guile is apparently a programming language, so most likely this is also a generic user-agent from a custom user program.

I would guess that program is the GNU Guix package manager, although I am not familiar with how it manages Cargo-based packages.

@Turbo87
Member Author

Turbo87 commented Oct 30, 2023

@8573 thanks, you're actually not the first one to suggest this (see rust-lang/blog.rust-lang.org#1156 (comment)). I've just sent an email to the Guix maintainers to make them aware of the blog post.

@8573

8573 commented Oct 30, 2023

Ah, I had searched the Zulip stream but had not seen rust-lang/blog.rust-lang.org#1156.

@raykrueger

raykrueger commented Nov 3, 2023

Hello friends, I am not really a Rust dev, but I stay up to date. I was just reading about this issue and had a thought. I don't know which CDN we're looking at, but many support request processing of some kind. AWS (my employer) CloudFront supports Lambda@Edge, for example, which essentially runs like request middleware as part of the CDN edge request, before anything goes to the origin.

A solution like this could be used to normalize/canonicalize the URL before any file or origin request. Uploads may be a bit more work. Happy to talk this through with someone.

edit: forgot the url. https://aws.amazon.com/lambda/edge/

@Turbo87
Member Author

Turbo87 commented Nov 4, 2023

@raykrueger yeah, it's a possibility we discussed, but it would lock us into CDNs with such functionality and bring the corresponding additional architectural complexity. We felt that the trade-off is not worth it, since cargo and other well-behaving clients don't need this functionality. We're also running on multiple CDNs in parallel at the moment, so we would have to build this into all of them.

@raykrueger

Understood. I was going to say getting rid of the non-canonical support is probably better in the long term.

@Turbo87
Member Author

Turbo87 commented Feb 12, 2024

#7751 removed the feature flag introduced in #7549. In other words: this issue has been completed :)

Turbo87 closed this as completed Feb 12, 2024
3 participants