When crates.io gives 429, cargo should back off and retry later #13530
Comments
People will be more likely to hit this with #1169 (since we'd likely move forward on that without the batch publish on crates.io's side)
As for strategies to deal with this, I'd want input from crates.io to know what fits with their intent of the rate limit. Ideas brought up
Technically, packages can contain multiple crates but only one
👍 ISTM that batch uploading is nontrivial. Not only is it a substantial protocol change, but it possibly adds coherency demands to the crates.io system, which may be difficult to fulfil in an ACID way.
I'm guessing that a backoff and retry strategy is likely to be relatively simple. The only question is whether to apply it only to publish (where we know that we want rate limits low enough that reasonable non-abusive use cases can reach them), or to all operations. I think applying it to all operations risks exacerbating operational problems from wayward automation. I don't know if we have non-abusive operations which risk hitting rate limits. (Last week I ran
Retrying on 429 only on publish is a conservative choice which would solve the real-world operational problem.
That critically depends on what the rate limit is intended to accomplish. If the point of the rate limit is to make sure there is a personal connection between crates.io and its power users, then any automated fix is just circumventing it. Similarly, if the expensive part of the operation is receiving and processing the publish request, then an "acceptable" retry strategy is just automating the DDoS they were trying to avoid. We should talk to the crates.io team before making technical changes.
It could be that the best compromise here is that cargo has a retry strategy that is ridiculously slow. For example, it gets a 429 and prints out a message saying "You're being rate limited. Please talk to the registry about acceptable use in the future, but for now we are going to retry your request after a one-minute delay." This reduces the chance of a user intentionally relying on this behaviour, because it's so painfully slow, but it also does not break automation that assumes the crate is published once "cargo publish" has finished.
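The deliberately slow retry suggested above could look roughly like the following wrapper script. This is only a sketch of the idea, not cargo's behaviour: the helper names (`run_cargo_publish`, `publish_slowly`), the retry count, and especially the `"status 429"` marker used to recognise a rate-limit failure in cargo's error output are assumptions made for illustration.

```python
import subprocess
import sys
import time

# Assumption: cargo's error text mentions the HTTP status when the registry
# rejects a publish. The exact wording is not guaranteed.
RATE_LIMIT_MARKER = "status 429"


def run_cargo_publish(package):
    """Run `cargo publish -p <package>` and return (exit code, stderr text)."""
    proc = subprocess.run(
        ["cargo", "publish", "-p", package],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


def publish_slowly(package, run=run_cargo_publish, sleep=time.sleep,
                   delay=60.0, max_retries=5):
    """Publish one package, waiting a fixed, long delay after each 429.

    Returns True on success, False on any other failure or when the
    retries are exhausted. `run` and `sleep` are injectable for testing.
    """
    for attempt in range(max_retries + 1):
        code, stderr = run(package)
        if code == 0:
            return True
        if RATE_LIMIT_MARKER not in stderr or attempt == max_retries:
            return False  # not a rate limit, or we have given up
        print(
            f"{package}: rate limited by the registry; "
            f"retrying in {delay:.0f} seconds",
            file=sys.stderr,
        )
        sleep(delay)
    return False
```

The fixed one-minute delay is intentionally unattractive for anyone trying to rely on it, while still letting an automated release finish without manual intervention.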
(This just happened to me again. We have 55 packages now. It was less troublesome this time round because, after the discouraging response to #13397, we wrote a Python script to publish idempotently.)
This would meet our needs very nicely. Publication of our 55-package workspace takes a fair while in any case.
The cargo team discussed this today, but didn't have any specific conclusions. Some notes:
We discussed it in today’s crates.io weekly meeting. First, I raised a few questions:
During these discussions, crates.io also proposed two potential solutions:
probably easiest to ping
Problem
Our workspace contains 46 cargo packages. (Because cargo insists that each crate must be a separate package, and we want to split up crates for code sanity and compilation time reasons.)
This means that in our recent release, our on-duty release technician hit the rate limit. This aborted publication of the workspace, requiring manual retries and wrangling.
Steps
Have a workspace with more than 30 (the current burst rate limit) crates. Try to publish it by publishing each crate, in topo order, with cargo publish (using some automated tool).
Possible Solution(s)
cargo should handle a 429 response by backing off and retrying, using an exponential backoff algorithm.
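As a rough illustration of what "exponential backoff" means here, the sketch below retries a request while it keeps returning 429, doubling the wait each time up to a cap. Everything in it is hypothetical: the function names, the base delay, the cap, and the retry count are illustrative choices, not values proposed by cargo or crates.io, and `send` stands in for whatever performs the upload and returns the HTTP status code.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, max_retries: int = 6) -> Iterator[float]:
    """Yield the wait before each retry: base, base*factor, ... capped at `cap`."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor


def send_with_backoff(send: Callable[[], int],
                      sleep: Callable[[float], None] = time.sleep,
                      **backoff_options: float) -> int:
    """Call `send` and retry with growing delays while it returns 429.

    Returns the last status code seen, which is still 429 if every
    retry was rate limited.
    """
    status = send()
    for delay in backoff_delays(**backoff_options):
        if status != 429:
            break
        sleep(delay)
        status = send()
    return status
```

A real implementation would probably also add random jitter and honour any retry hint the registry sends. Both are left out to keep the sketch short.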
In rust-lang/crates.io#1643 the crates.io team report already having raised the rate limit. In the error message from crates.io they suggest emailing help@ to ask for a rate limit increase. Such a workflow is IMO undesirable, especially as Rust gets more adoption.
Notes
I don't think increasing the rate limit (globally, or on request) is the right fix. If 429 is a hard error there is a tension between preventing misuse, and not breaking large projects' releases. But this tension can be abolished by handling 429 gracefully.
#13397 would probably have assisted the recovery from this situation (and from the local disk space problem our release technician also ran into).
See also: rust-lang/crates.io#3229 (requesting docs) and #6714 (requesting better error message display).
Version
(edited to fix ticket links)