
Rate limit the crate publish endpoint #1596

Merged: 3 commits merged into rust-lang:master on Feb 14, 2019

Conversation

@sgrif (Contributor) commented Jan 16, 2019

This adds a restriction to the number of crates that can be published
from a single IP address in a single time period. This is done per IP
instead of per-user, since that's what we can do with nginx alone. We
may want to do additional rate limiting per token to force spammers to
register additional accounts, and not just change their IP.

This will use what's called the "leaky bucket" strategy for rate
limiting. Basically every user has a bucket of tokens they use to
publish crates. They start with 10 tokens available. Every time they hit
this endpoint, they use one of the tokens. We give them a new token each
minute.

What this means is that you can upload 1 crate per minute, but we allow
you to upload up to 10 in a short period of time before we start
enforcing the rate limit.
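The token accounting described above can be sketched as follows. This is a hypothetical Rust illustration of the algorithm only; the actual enforcement in this PR is done by nginx's `limit_req` module, not by application code:

```rust
use std::time::{Duration, Instant};

/// Minimal leaky-bucket sketch: start with `burst` tokens and earn one
/// new token every `rate` interval, never exceeding `burst`.
struct LeakyBucket {
    tokens: f64,
    burst: f64,
    rate: Duration, // time to earn one token
    last_refill: Instant,
}

impl LeakyBucket {
    fn new(burst: u32, rate: Duration) -> Self {
        LeakyBucket {
            tokens: burst as f64,
            burst: burst as f64,
            rate,
            last_refill: Instant::now(),
        }
    }

    /// Returns true if a token was available (request allowed).
    fn try_acquire(&mut self) -> bool {
        // Credit tokens earned since the last call, capped at the burst size.
        let now = Instant::now();
        let earned = now.duration_since(self.last_refill).as_secs_f64()
            / self.rate.as_secs_f64();
        self.tokens = (self.tokens + earned).min(self.burst);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false // caller should respond with 429
        }
    }
}

fn main() {
    // 10-token burst, one new token per minute, as in the PR description.
    let mut bucket = LeakyBucket::new(10, Duration::from_secs(60));
    // The first 10 publishes succeed immediately; the 11th is rejected.
    let allowed = (0..11).filter(|_| bucket.try_acquire()).count();
    println!("{allowed}"); // prints 10
}
```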

When someone does hit the rate limit, they will receive a 429 response
(Too Many Requests, though since it's a newer status code I'm not sure
nginx actually knows that text description). We could also allow it to
instead slow down the requests, refusing to process them until a token
is available (queueing a max of 10 requests).

This reserves 10 megabytes of memory for the IP table, which means we can
hold about 80,000 IPs. When the table is full, it will attempt to drop
the oldest record, and if that doesn't free enough space, it'll give a
503. Keep in mind this is all in memory, not shared between our servers.
Since each server keeps its own bucket, it is possible (but not
guaranteed) that someone can upload 20 crates at once, and then send 2
requests per minute.
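In nginx terms, the bucket lives in a `limit_req_zone` shared-memory zone. A sketch of the relevant directives follows; the values mirror the description above, but the endpoint path and exact layout are illustrative rather than a verbatim copy of this PR's config:

```nginx
# 10 MB zone keyed on client IP holds roughly 80,000 tracked addresses.
# rate=1r/m earns one token per minute; burst=10 is the bucket size.
limit_req_zone $binary_remote_addr zone=publish:10m rate=1r/m;

server {
    location = /api/v1/crates/new {
        limit_req zone=publish burst=10 nodelay;
        limit_req_status 429;  # returned once the burst is exhausted
    }
}
```

With `nodelay`, burst requests are served immediately rather than queued; dropping it would give the "slow down the requests" alternative mentioned above.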

On Heroku and any system that involves proxies, the remote_addr of the
request becomes useless, as it will be the proxy forwarding your request
rather than the client itself. Most proxies will set or append an
X-Forwarded-For header, putting whatever remote_addr they received at
the end of the header. So if there are 3 proxies, the header will
look like

real_ip, proxy1, proxy2

However, if we're not careful this makes us vulnerable to IP spoofing. A
lot of implementations just look at the first value in the header. All
the proxies just append the header though (they don't know if they're
getting traffic from the origin or a proxy), so you end up with

spoofed_ip, real_ip, proxy1, proxy2

nginx, Rails, and many other implementations avoid this by requiring an
allowlist of trusted IPs. With this configuration, the realip only kicks
in if remote_addr is a trusted proxy, and then it will replace it with
the last non-trusted IP from the header.
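The nginx side of that allowlist looks roughly like this (a sketch with placeholder proxy addresses, not the exact production values; `real_ip_recursive on` is what makes nginx walk past every trusted hop instead of only the last one):

```nginx
# Only traffic arriving from these addresses may rewrite the client IP.
set_real_ip_from 10.0.0.0/8;      # placeholder trusted proxy range

real_ip_header    X-Forwarded-For;
real_ip_recursive on;  # take the last address NOT in set_real_ip_from

# Given "spoofed_ip, real_ip, proxy1, proxy2" with proxy1/proxy2 trusted,
# $remote_addr becomes real_ip and spoofed_ip is ignored.
```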

@sgrif (Contributor, Author) commented Jan 16, 2019

I've given this some local testing to make sure everything is working properly (as best as I'm able to locally). I've confirmed that it is in fact setting the X-Real-Ip header to what it's expected to be, that the publish endpoint is rate limited, that all other endpoints are not rate limited, and that the changes to how the various proxy headers are set still work as before.

We'll still want to test this on staging before it gets deployed though

This was not intended to be deleted
@ashleygwilliams (Member) left a comment
adjacent to this code- as per our convo in the triage meeting i think it would be beneficial to document this on the policies page!

@sgrif (Contributor, Author) commented Jan 17, 2019

Note on above: I agree, but the wording is tricky and IMO should not block landing this

@ashleygwilliams (Member)

i always prefer to use PRs as a forcing function for docs- very happy to help wordsmith here if you'd like (or in PMs)

@jtgeibel (Member)

I've tested this in my staging area. The IP address shows correctly in the logs and after making 10 requests to the publish endpoint I started getting responses with a 429 status.

This was on the heroku-16 stack. The buildpack pulls in nginx-1.9.5 on that stack. I'm not sure what stack we are on in production, but it looks like Heroku updated the buildpack to nginx-1.14.1 on heroku-18, which is nice.

@sgrif (Contributor, Author) commented Jan 28, 2019

We talked a while back about policies, but it sorta stalled out. I've pushed up what I think would make the most sense here. I've put it under "Package Ownership" since this doesn't really have anything to do with crawlers. The wording "to claim ownership of a large number of packages" is deliberate, as it's the only wording I could come up with that separates the cases we care about from "we use a CI tool or a script to publish our suite of 10 crates" which we don't want to prevent.

@jtgeibel (Member) commented Feb 2, 2019

I'm r+ on the policy language, and this PR overall.

@sgrif (Contributor, Author) commented Feb 14, 2019

@bors r+

@bors (Contributor) commented Feb 14, 2019

📌 Commit ac4d090 has been approved by sgrif

bors added a commit that referenced this pull request Feb 14, 2019
Rate limit the crate publish endpoint

@bors (Contributor) commented Feb 14, 2019

⌛ Testing commit ac4d090 with merge 1aa4b1d...

@bors (Contributor) commented Feb 14, 2019

☀️ Test successful - checks-travis
Approved by: sgrif
Pushing 1aa4b1d to master...

@bors bors merged commit ac4d090 into rust-lang:master Feb 14, 2019
@sgrif sgrif deleted the sg-realip branch March 9, 2019 01:34
CPerezz added a commit to CPerezz/cargo that referenced this pull request Jan 31, 2021
rust-lang/crates.io#1596 added a rate limit to the crate publish
endpoint (10 crates/min). As described in that PR, the strategy allows
10 crates to be uploaded up front before the rate starts to be applied.
If the rate is exceeded, crates.io returns a 429 HTTP response.

Cargo tries to publish all of the workspace/crate components as fast
as possible (as we should expect), but when we have more than 10
components and try to publish the 11th, a minute usually hasn't
passed yet, so we get an HTTP 429.

This caused some users to see confusing, poorly explained error
messages, as seen in rust-lang/crates.io#1643.

The limit has since been increased to 30 crates/min. But who knows.

So this closes rust-lang#6714 until a better solution is found.
Turbo87 added a commit to Turbo87/crates.io that referenced this pull request Oct 17, 2023
The configuration for this nginx module was originally added in rust-lang#1596, in combination with an IP-based `limit_req` rate limit. That rate limit has since been moved to the application layer (see rust-lang#6875).

Since we no longer use the `limit_req` rate limit in the nginx config, it looks like we don't need the `real_ip` config anymore either.