Rate limiting for connections #119

Open
wants to merge 1 commit into master
Conversation

vysotskylev

No description provided.

@vysotskylev vysotskylev requested a review from a team as a code owner April 14, 2022 21:29
@google-cla

google-cla bot commented Apr 14, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

For more information, open the CLA check for this pull request.

@vysotskylev vysotskylev force-pushed the vysotskylev/simple_rate_limits branch from 2e2e15d to 3986229 Compare April 14, 2022 21:59
@bemasc

bemasc commented Apr 15, 2022

@vysotskylev You wrote:

But we find instantaneous limits very useful in our services (with many keys and connections on a single node), as the built-in flow control is far from perfect. Moreover, it seems to fail for the UDP service (AFAIK).

Thanks for the clarification. I would still like to understand more about what you are seeing that motivates this change.

I just wrote a quick microbenchmark for the TCP and UDP download throughput. It shows 13+ Gbps for TCP and 900+ Mbps for UDP on my 5-year-old laptop, fast enough to saturate a 1 Gbps link. Do you think your server is CPU-limited? Or do you think users are saturating the network link, resulting in packet loss outside of your VM?
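(The microbenchmark itself isn't shown in this thread. For readers who want to reproduce a rough equivalent, here is an illustrative loopback TCP throughput sketch in Go; it is not the actual benchmark code, and the 500 ms duration and 64 KiB buffer size are arbitrary choices.)

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

func main() {
	// Listener discards everything it receives and reports the byte count.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	done := make(chan int64)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			panic(err)
		}
		n, _ := io.Copy(io.Discard, conn)
		done <- n
	}()

	// Sender blasts 64 KiB writes for half a second.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 64*1024)
	start := time.Now()
	for time.Since(start) < 500*time.Millisecond {
		if _, err := conn.Write(buf); err != nil {
			break
		}
	}
	conn.Close()

	received := <-done
	elapsed := time.Since(start).Seconds()
	fmt.Printf("TCP loopback: %.2f Gbps\n", float64(received)*8/elapsed/1e9)
}
```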

Also, why do you think that UDP is particularly problematic? Have users reported worse behavior with UDP-based applications? If so, which protocols or applications are causing a problem?

We definitely want to make sure that this server scales well to large numbers of users and access keys, but I want to make sure that we are solving the right problem first.

@krakazyabra

krakazyabra commented Apr 17, 2022

Hello, @bemasc
Let me answer your questions.

We provide a free service for bypassing censorship. Anyone can create a key and use the free internet without restrictions. But some users put the service to destructive purposes, such as downloading torrents or using our proxy for attacks (like LOIC). We cannot block their traffic, so the only option left is to limit their speed and make such abuse harder.
Also, as the provider of the service, we have to buy traffic for our Outline exit nodes. Of course we could cap usage, but a user would simply re-create a key using a temporary email address. From my point of view, it is better to give users a larger data limit but less speed.
Also, not all hosting providers offer even a 1G port; more often it is 100M or 200M, and just a handful of users on a server can saturate the whole channel with torrents or other downloads.

With a large number of keys, Prometheus is the bottleneck, but that is an Outline Server issue, not a Shadowsocks one. Without it, SS works quite fast and shows stable CPU/memory graphs.

@bemasc

bemasc commented Apr 21, 2022

We are considering how to proceed here. We want to make sure that outline-ss-server provides any anti-abuse capabilities required by your service, but we may want to take a different approach than this PR. Before we ask for changes, we want to make sure that your effort won't be wasted.

We will try to think about this problem more and get back to you in the next week or two.

@bemasc

bemasc commented Apr 27, 2022

I've written a prototype that takes an alternative approach to this problem, using the kernel to enforce fair sharing of bandwidth between users. You can find the code in the bemasc-somark branch. To use it, run the server as a privileged user, and also issue the following commands as root:

tc qdisc add dev eth0 handle 1: root fq_codel interval 200ms flows 2048
tc filter add dev eth0 parent 1: handle 1 protocol all flow hash keys mark perturb 600 divisor 2048

(replacing eth0 with your network interface name if necessary)

This change groups all traffic for each client IP into a single "flow", designated by its socket mark, and instructs the kernel to share bandwidth fairly among these flows. Thus, one user with 100 open sockets, or badly behaved UDP traffic, cannot get more than 1/10th of the bandwidth if there are 10 active users, even if the others only have one socket each.
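(I haven't inspected the bemasc-somark branch itself, but the socket-mark mechanism it describes can be sketched as follows. This is a hypothetical Linux-only helper, not the branch's actual code; setting SO_MARK requires CAP_NET_ADMIN, which is why the server has to run privileged.)

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// setMark tags a socket with fwmark `mark` so the tc flow filter above
// can hash all of one client's traffic into a single queue.
// Requires CAP_NET_ADMIN; otherwise setsockopt fails with EPERM.
func setMark(conn *net.UDPConn, mark int) error {
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var serr error
	if err := raw.Control(func(fd uintptr) {
		serr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_MARK, mark)
	}); err != nil {
		return err
	}
	return serr
}

func main() {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// In a real server the mark would be derived from the client IP;
	// here we just use an arbitrary constant for illustration.
	if err := setMark(conn, 42); err != nil {
		fmt.Println("could not set SO_MARK (need CAP_NET_ADMIN):", err)
	} else {
		fmt.Println("SO_MARK set to 42")
	}
}
```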

Would this change help to address your concerns? Are you able to test it and see if it performs acceptably?

@krakazyabra

Hello, @bemasc
I understand your solution, but I have a few questions:
Who counts as an active user? Someone with an active upload/download session right now, or just any key with some used traffic?
How will it work if there are 3-4k keys per server on a 1 Gbit link? Will it split the bandwidth between all of them, or only among the active sessions? Sorry, I'm not a programmer at all; reading the source code tells me nothing.

@bemasc

bemasc commented Apr 28, 2022

Hello, @bemasc I understand your solution, but I have a few questions: Who counts as an active user? Someone with an active upload/download session right now, or just any key with some used traffic?

The former: a user who is currently sending data through the proxy. ("Currently" meaning roughly "within the last 200 milliseconds".)

How will it work if there are 3-4k keys per server on a 1 Gbit link? Will it split the bandwidth between all of them, or only among the active sessions?

The latter: only active users. If there are 3-4k keys but only one active user, that user should have the full 1 Gbit.

Sorry, I'm not a programmer at all; reading the source code tells me nothing.

The underlying principle is Fair Queueing, treating all traffic for a given client as a single "flow".
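(The effect of fair queueing can be shown with a toy round-robin scheduler. This is a deliberately simplified sketch of the principle, not how fq_codel is actually implemented: one heavy flow with a huge backlog still gets the same per-round service as each light flow.)

```go
package main

import "fmt"

// Toy fair queueing: each "flow" is a queue of equal-size packets, and
// the scheduler serves one packet per active flow per round.
func main() {
	// Hypothetical load: flow 0 is a heavy user with 1000 queued packets;
	// flows 1-9 are light users with 10 packets each.
	queues := make([]int, 10)
	queues[0] = 1000
	for i := 1; i < 10; i++ {
		queues[i] = 10
	}

	// Link capacity for this experiment: 100 packets total.
	served := make([]int, 10)
	for sent := 0; sent < 100; {
		for i := range queues {
			if queues[i] > 0 && sent < 100 {
				queues[i]--
				served[i]++
				sent++
			}
		}
	}

	// Despite flow 0's enormous backlog, every active flow gets an
	// equal share of the 100-packet budget.
	fmt.Println(served) // → [10 10 10 10 10 10 10 10 10 10]
}
```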

@krakazyabra

Sounds good, that really is fair queueing. How can I test it?

@bemasc

bemasc commented Apr 29, 2022

I don't have any advice about how to test it. Perhaps you can set up a server and direct some user traffic to it, or try a load test running on several VMs simultaneously.

If you do find a way to test it, you can see some statistics by running this command:

tc -s qdisc show dev eth0

@krakazyabra

I mean, since there is no package for it, I should compile it myself and then replace the binary inside outline-server, right (probably by updating the Dockerfile)?

@bemasc

bemasc commented Apr 29, 2022

I assumed that you are already running the code in this PR, and hence must have a system for using alternative branches of outline-ss-server.

I suppose the easiest way to run a modified version inside Docker would be to replace the binary here and then rebuild the Docker image.
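(For reference, a rough sketch of that workflow, assuming a standard Go toolchain; the build-context path is illustrative and depends on how your outline-server image is laid out.)

```shell
# Build the modified outline-ss-server from the prototype branch.
git clone https://github.com/Jigsaw-Code/outline-ss-server
cd outline-ss-server
git checkout bemasc-somark
go build -o outline-ss-server .

# Copy the binary into your outline-server build context and rebuild.
cp outline-ss-server /path/to/outline-server/build/context/
docker build -t outline-server-custom /path/to/outline-server/build/context
```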
