Scheduling and Load Balancing
Tempesta FW can work with thousands of backend servers. Some of them are interchangeable, so proper load balancing is needed. Others provide completely different services, so it is important to choose the right server for a given client request. Tempesta FW supports a set of schedulers to meet all these requirements.
In the Tempesta FW configuration file all backend servers must be grouped by the single-service principle: each server group must contain solely interchangeable servers. See the chapter Backend servers for more information on syntax. Server groups in turn are linked to particular virtual hosts and locations (see Virtual hosts and locations and Proxy pass).
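As a minimal sketch of this grouping principle (the group names and addresses below are illustrative only), each group contains interchangeable replicas of a single service and is referenced from its own virtual host; the HTTP tables rules that route requests to these virtual hosts are shown in the examples at the end of this page:
# Interchangeable servers running the same web application:
srv_group app {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# A completely different service gets its own group:
srv_group storage {
    server 10.0.1.1:9000;
}

vhost vh_app {
    proxy_pass app;
}
vhost vh_storage {
    proxy_pass storage;
}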
Fig. 1: General Scheduling routine
                   HTTP Request
                         ↓
          ┌──────────────┴──────────────┐
┌─────────┴─────────┐         ┌─────────┴─────────┐
│    HTTP tables    │         │  Sticky Session   │
│                   │         │     Scheduler     │
└─────────┬─────────┘         └─────────┬─────────┘
          │                             │
┌─────────┴─────────┐                   │
│  Load Balancing   │                   │
│    Schedulers:    │                   │
│    Ratio, Hash    │                   │
└─────────┬─────────┘                   │
          └──────────────┬──────────────┘
                         ↓
             Backend Server Connection
To forward an HTTP request, Tempesta FW chooses an active backend server connection. First, the request passes through HTTP tables, which analyse the request to find the target virtual host and location; the request is then passed to the corresponding server group. Next, the load balancing scheduler attached to the chosen server group picks the most suitable connection to a backend server. The Ratio and Hash schedulers are available for that role.
If a target backend server connection is found, it is used to forward the request to the backend server. If no target server group can be used, or no live connection is available in the target server group, a 502 error response is sent to the HTTP client.
The Sticky Sessions scheduler pins HTTP sessions to specific servers and bypasses the generic scheduling routine.
A load balancing scheduler controls load distribution among the servers within a group. The sched directive is applied in the server group context.
Default: ratio scheduler with default options.
sched <SCHED_NAME> [OPTIONS];
SCHED_NAME
: The name of a scheduler available in Tempesta FW.
OPTIONS
: Optional parameters. Not all schedulers have additional options.
Only one sched directive is allowed per explicit or implicit group.
Example:
# Named server group:
srv_group static_storage {
    server 10.10.0.1:8080;
    sched ratio dynamic;
}

# Implicit "default" group:
server 192.168.1.1;
sched ratio dynamic;
The Ratio scheduler balances the load across servers in a group based on each server's weight. Requests are forwarded more often to servers with a higher weight and less often to servers with a lower weight. All connections of the same server are loaded equally. Weights can be defined statically by the user or computed dynamically from server performance. As a result, each server in a group receives an optimal load.
The ratio scheduler may take the following options:
- static: The weight of each server in a group is defined statically with the
  [weight=<NN>] option of the server directive. This is the default ratio
  scheduler option.
- dynamic: The weight of each server in a group is defined dynamically. The
  specific type of dynamic weight is specified with additional options:
  - minimum: The current minimum response time from a server;
  - maximum: The current maximum response time from a server;
  - average: The current average response time from a server;
  - percentile [<NN>]: The current response time from a server that is within
    the specified percentile. The percentile may be one of 50, 75, 90, 95, 99.
    If none is given, the default percentile of 90 is used.

  If a specific type of dynamic weight is not specified, the default type of
  average is used.
- predict: The weight of each server in a group is predicted dynamically
  for a time in the future, based on the server's behavior in the past.
  Additional options include those defined for dynamic weight, as well as
  the following:
  - past: Period of time (in seconds) to keep past response time values from
    a server. The default value is 30 seconds.
  - rate: Rate (times per second) of retrieval of past response time values.
    The default value is 20 times per second.
  - ahead: Period of time (in seconds) for which to make a prediction; it
    can't be more than half of past. The default value is 15 seconds.
Note that if the dynamic mode is specified for a group and there's a server in that group with the weight option, an error is produced, as that combination is incompatible. The same is true for the predict mode.
The following are examples of scheduler specification in the configuration. Again, only one sched directive is allowed per group.
# Use ratio scheduler. By default, static weight distribution is used.
srv_group static {
    server 192.168.1.1;           # Server with default weight (50)
    server 192.168.1.2 weight=75; # Server with explicit weight 75
    sched ratio;                  # Identical to 'sched ratio static;'
}

# Use dynamic weights. By default, the current average response time is used
# for weight distribution.
sched ratio dynamic;

# Use dynamic weights with maximum response time for weight distribution.
sched ratio dynamic maximum;

# Use dynamic weights, the default percentile of 90 is used.
sched ratio dynamic percentile;

# Use dynamic weights, percentile of 75 is used for weight distribution.
sched ratio dynamic percentile 75;

# Use predictive weights, percentile of 75 is used for weight distribution.
# Response time values of each server are collected for the past 60 seconds
# at a rate of 20 times per second, and the weight of each server is predicted
# for 2 seconds ahead.
sched ratio predict percentile 75 past=60 rate=20 ahead=2;
All server connections are placed in a round-robin buffer according to the servers' weights. A new request is scheduled to the next connection from the buffer. A server connection can be live or dead, so a few attempts are made to pick a live connection from the round-robin buffer. Even a large number of attempts wouldn't guarantee that the picked connection is alive, so a strict limit on attempts is used to keep processing of each request fast and to improve total throughput.
Server groups should be created with proper care: a group should contain servers that handle similar resources. For instance, if servers with static content that is served quickly are grouped together with servers with dynamic content that is I/O bound, then the short response times from the static-content servers will be nearly invisible in comparison to the longer response times from the dynamic-content servers. It then becomes impossible to tell which servers and connections perform better, and the load distribution with ratio dynamic or ratio predict will be severely skewed.
Reaction to server unavailability. When the static ratio scheduler is used and a lot of server connections are in a failovering or dead state, the probability of picking a dead connection from the round-robin buffer is higher. Since only a few scheduling attempts are made, the whole scheduling routine may fail. That means the static ratio scheduler should be used only with reliable backend servers that keep connections alive as long as possible.
The dynamic and predict modes are more fault tolerant. Servers with better performance get higher weights and are more likely to be chosen during the scheduling routine. So even if the majority of servers in a group are unavailable, live servers take precedence over unavailable ones.
Scheduling non-idempotent requests. The presence of a non-idempotent request in a connection means that subsequent requests may not be sent out until a response to the non-idempotent request is received. With that in mind, an attempt is made to put new requests into connections that don't currently hold non-idempotent requests. If all connections have a non-idempotent request in them, then such a connection is used anyway, as there's no other choice.
The Hash scheduler distributes load in a completely different way. Requests for the same resources are pinned to the same servers in a group following the rendezvous hashing (Highest Random Weight) principle. The pinning is persistent: failure of one server doesn't cause re-pinning of all resources; only the resources of the failed server are temporarily re-pinned to other servers. When the server comes back online, the temporary pinning is removed and the server continues serving the same resources.
With this scheduling method each backend server handles a relatively small set of resources instead of the full web service. This is handy if simultaneous access by different backend servers to the same resources requires synchronization, or if pinning resources to servers saves unnecessary I/O operations. Note that the scheduler doesn't track the number of requests forwarded to each server.
The Hash scheduler doesn't have any additional options:
sched hash;
Example:
srv_group static_storage {
    server 10.10.0.1:8080;
    sched hash;
}
A unique hash value is calculated and stored for each server connection. When a new request is passed to the scheduler, the URI and the Host header are used to calculate a hash key for the request. The connection with the maximum value of (connection hash) XOR (request hash) is chosen as the most appropriate connection for the request. The Hash scheduler provides optimal load for all servers in a group if the number of URIs in the web service is many times bigger than the total number of connections across all servers in the group.
The drawbacks of HRW hashing are:
- The hashing may bring unfairness to the load balancing. There may be a case when one server pulls all the load away from the other servers. Although it is quite improbable, such a condition is quite stable: it can be fixed only by restarting Tempesta FW.
- If requests for some resources of the web service are much more frequent than others, then the corresponding servers may be overloaded.
- The scheduler is slower on large groups than the ratio scheduler.
Reaction to server unavailability. If a server connection goes down, it is skipped during scheduling. When it is back online, it continues to serve the same resources, so other connections and servers are not affected.
Scheduling non-idempotent requests. Unlike the ratio scheduler, the Hash scheduler has no special processing of non-idempotent requests.
Another way to distribute load among servers is to pin client sessions to specific servers. The method is also known as persistent sessions. A user session is processed by the same server from the very beginning until the session is closed. A backend application may rely on this fact, and no special synchronization between backend nodes is needed.
To distinguish client sessions, Tempesta FW sets session Sticky Cookies. The cookies can be used to schedule client requests among backend servers. Sticky Cookies must be configured properly to use this load balancing method. The Sticky Sessions scheduler cannot be used if the client doesn't support cookies.
Example:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        cookie;
        sticky_secret "f00)9eR59*_/22";
        sticky_sessions;
    }
    proxy_pass persistent;
}
Tempesta FW can also learn sessions created by backend servers. In this case it doesn't set any cookie for the client itself. Instead, the Set-Cookie header is learned from the backend server, and all requests carrying the same cookie will be delivered to that server.
Example:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        learn name=user_id;
        sticky_sessions;
    }
    proxy_pass persistent;
}
The allow_failover option allows Tempesta FW to re-pin a session to a new server if the currently pinned server goes offline. The event is logged. Moving a client session from one server to another actually breaks session persistence, so the backend application must support this behavior.
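A minimal sketch of the option, assuming allow_failover is passed as an argument to the sticky_sessions directive inside the sticky block (group and host names follow the examples above):
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        cookie;
        # Assumed placement of the option: if the pinned server goes offline,
        # the session is re-pinned to another server in the group and the
        # event is logged.
        sticky_sessions allow_failover;
    }
    proxy_pass persistent;
}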
Note that this method does not allow setting different backup server groups for the same primary group in Proxy pass.
The first request of a client session to the server group is forwarded to a server chosen by the group's load balancing scheduler (ratio or hash). All following requests of the session to the server group are forwarded to the same server.
Reaction to server unavailability. If a server goes down (for maintenance or due to networking errors), the client receives 502 responses. When the server is back online, it will continue serving this client.
Session persistence is the highest priority for this method. So if the whole primary server group is offline, new sessions are pinned to a server in the backup group, if one is configured. The backup server will continue serving the client even when the primary group is back online. That means that switching from the backup server group back to the primary group completes only after all current sessions pinned to the backup server group have expired.
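For instance, a sketch with a backup group (the group names are illustrative; backup= is the proxy_pass option also used in the A/B testing example below):
srv_group persistent { ... }
srv_group reserve    { ... }

vhost example.com {
    sticky {
        cookie;
        sticky_sessions;
    }
    # New sessions are pinned to "reserve" only while the whole "persistent"
    # group is offline; sessions pinned to "reserve" stay there until they
    # expire, even after "persistent" is back online.
    proxy_pass persistent backup=reserve;
}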
Scheduling non-idempotent requests. Unlike the ratio scheduler, the Sticky Sessions scheduler has no special processing of non-idempotent requests.
Compatibility with load balancing schedulers. Load distribution among servers is controlled by distributing the first request of each session. So the group's load balancing scheduler controls how sessions, not individual requests, are distributed among servers.
That also means that web applications with only one entry point shouldn't use the hash scheduler, otherwise only one server of the group would receive all the load.
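As an illustrative sketch (the addresses are made up), such an application is better served by leaving the group on the default ratio scheduler:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.2:8000;
    server 10.10.0.3:8000;
    # No 'sched' directive: the default ratio scheduler spreads the first
    # request of each session, and therefore the sessions themselves, across
    # all servers. 'sched hash;' would map the single entry URI, and so all
    # sessions, to one server.
}

vhost example.com {
    sticky {
        cookie;
        sticky_sessions;
    }
    proxy_pass persistent;
}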
There is no silver bullet: the choice of the right scheduler depends on the backend application. Each scheduler has its own restrictions and performance recommendations, as explained above. Here are a few examples that show the schedulers in action.
Note that the examples below may be extended by enabling other Tempesta FW options, such as cache. But this is not the subject of this guide, which covers only the different types of schedulers.
When there are a lot of backend servers and each provides a unique web service, HTTP tables are the only option. Put every server in its own group and add match rules to pick the right server for every request. Example:
srv_group foo { ... }
srv_group bar { ... }
srv_group buzz { ... }
...
vhost vh_foo {
    proxy_pass foo;
    ...
}
vhost vh_bar {
    proxy_pass bar;
    ...
}
vhost vh_buzz {
    proxy_pass buzz;
    ...
}
...
http_chain {
    host == "foo.com"  -> vh_foo;
    host == "bar.com"  -> vh_bar;
    host == "buzz.com" -> vh_buzz;
    ...
}
HTTP tables can also help to organize A/B testing. Split the backends into two groups depending on the service version and use match rules to choose the target group. Example:
srv_group beta { ... }
srv_group stable { ... }
...
vhost vh_beta {
    # Forward to "stable" if "beta" is not available:
    proxy_pass beta backup=stable;
    ...
}
vhost vh_stable {
    proxy_pass stable;
    ...
}
http_chain {
    host == "beta.*" -> vh_beta;
    -> vh_stable;
}
Suppose the service has a huge number of dynamic resources, and access to every resource is quite a slow operation requiring a lot of I/O or synchronization, e.g. online docs where multiple users can access and modify shared resources. The Hash scheduler is the best option in this case. A shared resource is pinned to a specific server, so all users working with the resource will be forwarded to the same backend server.
To get optimal load distribution across all backends, the total number of resources should be much bigger than the total number of connections between the backends and Tempesta FW.
Example:
srv_group lot_of_resources {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    ...
    sched hash;
}
Backends providing static and dynamic content at the same time normally process requests for static content much faster than requests for dynamic content. It is recommended to divide the backends into two groups: one to handle static content and another for dynamic content. It is also possible to create two different groups with the same servers to get independent scheduling contexts for static and dynamic content. With that division the ratio scheduler performs much better and servers are loaded more fairly. Example:
srv_group static {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}

# Use the same servers as in the "static" group:
srv_group dynamic {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}

# OR
# Use other servers that provide the same service (better):
srv_group dynamic {
    server 10.10.0.4;
    server 10.10.0.5;
    server 10.10.0.6;
    sched ratio dynamic;
}

vhost vh_static {
    proxy_pass static;
    ...
}
vhost vh_dynamic {
    proxy_pass dynamic;
    ...
}

http_chain {
    uri == "/static/*" -> vh_static;
    -> vh_dynamic;
}
The same configuration can be achieved with the help of the location directive inside a single vhost:
srv_group static {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}
srv_group dynamic {
    server 10.10.0.4;
    server 10.10.0.5;
    server 10.10.0.6;
    sched ratio dynamic;
}

vhost vh_base {
    proxy_pass dynamic;
    location prefix "/static/" {
        proxy_pass static;
        ...
    }
    ...
}

http_chain {
    -> vh_base;
}