Scheduling and Load Balancing
Tempesta FW can work with thousands of backend servers. Some of them are interchangeable, so proper load balancing is needed. Others provide completely different services, so it is important to choose the right server for a given client request. Tempesta FW supports a set of schedulers to meet all these requirements.
In the Tempesta FW configuration file all backend servers must be grouped by the single-service principle: each server group must contain solely interchangeable servers. See the chapter Backend servers for more information on syntax. Server groups in turn are linked to particular virtual hosts and locations (see Virtual hosts and locations and Proxy pass).
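As a minimal sketch of this grouping principle (the group names and addresses below are illustrative only), each group contains interchangeable replicas of a single service and is referenced from its own virtual host; the HTTP tables rules that route requests to these virtual hosts are shown in the examples at the end of this page:
# Interchangeable servers running the same web application:
srv_group app {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# A completely different service gets its own group:
srv_group storage {
    server 10.0.1.1:9000;
}

vhost vh_app {
    proxy_pass app;
}
vhost vh_storage {
    proxy_pass storage;
}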
Fig. 1: General Scheduling routine
                   HTTP Request
                         ↓
          ┌──────────────┴──────────────┐
┌─────────┴─────────┐         ┌─────────┴─────────┐
│    HTTP tables    │         │  Sticky Session   │
│                   │         │     Scheduler     │
└─────────┬─────────┘         └─────────┬─────────┘
          │                             │
┌─────────┴─────────┐                   │
│  Load Balancing   │                   │
│    Schedulers:    │                   │
│    Ratio, Hash    │                   │
└─────────┬─────────┘                   │
          └──────────────┬──────────────┘
                         ↓
             Backend Server Connection
To forward an HTTP request, Tempesta FW chooses an active backend server connection. First, the request passes through HTTP tables, which analyse the request to find the target virtual host and location; the request is then passed to the corresponding server group. Next, the load balancing scheduler attached to the chosen server group picks the most suitable connection to a backend server. The Ratio and Hash schedulers are available for that role.
If a target backend server connection is found, it is used to forward the request to the backend server. If no target server group can be used, or no live connection is available in the target server group, a 502 error response is sent to the HTTP client.
The Sticky Sessions scheduler pins HTTP sessions to specific servers and bypasses the generic scheduling routine.
A load balancing scheduler controls load distribution among the servers within a group. The sched directive is applied in the server group context.
Default: ratio scheduler with default options.
sched <SCHED_NAME> [OPTIONS];
SCHED_NAME
: The name of a scheduler available in Tempesta FW.
OPTIONS
: Optional parameters. Not all schedulers have additional options.
Only one sched directive is allowed per explicit or implicit group.
Example:
# Named server group:
srv_group static_storage {
    server 10.10.0.1:8080;
    sched ratio dynamic;
}

# Implicit "default" group:
server 192.168.1.1;
sched ratio dynamic;
The Ratio scheduler balances the load across servers in a group based on each server's weight. Requests are forwarded more often to servers with a higher weight and less often to servers with a lower weight. All connections of the same server are loaded equally. Weights can be defined statically by the user or computed dynamically from server performance. As a result, each server in a group receives an optimal load.
The ratio scheduler may take the following options:
- static: The weight of each server in a group is defined statically with the
  [weight=<NN>] option of the server directive. This is the default ratio
  scheduler option.
- dynamic: The weight of each server in a group is defined dynamically. The
  specific type of dynamic weight is specified with additional options:
  - minimum: The current minimum response time from a server;
  - maximum: The current maximum response time from a server;
  - average: The current average response time from a server;
  - percentile [<NN>]: The current response time from a server that is within
    the specified percentile. The percentile may be one of 50, 75, 90, 95, 99.
    If none is given, the default percentile of 90 is used.

  If a specific type of dynamic weight is not specified, the default type of
  average is used.
- predict: The weight of each server in a group is predicted dynamically
  for a time in the future, based on the server's behavior in the past.
  Additional options include those defined for dynamic weight, as well as
  the following:
  - past: Period of time (in seconds) to keep past response time values from
    a server. The default value is 30 seconds.
  - rate: Rate (times per second) of retrieval of past response time values.
    The default value is 20 times per second.
  - ahead: Period of time (in seconds) for which to make a prediction; it
    can't be more than half of past. The default value is 15 seconds.
Note that if the dynamic mode is specified for a group and there's a server in that group with the weight option, an error is produced, as that combination is incompatible. The same is true for the predict mode.
The following are examples of scheduler specification in the configuration. Again, only one sched directive is allowed per group.
# Use ratio scheduler. By default, static weight distribution is used.
srv_group static {
    server 192.168.1.1;           # Server with default weight (50)
    server 192.168.1.2 weight=75; # Server with explicit weight 75
    sched ratio;                  # Identical to 'sched ratio static;'
}

# Use dynamic weights. By default, the current average response time is used
# for weight distribution.
sched ratio dynamic;

# Use dynamic weights with maximum response time for weight distribution.
sched ratio dynamic maximum;

# Use dynamic weights, the default percentile of 90 is used.
sched ratio dynamic percentile;

# Use dynamic weights, percentile of 75 is used for weight distribution.
sched ratio dynamic percentile 75;

# Use predictive weights, percentile of 75 is used for weight distribution.
# Response time values of each server are collected for the past 60 seconds
# at a rate of 20 times per second, and the weight of each server is predicted
# for 2 seconds ahead.
sched ratio predict percentile 75 past=60 rate=20 ahead=2;
All server connections are placed in a round-robin buffer according to the servers' weights. A new request is scheduled to the next connection from the buffer. A server connection can be live or dead, so a few attempts are made to pick a live connection from the round-robin buffer. Even a large number of attempts wouldn't guarantee that the picked connection is alive, so a strict limit on attempts is used to keep processing of each request fast and to improve total throughput.
Server groups should be created with proper care: a group should contain servers that handle similar resources. For instance, if servers with static content that is served quickly are grouped together with servers with dynamic content that is I/O bound, then the short response times from the static-content servers will be nearly invisible in comparison to the longer response times from the dynamic-content servers. It then becomes impossible to tell which servers and connections perform better, and the load distribution with ratio dynamic or ratio predict will be severely skewed.
Reaction to server unavailability. When the static ratio scheduler is used and a lot of server connections are in a failovering or dead state, the probability of picking a dead connection from the round-robin buffer is higher. Since only a few scheduling attempts are made, the whole scheduling routine may fail. That means the static ratio scheduler should be used only with reliable backend servers that keep connections alive as long as possible.
The dynamic and predict modes are more fault tolerant. Servers with better performance get higher weights and are more likely to be chosen during the scheduling routine. So even if the majority of servers in a group are unavailable, live servers take precedence over unavailable ones.
Scheduling non-idempotent requests. The presence of a non-idempotent request in a connection means that subsequent requests may not be sent out until a response to the non-idempotent request is received. With that in mind, an attempt is made to put new requests into connections that don't currently hold non-idempotent requests. If all connections have a non-idempotent request in them, then such a connection is used anyway, as there's no other choice.
The Hash scheduler distributes load in a completely different way. Requests for the same resources are pinned to the same servers in a group following the rendezvous hashing (Highest Random Weight) principle. The pinning is persistent: failure of one server doesn't cause re-pinning of all resources; only the resources of the failed server are temporarily re-pinned to other servers. When the server comes back online, the temporary pinning is removed and the server continues serving the same resources.
With this scheduling method each backend server handles a relatively small set of resources instead of the full web service. This is handy if simultaneous access by different backend servers to the same resources requires synchronization, or if pinning resources to servers saves unnecessary I/O operations. Note that the scheduler doesn't track the number of requests forwarded to each server.
The Hash scheduler doesn't have any additional options:
sched hash;
Example:
srv_group static_storage {
    server 10.10.0.1:8080;
    sched hash;
}
A unique hash value is calculated and stored for each server connection. When a new request is passed to the scheduler, the URI and the Host header are used to calculate a hash key for the request. The connection with the maximum value of (connection hash) XOR (request hash) is chosen as the most appropriate connection for the request. The Hash scheduler provides optimal load for all servers in a group if the number of URIs in the web service is many times bigger than the total number of connections across all servers in the group.
The drawbacks of HRW hashing are:
- The hashing may bring unfairness to the load balancing. There may be a case when one server pulls all the load away from the other servers. Although it is quite improbable, such a condition is quite stable: it can be fixed only by restarting Tempesta FW.
- If requests for some resources of the web service are much more frequent than others, then the corresponding servers may be overloaded.
- The scheduler is slower on large groups than the ratio scheduler.
Reaction to server unavailability. If a server connection goes down, it is skipped during scheduling. When it is back online, it continues to serve the same resources, so other connections and servers are not affected.
Scheduling non-idempotent requests. Unlike the ratio scheduler, the Hash scheduler has no special processing of non-idempotent requests.
Another way to distribute load among servers is to pin client sessions to specific servers. The method is also known as persistent sessions. A user session is processed by the same server from the very beginning until the session is closed. A backend application may rely on this fact, and no special synchronization between backend nodes is needed.
To distinguish client sessions, Tempesta FW sets session Sticky Cookies. The cookies can be used to schedule client requests among backend servers. Sticky Cookies must be configured properly to use this load balancing method. The Sticky Sessions scheduler cannot be used if the client doesn't support cookies.
Example:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        cookie;
        sticky_secret "f00)9eR59*_/22";
        sticky_sessions;
    }
    proxy_pass persistent;
}
Tempesta FW can also learn sessions created by backend servers. In this case it doesn't set any cookie for the client itself. Instead, the Set-Cookie header is learned from the backend server, and all requests carrying the same cookie will be delivered to that server.
Example:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        learn name=user_id;
        sticky_sessions;
    }
    proxy_pass persistent;
}
The allow_failover option allows Tempesta FW to re-pin a session to a new server if the currently pinned server goes offline. The event is logged. Moving a client session from one server to another actually breaks session persistence, so the backend application must support this behavior.
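A minimal sketch of the option, assuming allow_failover is passed as an argument to the sticky_sessions directive inside the sticky block (group and host names follow the examples above):
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.1:8080;
    server 10.10.0.1:8081;
}

vhost example.com {
    sticky {
        cookie;
        # Assumed placement of the option: if the pinned server goes offline,
        # the session is re-pinned to another server in the group and the
        # event is logged.
        sticky_sessions allow_failover;
    }
    proxy_pass persistent;
}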
Note that this method does not allow setting different backup server groups for the same primary group in Proxy pass.
The first request of a client session to the server group is forwarded to a server chosen by the group's load balancing scheduler (ratio or hash). All following requests of the session to the server group are forwarded to the same server.
Reaction to server unavailability. If a server goes down (for maintenance or due to networking errors), the client receives 502 responses. When the server is back online, it will continue serving this client.
Session persistence is the highest priority for this method. So if the whole primary server group is offline, new sessions are pinned to a server in the backup group, if one is configured. The backup server will continue serving the client even when the primary group is back online. That means that switching from the backup server group back to the primary group completes only after all current sessions pinned to the backup server group have expired.
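For instance, a sketch with a backup group (the group names are illustrative; backup= is the proxy_pass option also used in the A/B testing example below):
srv_group persistent { ... }
srv_group reserve    { ... }

vhost example.com {
    sticky {
        cookie;
        sticky_sessions;
    }
    # New sessions are pinned to "reserve" only while the whole "persistent"
    # group is offline; sessions pinned to "reserve" stay there until they
    # expire, even after "persistent" is back online.
    proxy_pass persistent backup=reserve;
}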
Scheduling non-idempotent requests. Unlike the ratio scheduler, the Sticky Sessions scheduler has no special processing of non-idempotent requests.
Compatibility with load balancing schedulers. Load distribution among servers is controlled by distributing the first request of each session. So the group's load balancing scheduler controls how sessions, not individual requests, are distributed among servers.
That also means that web applications with only one entry point shouldn't use the hash scheduler, otherwise only one server of the group would receive all the load.
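As an illustrative sketch (the addresses are made up), such an application is better served by leaving the group on the default ratio scheduler:
srv_group persistent {
    server 10.10.0.1:8000;
    server 10.10.0.2:8000;
    server 10.10.0.3:8000;
    # No 'sched' directive: the default ratio scheduler spreads the first
    # request of each session, and therefore the sessions themselves, across
    # all servers. 'sched hash;' would map the single entry URI, and so all
    # sessions, to one server.
}

vhost example.com {
    sticky {
        cookie;
        sticky_sessions;
    }
    proxy_pass persistent;
}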
There is no silver bullet: the choice of the right scheduler depends on the backend application. Each scheduler has its own restrictions and performance recommendations, as explained above. Here are a few examples that show the schedulers in action.
Note that the examples below may be extended by enabling other Tempesta FW options, such as cache. But this is not the subject of this guide, which covers only the different types of schedulers.
When there are a lot of backend servers and each provides a unique web service, HTTP tables are the only option. Put every server in its own group and add match rules to pick the right server for every request. Example:
srv_group foo { ... }
srv_group bar { ... }
srv_group buzz { ... }
...
vhost vh_foo {
    proxy_pass foo;
    ...
}
vhost vh_bar {
    proxy_pass bar;
    ...
}
vhost vh_buzz {
    proxy_pass buzz;
    ...
}
...
http_chain {
    host == "foo.com"  -> vh_foo;
    host == "bar.com"  -> vh_bar;
    host == "buzz.com" -> vh_buzz;
    ...
}
HTTP tables can also help to organize A/B testing. Split the backends into two groups depending on the service version and use match rules to choose the target group. Example:
srv_group beta { ... }
srv_group stable { ... }
...
vhost vh_beta {
    # Forward to "stable" if "beta" is not available:
    proxy_pass beta backup=stable;
    ...
}
vhost vh_stable {
    proxy_pass stable;
    ...
}
http_chain {
    host == "beta.*" -> vh_beta;
    -> vh_stable;
}
Suppose the service has a huge number of dynamic resources, and access to every resource is quite a slow operation requiring a lot of I/O or synchronization, e.g. online docs where multiple users can access and modify shared resources. The Hash scheduler is the best option in this case. A shared resource is pinned to a specific server, so all users working with the resource will be forwarded to the same backend server.
To get optimal load distribution across all backends, the total number of resources should be much bigger than the total number of connections between the backends and Tempesta FW.
Example:
srv_group lot_of_resources {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    ...
    sched hash;
}
Backends providing static and dynamic content at the same time normally process requests for static content much faster than requests for dynamic content. It is recommended to divide the backends into two groups: one to handle static content and another for dynamic content. It is also possible to create two different groups with the same servers to get independent scheduling contexts for static and dynamic content. With that division the ratio scheduler performs much better and servers are loaded more fairly. Example:
srv_group static {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}

# Use the same servers as in the "static" group:
srv_group dynamic {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}

# OR
# Use other servers that provide the same service (better):
srv_group dynamic {
    server 10.10.0.4;
    server 10.10.0.5;
    server 10.10.0.6;
    sched ratio dynamic;
}

vhost vh_static {
    proxy_pass static;
    ...
}
vhost vh_dynamic {
    proxy_pass dynamic;
    ...
}

http_chain {
    uri == "/static/*" -> vh_static;
    -> vh_dynamic;
}
The same configuration can be achieved with the help of the location directive inside a single vhost:
srv_group static {
    server 10.10.0.1;
    server 10.10.0.2;
    server 10.10.0.3;
    sched ratio dynamic;
}
srv_group dynamic {
    server 10.10.0.4;
    server 10.10.0.5;
    server 10.10.0.6;
    sched ratio dynamic;
}

vhost vh_base {
    proxy_pass dynamic;
    location prefix "/static/" {
        proxy_pass static;
        ...
    }
    ...
}

http_chain {
    -> vh_base;
}