Fast HTTP match #732

krizhanovsky · 2017-05-13T19:38:11Z

HTTP Load balancing

Separated from #76, in particular from #76 (comment). HTTP load balancing and filtering need a faster implementation of HTTP field matching. One option is a hash table that allows a quick jump by a rule key, where the key is calculated from the string and the ID of the HTTP field. And/or the BNDM with q-grams (BG) algorithm can be used to quickly process many strings with a common prefix.
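The hash table idea above can be sketched in user-space C. This is only an illustration under stated assumptions, not Tempesta FW code: `struct rule`, `rule_table`, `rule_key()`, `rule_add()` and `rule_find()` are hypothetical names, the hash is a simple FNV-1a-style mix, and the input string is assumed to be already normalized (e.g. lowercased for Host).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RULE_BUCKETS 1024u              /* power of two; arbitrary size for the sketch */

/* Hypothetical rule: "field `field_id` equals `str`" selects `group`. */
struct rule {
    int          field_id;              /* ID of the HTTP field, e.g. Host */
    const char  *str;                   /* exact string to match */
    const char  *group;                 /* name of the target server group */
    size_t       len;                   /* strlen(str), filled in by rule_add() */
    struct rule *next;                  /* collision chain */
};

struct rule_table {
    struct rule *bucket[RULE_BUCKETS];
};

/* The key is computed from the field ID and the string bytes. */
static uint32_t rule_key(int field_id, const char *s, size_t len)
{
    uint32_t h = 2166136261u;

    h = (h ^ (uint32_t)field_id) * 16777619u;
    for (size_t i = 0; i < len; i++)
        h = (h ^ (unsigned char)s[i]) * 16777619u;
    return h & (RULE_BUCKETS - 1);
}

/* Configuration time: link the rule into its bucket. */
static void rule_add(struct rule_table *t, struct rule *r)
{
    assert(t && r && r->str);
    r->len = strlen(r->str);
    uint32_t k = rule_key(r->field_id, r->str, r->len);
    r->next = t->bucket[k];
    t->bucket[k] = r;
}

/* Run time: one hash computation plus a short chain walk. */
static const struct rule *rule_find(const struct rule_table *t, int field_id,
                                    const char *s, size_t len)
{
    const struct rule *r = t->bucket[rule_key(field_id, s, len)];

    for (; r; r = r->next)
        if (r->field_id == field_id && r->len == len && !memcmp(r->str, s, len))
            return r;
    return NULL;
}
```

With such a table a lookup no longer depends on the total number of rules, only on the length of the string and of one collision chain. It covers exact (`eq`) matches only; prefix or suffix rules would still need something like BG.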

Issue #76 deals with a massive number of backend servers:

srv_group group_0 { server 127.0.0.1:9090 conns_n=1; }
srv_group group_1 { server 127.0.0.1:9090 conns_n=1; }
srv_group group_2 { server 127.0.0.1:9090 conns_n=1; }
....
srv_group group_999 { server 127.0.0.1:9090 conns_n=1; }

sched_http_rules {
match group_0  hdr_host eq "group-0.com";
match group_1  hdr_host eq "group-1.com";
match group_2  hdr_host eq "group-2.com";
....
match group_999  hdr_host eq "group-999.com";
}

Currently all 1000 or more match rules are checked sequentially. The example is quite realistic for massive hosting installations. The BG algorithm implemented in #901 must be applied to the matching. The matching syntax should probably be adjusted as follows (with #731 in mind):

host == {
    "group-0.com" -> group_0;
    "group-1.com" -> group_1;
    "group-2.com" -> group_2;
}

HTTPtables

Strings matching

The use case from #731 must also be processed in a more efficient way, e.g. using a hash table or a tree:

http_chain {
        mark == {
                2 -> backend_0;
                3 -> backend_1;
                4 -> backend_2;
                5 -> backend_3;
                ....
        }
}
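For integer keys such as `mark`, one simple alternative to a sequential scan is a sorted array with binary search. The sketch below is a user-space illustration only; `struct mark_rule`, `mark_rules_prepare()` and `mark_lookup()` are hypothetical names, and a hash table or a tree, as mentioned above, would be equally valid choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical rule: packet mark -> backend name. */
struct mark_rule {
    unsigned int mark;
    const char  *backend;
};

static int mark_cmp(const void *a, const void *b)
{
    unsigned int x = ((const struct mark_rule *)a)->mark;
    unsigned int y = ((const struct mark_rule *)b)->mark;

    return (x > y) - (x < y);
}

/* Configuration time: sort the rules once by mark. */
static void mark_rules_prepare(struct mark_rule *rules, size_t n)
{
    assert(rules != NULL);
    if (n > 1)
        qsort(rules, n, sizeof(*rules), mark_cmp);
}

/* Run time: O(log n) lookup; returns NULL if no rule has this mark. */
static const char *mark_lookup(const struct mark_rule *rules, size_t n,
                               unsigned int mark)
{
    struct mark_rule key = { mark, NULL };
    const struct mark_rule *r;

    r = bsearch(&key, rules, n, sizeof(*rules), mark_cmp);
    return r ? r->backend : NULL;
}
```

The sorted array keeps all rules in one contiguous block, which also helps the locality concern discussed in the next section.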

Memory spatial locality

At the moment kzalloc() is used a lot during the configuration phase, so spatial locality at run time can be improved by using more local data structures.
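One way to get "more local" data structures is to carve all rule data out of a single contiguous block instead of allocating each object separately. The following is a minimal user-space sketch of such a bump allocator, with `calloc()` standing in for `kzalloc()`; `struct cfg_arena` and the `arena_*()` functions are hypothetical names, not existing Tempesta FW APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical arena: one contiguous, zeroed block for all rule data. */
struct cfg_arena {
    unsigned char *base;
    size_t         size;
    size_t         used;
};

static int arena_init(struct cfg_arena *a, size_t size)
{
    assert(a && size > 0);
    a->base = calloc(1, size);          /* zeroed memory, like kzalloc() */
    a->size = size;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out the next aligned chunk; NULL when the arena is exhausted. */
static void *arena_alloc(struct cfg_arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Copy a string into the arena so it sits next to the rules using it. */
static char *arena_strdup(struct cfg_arena *a, const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = arena_alloc(a, len);

    if (p)
        memcpy(p, s, len);
    return p;
}

/* Everything is released at once, e.g. on reconfiguration. */
static void arena_free(struct cfg_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Objects allocated back to back end up adjacent in memory, so a scan over consecutive rules touches fewer cache lines than a walk over individually allocated objects.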

The chains

Currently HTTPtables sequentially scans all the rules in a chain, which isn't efficient. The first option is to run only one match per header using multi-pattern matching. There are probably other optimization opportunities as well.

We need some use cases with large chains to understand the typical workload, i.e. whether there are cases with many patterns for the same header or whether the matchers mostly target different headers.

Generic strings matching

Actually, Tempesta FW is full of multiple-string matching. E.g. the caching policy for content type suffixes is evaluated with a for loop in tfw_capolicy_match(), while a powerful web resource can have a lot of various suffixes: aif, aiff, au, avi, bin, bmp, cab, carb, cct, cdf, class, css, doc, dcr, dtd, gcf, gff, gif, grv, hdml, hqx, ico, ini, jpeg, jpg, js, mov, mp3, nc, pct, ppc, pws, swa, swf, txt, vbs, w32, wav, wbmp, wml, wmlc, wmls, wmlsc, xsd, zip.
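As an illustration of what could replace a linear loop over suffixes, the sketch below sorts the suffix list once and then uses binary search per request. It is a user-space example with hypothetical names (`sfx_prepare()`, `sfx_match()`); it does not reflect how tfw_capolicy_match() is written, and it assumes the suffix is simply the text after the last dot of the path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two elements of an array of C strings. */
static int sfx_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Configuration time: sort the suffix table once. */
static void sfx_prepare(const char **sfx, size_t n)
{
    assert(sfx != NULL);
    if (n > 1)
        qsort(sfx, n, sizeof(*sfx), sfx_cmp);
}

/*
 * Run time: return 1 if the text after the last '.' in `path`
 * is present in the sorted table, 0 otherwise. O(log n) comparisons.
 */
static int sfx_match(const char *const *sfx, size_t n, const char *path)
{
    const char *dot = strrchr(path, '.');
    const char *key;

    if (!dot || !dot[1])
        return 0;
    key = dot + 1;
    return bsearch(&key, sfx, n, sizeof(*sfx), sfx_cmp) != NULL;
}
```

For the roughly 45 suffixes listed above this means about 6 string comparisons per lookup instead of up to 45. A hash set or the multi-pattern matcher discussed earlier would be alternatives.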

Testing

Functional tests

TBD

Performance

We need a solid estimate of the number of rules and/or chains at which performance degrades significantly.

krizhanovsky added the performance label May 13, 2017

krizhanovsky added this to the 1.0 WebOS milestone May 13, 2017

This was referenced May 13, 2017

Requests scheduling to massive farm of backend servers #76

Closed

HTTP tables #731

Closed

krizhanovsky modified the milestones: backlog, 1.0 Web Operating System Jan 15, 2018

krizhanovsky modified the milestones: 0.5 alpha, 1.0 Tempesta OS, 0.10 Kernel-User Space Transport Feb 5, 2018

krizhanovsky mentioned this issue Feb 9, 2018

Strong Content-Type validation #901

Open

krizhanovsky modified the milestones: 0.10 Kernel-User Space Transport , 0.9 Web server Feb 10, 2018

krizhanovsky mentioned this issue Feb 10, 2018

Variables and conditions for custom HTTP headers #907

Open

krizhanovsky added the crucial label Feb 10, 2018

krizhanovsky self-assigned this Mar 31, 2018

krizhanovsky mentioned this issue Oct 9, 2018

Hardening the sticky cookie module #1075

Open

krizhanovsky mentioned this issue Nov 22, 2018

Multi-pattern regular expressions #496

Open

krizhanovsky added a commit that referenced this issue Nov 24, 2018

Add TODO comment for #732 for caching policy matching

146eecf

krizhanovsky mentioned this issue Sep 29, 2019

HTTP/2 HPACK layer implementation (#309). #1338

Merged

krizhanovsky mentioned this issue Dec 27, 2021

Caching by cookie #1544

Closed

krizhanovsky modified the milestones: 1.3 TBD( Web server & advanced strings), 1.2 TBD Jan 3, 2022

krizhanovsky mentioned this issue Feb 17, 2022

Add option for set cache policy based on cookie name or pattern #1564

Merged

krizhanovsky modified the milestones: 1.xx TBD, 1.x: TBD Apr 19, 2023

krizhanovsky removed their assignment Nov 12, 2023

krizhanovsky modified the milestones: 1.1: TBD, 1.2 - TBD Nov 12, 2023
