Optimizer for HTTP messages adjustment #1103
At the moment skb fragmentation looks quite expensive with all the free-space lookups, new skb allocations and fragment readjusting, so ...
In most cases the headers are small, and doing the complex logic, especially with the proposed optimizer, is overkill and will slow the logic down even further. Instead we should just:
As an optimization we can still use the zero-copy technique if the headers don't fit a page and there is a header larger than a page.
It seems we cannot fully avoid the "optimizer" and just use "the next power of 2 for the size from (1)" as in the previous comment: we can delete headers, add headers per vhost and even use variables (e.g. for the URI), and we plan to extend the header-operation logic in HTTPtables; all in all, there are a lot of variables. On the other hand, if we have, say, a message with 510 bytes of headers, then apparently we fail with the next power of 2. Another solution, at least for HTTP/2, could be to:
BTW, can we just use the TfwPool associated with the modified HTTP message and just ...
Just collected PFL statistics on our website installation (HTTPS):
It turns out that we should remove the optimization since it doesn't work in most cases.
I suggest using the following strategy for adding additional headers to HTTP/2 messages:
Tempesta sets the `mit->bnd` value in the `tfw_h2_resp_next_hdr` function, but in the case when there is no next header, `mit->bnd` stays equal to zero. It is necessary to check this case and set `mit->bnd` manually before calling `tfw_h2_msg_rewrite_data`. This is a temporary fix until #1103 is implemented. (Necessary for adding the HPACK dynamic table size before the first header.)
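A minimal user-space sketch of this workaround. The `MsgIter` type and `fixup_bnd()` helper are simplified stand-ins invented for illustration; the real Tempesta iterator (`TfwHttpTransIter`) carries much more state:

```c
#include <stddef.h>

/* Simplified model of the message-transformation iterator: only the
 * field needed to illustrate the workaround is shown. */
typedef struct {
	const char *bnd;   /* boundary up to which rewriting may proceed */
} MsgIter;

/*
 * tfw_h2_resp_next_hdr() normally sets the boundary to the start of
 * the next header. When there is no next header it leaves it zeroed,
 * so the caller must set it manually, e.g. to the end of the headers
 * block, before invoking the rewrite routine.
 */
static void
fixup_bnd(MsgIter *mit, const char *hdrs_end)
{
	if (!mit->bnd)
		mit->bnd = hdrs_end;
}
```

The check is intentionally idempotent: a boundary that was already set by the iterator is left untouched.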
Minutes for the #1820 benchmark:
#1820 introduced a new approach for HTTP/1.1 <-> HTTP/2 transformations:
The current approach solves the skb fragmentation problem, because headers are now placed into contiguous memory blocks, so we avoid fragments of small size. It also solves the problem of a response too small to be encoded in place, because we don't encode headers in place anymore. HTTP/1.1 message adjustment is in the same state and requires some tests to determine whether optimizations are needed.
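The idea of the contiguous-block approach can be sketched as follows. This is a hypothetical user-space model, not Tempesta code: all adjusted headers are serialized into one buffer, which would then be attached to the skb as a single fragment instead of fragmenting the skb around every individual change:

```c
#include <stddef.h>
#include <string.h>

/*
 * Serialize a set of (already adjusted) header lines into one
 * contiguous block. Returns the number of bytes written, or 0 if the
 * block is too small and the caller must allocate a larger one.
 */
static size_t
build_headers(char *dst, size_t cap, const char *const *hdrs, size_t n)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(hdrs[i]);

		if (off + len + 2 > cap)
			return 0;	/* not enough room in the block */
		memcpy(dst + off, hdrs[i], len);
		memcpy(dst + off + len, "\r\n", 2);
		off += len + 2;
	}
	return off;
}
```

Since the block is a single allocation, the resulting message carries one extra fragment at most, regardless of how many headers were added, deleted or rewritten.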
One more subject of this issue: while adding headers to an HTTP/1 response/request, the SKB may become heavily fragmented, which leads to moving fragments from the current SKB to the next one; however, if the data contained in these fragments is used by ...
AUTO_SEGS_N was not changed after the #1103 fix, because most responses are large, and with this value (8) we can cover a higher percentage of responses.
Tightly linked with #634 which is required for #515.
Current logic
Currently we adjust HTTP messages like:
This is a bunch of adjustment calls, and each call (1) does relatively expensive skb fragmentation and (2) generates an independent set of new skb frags, so at the end of the calls there are many skb fragments.
For example, consider that we have the following HTTP request:
And we need to:
- add the client IP `2.2.2.2` to `X-Forwarded-For`;
- change `Connection` to `close`;
- add `Via: 1.1 foo.com (Tempesta FW)`;
- remove the `Keep-Alive` header.
Currently we do the following fragmentation steps:
1. we add `2.2.2.2` and split the request into 2 frags - we have 3 frags now;
2. `keep-alive` is shorter than `close`, so the last current fragment is split into two around the gap - 4 frags now;
3. `Via` is inserted before the trailing CRLF - one more split and one more new frag - we come to 6 frags;
4. removing the `Keep-Alive` header produces a gap, so we finish with 7 frags for a very small HTTP request.

We do not move the trailing CRLF and do the fragmentation stuff to save the data pointers in the HTTP message representation, `TfwHttpMsg`.

Preliminary logic
Actually, we have some rudiments of the logic: in particular, we treat all user-specified header modifications in `TfwHdrMods` and process them at once in functions like `tfw_h2_resp_adjust_fwd()` and `tfw_h2_resp_next_hdr()`.

Proposed design
The target of the function is to make only 1 memory (skb fragment) allocation in the worst case and to use the current skb overhead in most cases (see the Testing section).
The adjustment logic must be split into 3 phases:
1. identify and collect metadata about all the adjustments - how much data we have to remove, add or modify, and at which places;
2. run an optimizer - a separate and self-sufficient function which, from the metadata above, generates a vector of data points and lengths to add or remove, with pointers to a chain of the adjustments. Some of the logic is already done in `tfw_http_msg_hdr_xfrm_str()`, so the work should affect that function;
3. the vector is used for final data placement in the underlying HTTP message.
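The three phases can be sketched as follows. This is a hedged user-space model, not Tempesta code: the `Adj` vector, `collect()` and `optimize()` names are invented for illustration, and the `ZCP_MIN_DSZ` value is a placeholder to be replaced by the benchmarked one:

```c
#include <stddef.h>

#define ZCP_MIN_DSZ	(4 * 64)	/* e.g. 4 cache lines; benchmark it */

typedef enum { ADJ_REMOVE, ADJ_ADD, ADJ_MODIFY } AdjOp;

/* Phase 1 output: metadata about one adjustment. */
typedef struct {
	AdjOp	op;
	size_t	off;	/* offset of the change in the headers block */
	size_t	len;	/* bytes to remove or insert */
} Adj;

typedef enum { PLACE_REWRITE, PLACE_NEW_FRAG } Placement;

/* Phase 1: aggregate the total gap size and total size of new data. */
static void
collect(const Adj *v, size_t n, size_t *gap_sz, size_t *add_sz)
{
	*gap_sz = *add_sz = 0;
	for (size_t i = 0; i < n; i++) {
		if (v[i].op == ADJ_REMOVE)
			*gap_sz += v[i].len;
		else
			*add_sz += v[i].len;	/* ADD and MODIFY insert data */
	}
}

/*
 * Phase 2: decide between an in-place memmove() rewrite and one new
 * fragment allocation, per the condition discussed in this issue:
 * rewrite iff the data fits the skb room and is below ZCP_MIN_DSZ.
 */
static Placement
optimize(size_t gap_sz, size_t add_sz, size_t skb_room)
{
	if (gap_sz + add_sz <= skb_room && gap_sz + add_sz < ZCP_MIN_DSZ)
		return PLACE_REWRITE;	/* memmove() the tail, no new frag */
	return PLACE_NEW_FRAG;		/* 1 allocation in the worst case */
}
```

Phase 3 (applying the vector to the actual skb data) is omitted here, since it is exactly the part whose design this issue discusses.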
This is a generalized and more powerful way of doing what we currently do in `tfw_http_msg_del_hbh_hdrs()`, which leaves the `Connection` header for further rewrite.

The optimizer must coalesce all the data gaps `gap_sz`, ignoring small (less than `ZCP_MIN_DSZ`) data between them. `ZCP_MIN_DSZ` is a define; I'd set it to something like 4 or 8 cache lines (please do a benchmark and find the best number). The optimizer must also consider the available room in the skb linear data, `skb_room`, and the size of data to insert, `add_sz`. If `gap_sz + add_sz <= skb_room && gap_sz + add_sz < ZCP_MIN_DSZ`, then just completely rewrite that part of the message: use `memmove()` and place the new data at the end of the message instead of copying it into an auxiliary buffer. In this example it makes sense to just rewrite the whole data after `...1.1.1.1` - use `memmove()` to replace the current headers if we have enough room, or allocate 1 fragment and copy `Cookie` with its CRLFs to the new fragment.

Obviously, if we have a long `Cookie` in the example above, then we can still insert all the new headers before the `Cookie` and split the message, so we'll have 2 frags here.

The good news is that we call `tfw_http_adjust_{req,resp}()` just before the message is forwarded, so we don't have to fix `TfwHttpMsg`'s headers table.

HTTP/2
The issue becomes more crucial with HTTP/2: since HTTP/2 uses encoded HTTP headers, an HTTP message is smaller and will have more quite small fragments, introducing larger overhead than we have for HTTP/1.1.
Moreover, HTTP/1.1 <-> HTTP/2 transformations essentially require rewriting each HTTP field, so for the message transformation it basically makes sense to just fully rebuild all the headers. In this case we need the size of the required allocation from the optimizer function.
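The full-rebuild path can be sketched in two steps: ask for the size of the single allocation, then serialize every header into it. This is an illustrative model with invented names (`Hdr`, `rebuild_size`, `rebuild`); a real implementation would return the HPACK-encoded size rather than the plain-text one modeled here:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *name, *val; } Hdr;

/*
 * Step 1: total size of the single allocation. Modeled as
 * "name: value\r\n" per header; a real encoder would report the
 * HPACK-encoded size instead.
 */
static size_t
rebuild_size(const Hdr *h, size_t n)
{
	size_t sz = 0;
	for (size_t i = 0; i < n; i++)
		sz += strlen(h[i].name) + 2 + strlen(h[i].val) + 2;
	return sz;
}

/* Step 2: serialize all headers into the one allocated block. */
static size_t
rebuild(char *dst, const Hdr *h, size_t n)
{
	size_t off = 0;
	for (size_t i = 0; i < n; i++)
		off += sprintf(dst + off, "%s: %s\r\n", h[i].name, h[i].val);
	return off;
}
```

The point of the two-step shape is that the message transformation never fragments the skb: however many fields are rewritten, the output is one contiguous block.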
The only exception, where we may still want to use an skb fragment, is too-large header values, e.g. a big Cookie value or URI. In this case we also make only one memory allocation and place all the data before the big header(s) and after the header(s) in separate fragments.
Probably, the skb fragmentation code (which is quite overcomplicated) isn't needed any more and can just be removed. This needs review.
Also see #1368 (comment) for `tfw_h2_resp_next_hdr()` and `TfwHdrMods`.

Actually, it's fine to implement the task for HTTP/2 only, as the main and performance-focused functionality.
Testing
The logic is in the core and involved in almost any proxy test case, so there is no need for specific tests. However, performance evaluation of the current and optimized code is required, as well as performance measurements for the defined parameters.
Choosing the right value for `ZCP_MIN_DSZ` is crucial for the task. It could happen that it makes sense to copy up to 1KB of data or more, so please prove the concept of copy vs zero-copy with good benchmarks on a hardware installation and in a VM.

The next argument to be evaluated is the current skb allocation overhead. It may happen that with an enlarged default overhead we can avoid additional skb fragment allocations.
Also please evaluate the opportunity to reserve more room in the skb head, so that we can place more data, for example before the Cookie header, without fragmentation, just by moving the network headers and the request line.