
Content compression & decompression #636

Open
krizhanovsky opened this issue Nov 7, 2016 · 0 comments

Comments

@krizhanovsky
Contributor

krizhanovsky commented Nov 7, 2016

Depends on #77 (Kernel-User Space Transport).

Content compression and decompression must be implemented. The logic is controlled by the options below. Using SIMD instruction sets where applicable is highly desirable; see the performance benchmarks of zlib vectorization optimizations.

Since there are many available HTTP compression algorithms, the algorithms must be pluggable. Most probably we should simply offload the compression tasks to third-party libraries.
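As a rough illustration of such pluggability (the names and structure are hypothetical, not Tempesta FW's actual interfaces), a registry could map each content-coding token to a compress/decompress pair supplied by a third-party library:

```python
import zlib

# Hypothetical codec registry: each HTTP content-coding token maps to a
# (compress, decompress) pair supplied by a third-party library.
CODECS = {}

def register_codec(name, compress, decompress):
    """Register a pluggable compression backend for a content coding."""
    CODECS[name] = (compress, decompress)

def _gzip_compress(data, level=6):
    # wbits=31 selects the gzip wrapper around raw deflate.
    co = zlib.compressobj(level, zlib.DEFLATED, 31)
    return co.compress(data) + co.flush()

def _gzip_decompress(data):
    # wbits=47 auto-detects the zlib or gzip wrapper.
    return zlib.decompress(data, 47)

register_codec("gzip", _gzip_compress, _gzip_decompress)
register_codec("deflate", lambda data, level=6: zlib.compress(data, level),
               zlib.decompress)
```

A Brotli backend would register its own pair the same way, keeping the core logic agnostic to the concrete algorithm.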

Consider at least Brotli and Zopfli as more efficient, but slower, algorithms. The gzip option should control the gzip module, brotli the Brotli compression algorithm, and so on:

    gzip [0-9]

The option specifies the compression level of transferred responses. The default value 0 means no compression at all; other values define the compression level. If compression is enabled, then responses must be stored in the cache in compressed form.
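The intended semantics of the level could be sketched like this (a minimal illustration using Python's zlib; encode_response is a made-up helper, not a real Tempesta FW function):

```python
import zlib

def encode_response(body, gzip_level):
    """Apply the gzip [0-9] option: 0 disables compression entirely,
    1-9 select the compression level; returns (body, content_encoding)."""
    if gzip_level == 0:
        return body, None  # served (and cached) as-is
    co = zlib.compressobj(gzip_level, zlib.DEFLATED, 31)  # 31 = gzip wrapper
    return co.compress(body) + co.flush(), "gzip"
```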

    gzip_input [0-9]

Specifies whether to decompress request bodies if a Content-Encoding: gzip header is present. See the similar logic in Apache HTTPD.

    gunzip [1,0]

Decompress received responses if they're compressed.
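A sketch of that behaviour, assuming a simple dict of response headers (maybe_gunzip is a hypothetical name, not an existing function):

```python
import zlib

def maybe_gunzip(headers, body):
    """Sketch of gunzip 1: decompress an upstream response whose
    Content-Encoding is gzip; wbits=47 auto-detects the gzip/zlib wrapper."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        return zlib.decompress(body, 47)
    return body
```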

Compression and decompression must be performed in user space using the Kernel-User Space Transport. Data compression is slow and not mission-critical logic, so it seems a poor candidate for kernel space. However, there is a probable scenario in which Tempesta FW is used as a compression-offloading proxy, without caching. In this scenario, HTTP messages to be compressed are mapped to user space for compression, and the softirq context is switched via GFSM to process other HTTP requests, HTTP error codes, and so on.

Web-server mode is assumed to use a loading script which can load two versions of a resource, compressed and plain.

RFC 7231 5.3.4 says:

    2.  If the representation has no content-coding, then it is
        acceptable by default unless specifically excluded by the
        Accept-Encoding field stating either "identity;q=0" or "*;q=0"
        without a more specific entry for "identity".

Really, if a browser sends

    Accept-Encoding: gzip, deflate

then Apache HTTPD may still send Content-Type: text/html; charset=UTF-8, i.e. a plain-text representation without compression. Thus, if the user-space compression/decompression threads are overloaded, then in most cases (unless a client explicitly prohibits plain text with identity;q=0) we should send uncompressed content.
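The RFC 7231 5.3.4 rule above can be checked with a small parser of the Accept-Encoding header (an illustrative sketch, not production-grade header parsing):

```python
def identity_acceptable(accept_encoding):
    """Per RFC 7231 5.3.4: the identity (uncompressed) coding is acceptable
    unless the client lists "identity;q=0", or "*;q=0" without a more
    specific entry for "identity"."""
    entries = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.strip().split(";")]
        coding = parts[0].lower()
        q = 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    pass  # malformed qvalue: keep the default of 1.0
        entries[coding] = q
    if entries.get("identity") == 0.0:
        return False
    if "identity" not in entries and entries.get("*") == 0.0:
        return False
    return True
```

Under this check, an overloaded proxy may fall back to the plain representation for a client sending `Accept-Encoding: gzip, deflate`, but not for one sending `identity;q=0`.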

Responses must be compressed/decompressed with full-skb granularity (i.e. an skb with all page fragments filled). Thus compressing threads return results when a full skb is ready or when the HTTP processing code passes the last HTTP response chunk (i.e. when the response is fully read).

    gzip_type <MIME type>

Defines which content types must be compressed. Default value is text/html.

    gzip_length <min> <max>

Defines the range of response lengths which must be compressed/decompressed. Default values are 128 and 1400. The maximum size only makes sense with gzip_threads 0.
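Together, gzip_type and gzip_length amount to a compressibility predicate along these lines (defaults taken from the option descriptions above; the function name is hypothetical):

```python
def should_compress(content_type, length,
                    gzip_type=("text/html",), gzip_min=128, gzip_max=1400):
    """Hypothetical filter combining the gzip_type and gzip_length options:
    compress only configured MIME types within the configured length range."""
    mime = content_type.split(";")[0].strip().lower()
    return mime in gzip_type and gzip_min <= length <= gzip_max
```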

Data compression is a CPU-intensive task which can lead to DoS, so the following considerations must be taken into account (see the referenced paper for details):

  • TfwClient must be accounted and limited by Frang in how many bytes the client has had compressed and decompressed (at either request or response processing time);

  • There should be one more configuration option, gzip_buffer, specifying the maximum size of decompressed data (see 4.1.1 in the paper). Decompression must be performed in chunks using a buffer of the specified size.

  • Compression and decompression must be done as late as possible, after all verification tasks.
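The gzip_buffer idea from the second bullet can be sketched as bounded, chunked inflation, so that a small compressed body (a "zip bomb") cannot expand without limit (an illustrative sketch using Python's zlib; the names are hypothetical):

```python
import zlib

class DecompressLimitError(Exception):
    """Raised when decompressed output exceeds the configured gzip_buffer."""

def bounded_gunzip(data, gzip_buffer):
    """Inflate in fixed-size chunks, aborting once the total output exceeds
    gzip_buffer bytes; wbits=47 auto-detects the gzip/zlib wrapper."""
    d = zlib.decompressobj(47)
    out = bytearray()
    tail = data
    while tail:
        # max_length caps each step; unconsumed input stays in unconsumed_tail.
        out += d.decompress(tail, 4096)
        if len(out) > gzip_buffer:
            raise DecompressLimitError("decompressed size exceeds gzip_buffer")
        tail = d.unconsumed_tail
    return bytes(out)
```

The key point is that the attacker-controlled ratio between compressed input and decompressed output is checked incrementally, never after a single unbounded inflate call.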

References

@krizhanovsky krizhanovsky added this to the 0.5.0 Web Server milestone Nov 7, 2016
@krizhanovsky krizhanovsky assigned keshonok and unassigned keshonok Jan 4, 2017
@krizhanovsky krizhanovsky modified the milestones: 0.6 WebOS, 0.5.0 Web Server Feb 13, 2017
@krizhanovsky krizhanovsky modified the milestones: backlog, 0.11 Tempesta Language Jan 15, 2018
@krizhanovsky krizhanovsky modified the milestones: 1.xx TBD, backlog Apr 19, 2023