Support transport-level compression #256

Merged: 41 commits, Mar 11, 2025


@mattrm456 (Contributor) commented Mar 3, 2025

Many applications want either to compress an entire data stream (or all datagrams) or to compress the payload carried inside a higher-level message protocol. This PR acknowledges this not-unusual requirement by providing an API mechanism to deflate and inflate a data stream. It also formalizes the case where the entire transport is compressed: a user may optionally apply a deflater to all outgoing data sent through an ntci::StreamSocket or ntci::DatagramSocket, and optionally apply an inflater to all incoming data received by an ntci::StreamSocket or ntci::DatagramSocket. Similar to how TLS is integrated, the user "sees" only the uncompressed data. The general idea is that, when a deflater is attached to a socket, all data given to "send" is first deflated, internally and automatically, before any attempt to copy it to the socket send buffer or store it on the write queue. Similarly, all data copied from the socket receive buffer is internally and automatically inflated before being staged in the read queue, where it is conditionally offered to the user for processing according to their receive criteria and read queue low watermark. Note that compression and encryption can be applied simultaneously; the internal implementation takes care to first deflate then encrypt when sending, and to first decrypt then inflate when receiving.
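
The ordering described above (deflate then encrypt on send, decrypt then inflate on receive) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and is not the ntc API: toyDeflate/toyInflate are a trivial run-length codec standing in for a real compressor, and toyCrypt is an insecure XOR standing in for TLS. The ordering matters because well-encrypted data looks random and therefore compresses poorly, so compression has to see the plaintext.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a deflater: run-length encode the input as
// (count, byte) pairs, with each run capped at 255 bytes.
std::string toyDeflate(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t j = i;
        while (j < in.size() && in[j] == in[i] && j - i < 255) ++j;
        out.push_back(static_cast<char>(j - i));  // run length
        out.push_back(in[i]);                     // repeated byte
        i = j;
    }
    return out;
}

// Inverse of toyDeflate: expand each (count, byte) pair.
std::string toyInflate(const std::string& in) {
    assert(in.size() % 2 == 0);
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        out.append(static_cast<unsigned char>(in[i]), in[i + 1]);
    }
    return out;
}

// Hypothetical stand-in for encryption: XOR with a fixed key. This is NOT
// secure; it only marks where the encryption step sits. XOR is its own
// inverse, so the same function also "decrypts".
std::string toyCrypt(const std::string& in) {
    std::string out = in;
    for (char& c : out) c = static_cast<char>(c ^ 0x5A);
    return out;
}

// Send path: deflate first, then encrypt.
std::string sendPath(const std::string& plaintext) {
    return toyCrypt(toyDeflate(plaintext));
}

// Receive path: decrypt first, then inflate.
std::string receivePath(const std::string& wire) {
    return toyInflate(toyCrypt(wire));
}
```

In this sketch the caller of sendPath and receivePath only ever handles uncompressed, unencrypted data, mirroring the way the user of the socket "sees" only the uncompressed data.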

As an initial proposal, this PR acknowledges that several compression techniques are popular for compressing network traffic. To start, this PR supports "zlib", "gzip", "lz4", and "zstd". These techniques are enumerated for ease of selection by the user, with a consistent API abstraction over the selected algorithm. For the implementation of these algorithms, this PR delegates to industry-standard third-party libraries to perform the actual compression and decompression. These third-party libraries must be explicitly enabled at build time. If the third-party library implementing a selected compression technique was not configured as a dependency at build time, the initialization of the compressor fails at run time with a detectable error. Alternatively, users may "plug in" their own compressors, implemented however they wish.

To build with internal support for the enumerated compression algorithms, perform the build as:

$ ./configure --with-zlib --with-zstd --with-lz4
$ make
$ make install

This PR introduces the following new components:

  • ntca_deflateoptions: The parameters that influence the behavior of an operation to compress data.
  • ntca_deflatecontext: The context in which a deflate operation completes.
  • ntca_inflateoptions: The parameters that influence the behavior of an operation to decompress data.
  • ntca_inflatecontext: The context in which an inflate operation completes.
  • ntca_compressiontype: Enumeration of well-known compression algorithms.
  • ntca_compressiongoal: Enumeration of the desired trade-offs of speed vs. size.
  • ntca_checksumtype: Enumeration of checksums used by the supported compression algorithms.
  • ntca_checksum: Union of different checksum values and streaming update algorithms.
  • ntci_compression: Abstraction of a mechanism to deflate and inflate data according to a compression algorithm and framing protocol.
  • ntci_compressiondriver: Pluggable factory that produces concrete compressors for a particular algorithm and framing protocol.
  • ntctlc_plugin: Transport-level compression; the concrete implementations of an abstract compressor implemented in terms of the third-party libraries zlib, liblz4, and libzstd (if configured at build time).
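
The relationship between an abstract compressor, a pluggable driver, and the run-time failure when no driver is available can be sketched roughly as follows. This is an illustration only: CompressionType, Compressor, IdentityCompressor, and DriverRegistry are hypothetical names invented for this sketch, and their signatures are assumptions rather than the actual interfaces of ntca_compressiontype, ntci_compression, or ntci_compressiondriver.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical enumeration of the selectable algorithms.
enum class CompressionType { ZLIB, GZIP, LZ4, ZSTD };

// Hypothetical abstract compressor: one interface over any algorithm.
struct Compressor {
    virtual ~Compressor() = default;
    virtual std::string deflate(const std::string& in) = 0;
    virtual std::string inflate(const std::string& in) = 0;
};

// A user-supplied "plug in" compressor. It passes data through unchanged,
// which keeps the sketch free of third-party dependencies.
struct IdentityCompressor : Compressor {
    std::string deflate(const std::string& in) override { return in; }
    std::string inflate(const std::string& in) override { return in; }
};

// Hypothetical registry of drivers, keyed by algorithm.
class DriverRegistry {
  public:
    using Factory = std::function<std::unique_ptr<Compressor>()>;

    // Register (or replace) the driver for the specified 'type'.
    void registerDriver(CompressionType type, Factory factory) {
        d_factories[type] = std::move(factory);
    }

    // Return a new compressor for the specified 'type', or null if no
    // driver was registered: the "detectable error" at run time.
    std::unique_ptr<Compressor> create(CompressionType type) const {
        auto it = d_factories.find(type);
        if (it == d_factories.end()) {
            return nullptr;
        }
        return it->second();
    }

  private:
    std::map<CompressionType, Factory> d_factories;
};
```

In this sketch, an algorithm whose library was not configured at build time corresponds to a type with no registered factory, so the caller checks the result of create() before using it.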

This PR integrates automatic compression of communication through a socket by adding new methods to ntci::StreamSocket and ntci::DatagramSocket called setWriteDeflater and setReadInflater. Compression may be applied in only one direction (e.g. outgoing data is compressed but incoming data is not decompressed). For example, see the usage of d_sendDeflater_sp in ntcr::StreamSocket::send() at ntcr_streamsocket.cpp:5472 and d_receiveInflater_sp in ntcr::StreamSocket::privateDequeueReceiveBuffer() at ntcr_streamsocket.cpp:2922. Note, however, that many code paths in both ntc{r,p}_streamsocket and ntc{r,p}_datagramsocket must handle possible deflation and inflation, both when there is no encryption and when encryption is simultaneously enabled.
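
The one-direction case can be pictured with a toy model. The class below is hypothetical: it borrows the method names setWriteDeflater and setReadInflater from this PR, but the parameter types, the Transform alias, and the send/receive signatures are assumptions made for illustration and do not reflect the real ntci::StreamSocket interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical: a deflater or inflater modeled as a plain function.
using Transform = std::function<std::string(const std::string&)>;

// Toy socket that applies an optional transform in each direction.
class ToySocket {
  public:
    void setWriteDeflater(Transform deflater) {
        d_deflater = std::move(deflater);
    }

    void setReadInflater(Transform inflater) {
        d_inflater = std::move(inflater);
    }

    // Return the bytes that would be placed on the wire for 'data'.
    std::string send(const std::string& data) const {
        return d_deflater ? d_deflater(data) : data;
    }

    // Return the bytes offered to the user for the received 'wire' bytes.
    std::string receive(const std::string& wire) const {
        return d_inflater ? d_inflater(wire) : wire;
    }

  private:
    Transform d_deflater;  // empty: outgoing data is sent as-is
    Transform d_inflater;  // empty: incoming data is delivered as-is
};
```

Attaching only a write deflater leaves the receive path untouched, which is the one-direction configuration mentioned above.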

Compression support is tested in a new testing framework for the ntcf package. This testing machinery is neither compiled into the library nor publicly installed. Subsequent work will try to simplify some of the low-level tests in ntcf_system by rewriting them in terms of this higher-level testing framework. Consider ntcf_test* to be a long-term work in progress.

Comment on lines +57 to +59
const bsl::uint32_t CompressionFrameHeader::k_MAGIC = 1380730184;
#else
const bsl::uint32_t CompressionFrameHeader::k_MAGIC = 1212501074;
Contributor

Please add a comment explaining where these magic numbers come from.

Contributor Author

They don't come from anywhere; that's why they are "magic". Magic numbers correspond to identifiable byte sequences used to help pluck out frame boundaries in hex dumps.
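
To see what "identifiable byte sequences" means for the two constants in the quoted diff, the small sketch below prints out their bytes. The helper names are hypothetical. The diff excerpt does not show the #if condition, so which constant belongs to which byte order is an assumption here; what the arithmetic does show is that the two values are byte-swaps of each other, so one of them per host byte order yields the same four bytes in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Return the four bytes of 'value' from most significant to least
// significant, i.e. the order a big-endian host stores them in memory.
std::string bigEndianBytes(std::uint32_t value) {
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
    return out;
}

// Return the four bytes of 'value' from least significant to most
// significant, i.e. the order a little-endian host stores them in memory.
std::string littleEndianBytes(std::uint32_t value) {
    const std::string bytes = bigEndianBytes(value);
    return std::string(bytes.rbegin(), bytes.rend());
}
```

Here 1380730184 is 0x524C4548 and 1212501074 is 0x48454C52. Under the assumption that the first constant is used on big-endian hosts and the second on little-endian hosts, both appear in a hex dump as the bytes 52 4C 45 48, the ASCII characters "RLEH".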

@mattrm456 mattrm456 merged commit b7aeddf into bloomberg:main Mar 11, 2025
1 check failed