Background and motivation
Related to #112656
Draft implementation PR #118783
Brotli is a flexible compression algorithm that allows specifying a custom dictionary to achieve better compression on particular types of data. One possible use is https://datatracker.ietf.org/doc/draft-ietf-httpbis-compression-dictionary/.
Technical background
When a dictionary is attached to an encoder or decoder, the brotli library does not copy the provided dictionary data, so our implementation has to ensure that
- the dictionary data is kept alive and (preferably) unmodified
- the dictionary data is pinned and is not moved by the GC
For this reason, the proposal introduces a new class, BrotliDictionary, which users can construct once and then reuse across multiple encoders, decoders, and streams.
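The two requirements above can be illustrated with a minimal sketch. It is not taken from the draft PR: PinnedDictionaryData is a hypothetical helper name, and a pinned GCHandle over a private copy is only one possible way to keep the bytes alive, unmodified, and at a fixed address (native memory would work as well).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of the proposal): owns a private, pinned copy
// of the dictionary bytes so native code can keep a stable pointer to them.
sealed class PinnedDictionaryData : IDisposable
{
    private readonly byte[] _copy;
    private GCHandle _pin;

    public PinnedDictionaryData(ReadOnlySpan<byte> data)
    {
        // Copy so that later changes to the caller's buffer do not affect the dictionary.
        _copy = data.ToArray();
        // Pin so that the GC cannot move the array while native code references it.
        _pin = GCHandle.Alloc(_copy, GCHandleType.Pinned);
    }

    // Address of the first byte; stable until Dispose is called.
    public IntPtr Pointer => _pin.AddrOfPinnedObject();

    public int Length => _copy.Length;

    public void Dispose()
    {
        if (_pin.IsAllocated)
        {
            _pin.Free();
        }
    }
}

static class Program
{
    static void Main()
    {
        byte[] source = { 1, 2, 3 };
        using var data = new PinnedDictionaryData(source);

        // Mutating the caller's buffer does not change the pinned copy.
        source[0] = 42;
        Console.WriteLine(Marshal.ReadByte(data.Pointer)); // prints 1
    }
}
```

Copying costs one allocation per dictionary, which is acceptable if dictionaries are long-lived and reused as the proposal assumes.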
When constructing a dictionary, the native API accepts an enum specifying the dictionary type:
typedef enum BrotliSharedDictionaryType {
  /** Raw LZ77 prefix dictionary. */
  BROTLI_SHARED_DICTIONARY_RAW = 0,
  /** Serialized shared dictionary.
   *
   * DO NOT USE: methods accepting this value will fail.
   */
  BROTLI_SHARED_DICTIONARY_SERIALIZED = 1
} BrotliSharedDictionaryType;
In theory, more than one dictionary format may be supported in the future, so creating BrotliDictionary instances via a factory method is preferred.
The way a dictionary is attached to an encoder or decoder is asymmetrical:
The encoder side accepts a separately constructed BrotliEncoderPreparedDictionary object.
BROTLI_ENC_API BrotliEncoderPreparedDictionary*
BrotliEncoderPrepareDictionary(BrotliSharedDictionaryType type,
size_t data_size, const uint8_t data[BROTLI_ARRAY_PARAM(data_size)],
int quality,
brotli_alloc_func alloc_func, brotli_free_func free_func, void* opaque);
BROTLI_ENC_API BROTLI_BOOL BrotliEncoderAttachPreparedDictionary(
BrotliEncoderState* state,
const BrotliEncoderPreparedDictionary* dictionary);
The decoder side accepts a byte array.
BROTLI_DEC_API BROTLI_BOOL BrotliDecoderAttachDictionary(
BrotliDecoderState* state, BrotliSharedDictionaryType type,
size_t data_size, const uint8_t data[BROTLI_ARRAY_PARAM(data_size)]);
Both sides require the data array to be kept alive.
For simplicity, this proposal exposes the same AttachDictionary signature on both the encoder and the decoder.
API Proposal
+ public sealed class BrotliDictionary : System.IDisposable
+ {
+ internal BrotliDictionary() { }
// Alternative name: CreateFromRawBytes or just Create
+ public static System.IO.Compression.BrotliDictionary CreateFromBytes(System.ReadOnlySpan<byte> buffer) { throw null; }
// The underlying native API also accepts a quality parameter, but only on the encoder path (the decoder path hardcodes MAX quality when attaching a dictionary).
+ public static System.IO.Compression.BrotliDictionary CreateFromBytes(System.ReadOnlySpan<byte> buffer, int quality) { throw null; }
+ public void Dispose() { }
+ }
public struct BrotliDecoder : System.IDisposable
{
+ public void AttachDictionary(System.IO.Compression.BrotliDictionary dictionary) { }
}
public struct BrotliEncoder : System.IDisposable
{
+ public void AttachDictionary(System.IO.Compression.BrotliDictionary dictionary) { }
}
public class BrotliStream
{
+ public void AttachDictionary(System.IO.Compression.BrotliDictionary dictionary) { }
}
API Usage
BrotliDictionary dictionary = BrotliDictionary.CreateFromBytes(RawDictionaryData);
BrotliStream stream = new BrotliStream(....);
stream.AttachDictionary(dictionary);
// use stream as usual
Alternative Designs
Accepting dictionaries in the Encoder/Decoder constructors would also be possible, but it would require many new constructor overloads (especially for BrotliStream). Also, keep in mind that Brotli allows attaching multiple dictionaries, so additional overloads accepting a dictionary collection might be needed.
An approach requiring fewer allocations is possible for decoders, because the native API only requires a reference to an array (which must be kept alive and pinned for the lifetime of the decoder). However, that design is more complicated and error-prone, and if we assume that dictionaries are long-lived and reused, it does not offer enough savings.
Risks
Lifetime management of the BrotliDictionary (more specifically, of the native memory held by the safe handle within) can be tricky. Since brotli does not do any internal refcounting to ensure the dictionary stays alive long enough, we need to perform the refcounting in managed code on the relevant SafeHandle types.
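As a rough illustration of the managed refcounting described above, the sketch below uses the reference count built into SafeHandle (DangerousAddRef / DangerousRelease). DictionaryHandle and EncoderSketch are hypothetical stand-ins, and Marshal.AllocHGlobal merely simulates the native dictionary allocation; the draft PR may be structured differently.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the handle that owns the native dictionary memory.
sealed class DictionaryHandle : SafeHandle
{
    public bool Released { get; private set; }

    public DictionaryHandle(int size) : base(IntPtr.Zero, ownsHandle: true)
    {
        // Simulates the native allocation made when a dictionary is prepared.
        SetHandle(Marshal.AllocHGlobal(size));
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    // Runs only once the handle is disposed AND every added reference is released.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Released = true;
        return true;
    }
}

// Hypothetical encoder that borrows the dictionary for its own lifetime.
sealed class EncoderSketch : IDisposable
{
    private DictionaryHandle? _dictionary;

    public void AttachDictionary(DictionaryHandle dictionary)
    {
        if (_dictionary != null)
        {
            throw new InvalidOperationException("This sketch supports a single dictionary.");
        }

        bool added = false;
        // Throws ObjectDisposedException if the dictionary has already been released.
        dictionary.DangerousAddRef(ref added);
        _dictionary = dictionary;
    }

    public void Dispose()
    {
        // Drop the reference taken in AttachDictionary.
        _dictionary?.DangerousRelease();
        _dictionary = null;
    }
}

static class Program
{
    static void Main()
    {
        var dictionary = new DictionaryHandle(16);
        var encoder = new EncoderSketch();
        encoder.AttachDictionary(dictionary);

        // Disposing the dictionary first does not free memory the encoder still uses.
        dictionary.Dispose();
        Console.WriteLine(dictionary.Released); // False

        // The memory is freed when the last user lets go.
        encoder.Dispose();
        Console.WriteLine(dictionary.Released); // True
    }
}
```

With this pattern the order in which the user disposes the dictionary and the encoders does not matter; the native memory is released by whichever Dispose call drops the last reference.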