Description
Zstandard (or Zstd) is a fast compression algorithm published by Facebook in 2015; its first stable release (v1.0) came in August 2016.
The official repository offers a C implementation: https://github.com/facebook/zstd
The compressed data format is specified in RFC 8478: https://datatracker.ietf.org/doc/html/rfc8478
Features:
- It is faster than Deflate, especially at decompression, while offering a similar compression ratio.
- Its maximum compression level is similar to that of lzma, and it performs better than lza and bzip2.
- It has reached the Pareto frontier: it decompresses faster than any other currently available algorithm with a similar or better compression ratio.
- It supports multi-threading.
- It can be saved to a *.zst file.
- It has a dual BSD+GPLv2 license. We would be using the BSD license.
It's used by:
- The Linux Kernel as a compression option for btrfs and SquashFS since 2017.
- FreeBSD for coredumps.
- Amazon Redshift for databases.
- Canonical, Fedora and ArchLinux for their package managers.
- Nintendo Switch to compress its files.
We could offer a stream-based class, as we do for Deflate with DeflateStream and GZipStream, but we should also consider offering a stream-less static class, since that is a common request.
API proposal (by @rzikm)
- The API follows the precedents set by BrotliStream, BrotliEncoder, and BrotliDecoder. New additions are ZstandardDictionary and some additional members on the encoder/decoder/options; additions which don't have a precedent are marked with // NEW. So far we don't expose custom dictionary support for Brotli, but there is demand for it: [API Proposal]: Custom dictionary support for Brotli #118784
- New API available in .NET 11, ships inbox, no OOB shipment for previous releases
- New assembly System.IO.Compression.Zstandard
- Native implementation from facebook/zstd linked to System.IO.Compression.Native (like we do with brotli)
Some notes about Zstandard dictionaries:
- The underlying zstd implementation accepts dictionaries in multiple ways:
  - Prepared dictionaries (ZSTD_(C|D)Dict*): good for reuse; included in this API proposal as ZstandardDictionary. These do not benefit from EnableLongDistanceMatching.
  - Pointers to void* + length: may be more efficient if the dictionary is to be used only once; not covered by this proposal.
  - As a "prefix" of the data (ZSTD_(C|D)Ctx_refPrefix, accepts void* + length): together with EnableLongDistanceMatching and a large enough Window, this enables use of Zstandard as a diff engine to produce binary patches (see the ReferencePrefix example below); included in this proposal.
namespace System.IO.Compression
{
// NEW
// represents a prepared dictionary to improve compression, mostly
// useful when the same dictionary can be reused to compress many small files (<100 kB).
//
// Internally wraps a pair of safe handles
// ZSTD_CDict* - dictionary for compression
// ZSTD_DDict* - dictionary for decompression
// and initializes both of them from the given data
// These dictionaries are immutable and thus thread safe and can be reused across encoders/decoders in
// concurrent processing.
public sealed partial class ZstandardDictionary : System.IDisposable
{
internal ZstandardDictionary() { }
public void Dispose() { }
// Creates a new ZstandardDictionary instance from the dictionary data in the provided buffer.
// `quality` dictates the quality of the compression and overrides the quality setting on the encoder.
// The quality parameter has no effect during decompression.
public static System.IO.Compression.ZstandardDictionary Create(System.ReadOnlySpan<byte> buffer, int quality) { throw null; }
// like above, but uses default quality
public static System.IO.Compression.ZstandardDictionary Create(System.ReadOnlySpan<byte> buffer) { throw null; }
// Alternatively, the Create methods could accept ReadOnlyMemory, to which a reference is kept internally.
// This avoids the need to copy the buffer, but the dictionary then needs to keep a MemoryHandle, pinning
// the provided memory for the lifetime of the dictionary.
// optional:
// `type` is a new enum (not used elsewhere):
// - Raw - buffer is treated as raw data, implies some small
// processing when loading (processed as if it were a prefix
// of the compressed data), any data can be used as raw
// dictionary, will fail only on very small buffers.
// - Serialized - Assumes serialized version of a preprocessed dictionary
// (magic bytes, entropy tables, raw data), can fail if
// structure is compromised
// - Detect - default behavior, checks for presence of magic bytes, then
// as either Raw or Serialized
public static System.IO.Compression.ZstandardDictionary Create(System.ReadOnlySpan<byte> buffer, int quality, ZstandardDictionaryType type) { throw null; }
// Members below are for ability to create an optimized dictionary based on
// training data.
//
// Note:
//
// A dictionary-training API taking more detailed options would be more complicated, as
// there are multiple training algorithms to choose from, each with
// different tuning parameters. Since it is not yet clear how much demand
// there is for the training APIs, we think it better to avoid adding a large
// dictionary-training API surface now.
// Creates a small dictionary (up to `maxDictionarySize`) that helps efficiently
// encode the given samples.
// zstd suggests max 100 kB dictionaries, with training data
// ~100x the size of the resulting dictionary.
// samples - all training samples concatenated into one large buffer
// lengths - lengths of the individual samples
// Uses zstd's default training parameters.
public static System.IO.Compression.ZstandardDictionary TrainFromSamples(System.ReadOnlySpan<byte> samples, ReadOnlySpan<int> lengths, int maxDictionarySize) { throw null; }
// alternative to above, more natural API, but does not match the underlying native API (requires temporary copying of data):
public static System.IO.Compression.ZstandardDictionary TrainFromSamples(ReadOnlyMemory<ReadOnlyMemory<byte>> samples, int maxDictionarySize) { throw null; }
// access to raw dictionary bytes (e.g. to be able to store them on disk)
public ReadOnlyMemory<byte> Data { get { throw null; } }
}
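// NEW
// Sketch of the ZstandardDictionaryType enum described in the comments on Create above.
// The member names come from that description; the explicit values are illustrative.
public enum ZstandardDictionaryType
{
    Detect = 0,     // default: check for magic bytes, then load as either Serialized or Raw
    Raw = 1,        // buffer is raw data, processed as if it were a prefix of the compressed data
    Serialized = 2, // buffer is a serialized preprocessed dictionary (magic bytes, entropy tables, raw data)
}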
// wrapper for multiple compression options, extension point if we decide
// to expose more options in the future
// Note 1: zstd distinguishes between sticky and non-sticky parameters. Sticky parameters are not unset by Reset()
// and are carried over to the next compressed frame if the encoder instance is reused.
// This class is supposed to contain only the sticky parameters, while non-sticky parameters should be set
// via individual instance members on the Encoder
// Note 2: Some parameters are dynamically adjusted according to the other parameters (e.g. according to Quality),
// so being able to represent "don't explicitly set anything" is desirable in many cases.
public sealed partial class ZstandardCompressionOptions
{
// NEW
public static int DefaultWindow { get { throw null; } } // 23
public static int MinWindow { get { throw null; } } // 10
public static int MaxWindow { get { throw null; } } // 31
// NEW
public static int DefaultQuality { get { throw null; } } // defined by zstd implementation as 3
public static int MaxQuality { get { throw null; } } // ZSTD_maxCLevel
public static int MinQuality { get { throw null; } } // ZSTD_minCLevel
// quality parameter, higher -> slower and better compression
// name chosen for parity with other compression APIs
// alternatively, we can call this Level to match zstd terminology
// 0 = implementation default (3)
public int Quality { get { throw null; } set { throw null; } }
// optional custom dictionary. If set, the Quality parameter is ignored and the quality set during
// dictionary creation takes precedence
public System.IO.Compression.ZstandardDictionary? Dictionary { get { throw null; } set { throw null; } }
// NEW (BrotliCompressionOptions does not expose this value yet, but there is demand for it)
// size of the backreference window *in bits* (same name as for ZLib); the actual size is (1 << Window) bytes
// 0 = "use default window"
public int Window { get { throw null; } set { throw null; } }
// below are some more advanced parameters, these are not necessary for MVP, all of which are NEW
// hint for the size of the block sizes that the encoder will output
// smaller size => more frequent outputs => lower latency when streaming
// valid range = [1340 .. 1<<17]
// 0 = no hint, implementation defined behavior
public int TargetBlockSize { get { throw null; } set { throw null; } }
// Appends a 32-bit checksum at the end of the compressed content. The checksum is verified during
// decompression, which can lead to failures if the data is corrupted.
// default: false
public bool AppendChecksum { get { throw null; } set { throw null; } }
// Enable long distance matching. This parameter is designed to improve compression ratio
// for large inputs, by finding large matches at long distance. It increases memory usage and
// default window size.
// Together with ZstandardEncoder.ReferencePrefix(), this enables using zstd for binary diffing of (potentially large) files.
public bool EnableLongDistanceMatching { get { throw null; } set { throw null; } }
}
// standalone decoder implementation, closely copies BrotliDecoder design
public partial struct ZstandardDecoder : System.IDisposable
{
private object _dummy;
private int _dummyPrimitive;
// Decoder can be also default-initialized => no dictionary is used, default parameters are used
// public ZstandardDecoder()
// specify decompression dictionary
public ZstandardDecoder(System.IO.Compression.ZstandardDictionary dictionary) { throw null; }
// allows specifying the maximum window for decompression; decompression requiring a larger window (=> more memory)
// will fail
public ZstandardDecoder(int maxWindow) { throw null; }
// combined ctor for the above
// Question: There are currently no other stable parameters exposed by ZSTD, do we need ZstandardDecompressionOptions?
public ZstandardDecoder(System.IO.Compression.ZstandardDictionary dictionary, int maxWindow) { throw null; }
// Question: how do we access the specific error code in case of InvalidData? e.g. how does user know that data is valid,
// but would require more memory to decompress and thus larger maxWindow?
public System.Buffers.OperationStatus Decompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { throw null; }
public void Dispose() { }
// NEW
// Resets the decoder state so that it can be reused for more frames
public void Reset() { }
// NEW
// sets a dictionary in prefix mode, exposes ZSTD_DCtx_refPrefix. Internally pins the memory until
// the decoder is disposed/GC'd or Reset() is called (the prefix is a "non-sticky" parameter which is cleared by Reset)
public void ReferencePrefix(System.ReadOnlyMemory<byte> prefix) { }
// NEW
// This exposes ZSTD_decompressBound, which
// - reads the size from the header, if present, or
// - estimates the upper bound based on the information found in the header
// NOTE: malicious input may edit the header to report arbitrary values, but zstd validates the bound set in the header during decompression
// Question: should this return long?
public static int GetMaxDecompressedLength(System.ReadOnlySpan<byte> data) { throw null; }
// Alternative to above:
public static bool TryGetDecompressedLength(System.ReadOnlySpan<byte> data, out int length) { throw null; }
// one-off decompressing functions
// note that these don't need maxWindow, as that parameter is relevant only during streaming decompression
public static bool TryDecompress(System.ReadOnlySpan<byte> source, System.IO.Compression.ZstandardDictionary dictionary, System.Span<byte> destination, out int bytesWritten) { throw null; }
public static bool TryDecompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { throw null; }
}
// symmetric API to ZstandardDecoder
public partial struct ZstandardEncoder : System.IDisposable
{
private object _dummy;
private int _dummyPrimitive;
public ZstandardEncoder(int quality) { throw null; }
public ZstandardEncoder(System.IO.Compression.ZstandardDictionary dictionary) { throw null; }
public ZstandardEncoder(int quality, int window) { throw null; }
public ZstandardEncoder(System.IO.Compression.ZstandardDictionary dictionary, int window) { throw null; }
// NEW
// does not store a reference to the options, only reads the values; the most flexible ctor, which can replace all of the above
public ZstandardEncoder(ZstandardCompressionOptions options) { throw null; }
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinalBlock) { throw null; }
public System.Buffers.OperationStatus Flush(System.Span<byte> destination, out int bytesWritten) { throw null; }
public void Dispose() { }
// NEW
// Resets the encoder state so that it can be reused for more frames
public void Reset() { }
// NEW (symmetry with ZstandardDecoder)
// sets a dictionary in prefix mode, exposes ZSTD_CCtx_refPrefix. Internally pins the memory until
// the encoder is disposed/GC'd or Reset() is called (the prefix is a "non-sticky" parameter which is cleared by Reset)
public void ReferencePrefix(System.ReadOnlyMemory<byte> prefix) { }
// NEW
// ZSTD_CCtx_setPledgedSrcSize, sets the size of the data to be compressed (so that it can be written into the frame header)
// May be called only before the first Compress call, or after Reset(). Calling Reset() clears the size.
// The size is validated during compression; not respecting the value causes OperationStatus.InvalidData.
// QUESTION: Should this accept long?
public void SetSourceSize(int size) { }
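// Illustrative usage of the member above (a sketch, assuming `source`/`destination` buffers exist):
//   using var encoder = new ZstandardEncoder(quality: 3);
//   encoder.SetSourceSize(source.Length);
//   encoder.Compress(source, destination, out _, out int written, isFinalBlock: true);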
public static int GetMaxCompressedLength(int inputSize) { throw null; }
// one-off compression functions
// note that `quality` and `dictionary` are mutually exclusive
public static bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { throw null; }
public static bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, int quality, int window) { throw null; }
// NEW (dictionary support)
public static bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, System.IO.Compression.ZstandardDictionary dictionary, int window) { throw null; }
// NEW
// this one is the most flexible, but can be omitted, as it is just a wrapper for creating an Encoder and a single call to Compress
public static bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, System.IO.Compression.ZstandardCompressionOptions options) { throw null; }
}
// Wrapper around ZstandardEncoder/Decoder to provide Stream API
public sealed partial class ZstandardStream : System.IO.Stream
{
// similar ctor members as BrotliStream
// QUESTION: why don't we make leaveOpen always default to false?
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode) { }
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, bool leaveOpen) { }
// this ctor is needed to perform decompression with a dictionary (but the same can be achieved via the ctor taking a ZstandardDecoder listed below)
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, ZstandardDictionary dictionary, bool leaveOpen = false) { }
// these imply CompressionMode.Compress
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel) { }
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel, bool leaveOpen) { }
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.ZstandardCompressionOptions compressionOptions, bool leaveOpen = false) { }
// NEW
// These constructors allow reuse of ZstandardEncoder/Decoder instances
// Disposing of the stream `Reset()`s the encoder/decoder
// Note that this works even though Encoder/Decoder are structs.
// QUESTION: should we add `bool ownsEncoder = false` parameter to allow passing ownership?
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.ZstandardDecoder decoder, bool leaveOpen = false) { }
public ZstandardStream(System.IO.Stream stream, System.IO.Compression.ZstandardEncoder encoder, bool leaveOpen = false) { }
// below are usual compression stream members
public System.IO.Stream BaseStream { get { throw null; } }
public override bool CanRead { get { throw null; } }
public override bool CanSeek { get { throw null; } }
public override bool CanWrite { get { throw null; } }
public override long Length { get { throw null; } }
public override long Position { get { throw null; } set { } }
public override System.IAsyncResult BeginRead(byte[] buffer, int offset, int count, System.AsyncCallback? callback, object? state) { throw null; }
public override System.IAsyncResult BeginWrite(byte[] buffer, int offset, int count, System.AsyncCallback? callback, object? state) { throw null; }
protected override void Dispose(bool disposing) { }
public override System.Threading.Tasks.ValueTask DisposeAsync() { throw null; }
public override int EndRead(System.IAsyncResult asyncResult) { throw null; }
public override void EndWrite(System.IAsyncResult asyncResult) { }
public override void Flush() { }
public override System.Threading.Tasks.Task FlushAsync(System.Threading.CancellationToken cancellationToken) { throw null; }
public override int Read(byte[] buffer, int offset, int count) { throw null; }
public override int Read(System.Span<byte> buffer) { throw null; }
public override System.Threading.Tasks.Task<int> ReadAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken) { throw null; }
public override System.Threading.Tasks.ValueTask<int> ReadAsync(System.Memory<byte> buffer, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public override int ReadByte() { throw null; }
public override long Seek(long offset, System.IO.SeekOrigin origin) { throw null; }
public override void SetLength(long value) { }
public override void Write(byte[] buffer, int offset, int count) { }
public override void Write(System.ReadOnlySpan<byte> buffer) { }
public override System.Threading.Tasks.Task WriteAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken) { throw null; }
public override System.Threading.Tasks.ValueTask WriteAsync(System.ReadOnlyMemory<byte> buffer, System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken)) { throw null; }
public override void WriteByte(byte value) { }
}
}
API Usage
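Compressing a file with ZstandardStream
A minimal sketch assuming the proposed stream surface; the file names are illustrative.
using System.IO.Compression;

using FileStream input = File.OpenRead("data.bin");
using FileStream output = File.Create("data.bin.zst");
using ZstandardStream zstd = new(output, CompressionLevel.Optimal);
input.CopyTo(zstd);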
Request decompression in ASP.NET Core
using System.IO.Compression;
using Microsoft.AspNetCore.RequestDecompression;
namespace Example;
// See: https://learn.microsoft.com/en-us/aspnet/core/fundamentals/middleware/request-decompression
sealed class ZstandardDecompressionProvider : IDecompressionProvider
{
public Stream GetDecompressionStream(Stream stream) => new ZstandardStream(stream, CompressionMode.Decompress, leaveOpen: true);
}
// Register in Program.cs
// ...
builder.Services.AddRequestDecompression(x =>
{
x.DecompressionProviders.Add("zstd", new ZstandardDecompressionProvider());
});
// ...
Response compression in ASP.NET Core
using System.IO.Compression;
using Microsoft.AspNetCore.ResponseCompression;
using Microsoft.Extensions.Options;
namespace Example;
// See: https://learn.microsoft.com/en-us/aspnet/core/performance/response-compression
sealed class ZstandardCompressionProviderOptions : IOptions<ZstandardCompressionProviderOptions>
{
public CompressionLevel Level { get; set; } = CompressionLevel.Fastest;
ZstandardCompressionProviderOptions IOptions<ZstandardCompressionProviderOptions>.Value => this;
}
sealed class ZstandardCompressionProvider : ICompressionProvider
{
public ZstandardCompressionProvider(IOptions<ZstandardCompressionProviderOptions> options)
{
Options = options.Value;
}
private ZstandardCompressionProviderOptions Options { get; }
public Stream CreateStream(Stream outputStream) => new ZstandardStream(outputStream, Options.Level, leaveOpen: true);
public string EncodingName { get; } = "zstd";
public bool SupportsFlush { get; } = true;
}
// Register in Program.cs:
// ...
builder.Services.AddOptions<ZstandardCompressionProviderOptions>()
.Configure(zstd =>
{
zstd.Level = CompressionLevel.Optimal;
});
builder.Services.AddResponseCompression(x =>
{
x.EnableForHttps = true;
x.Providers.Add<ZstandardCompressionProvider>();
});
One-shot APIs
byte[] source = new byte[256000];
Random.Shared.NextBytes(source);
int maxLength = ZstandardEncoder.GetMaxCompressedLength(source.Length);
var resultBuffer = new byte[maxLength];
Assert.True(ZstandardEncoder.TryCompress(source, resultBuffer, out int bytesWritten));
Assert.True(maxLength >= bytesWritten);
int decompressedLength = ZstandardDecoder.GetMaxDecompressedLength(resultBuffer.AsSpan(0, bytesWritten));
var decompressedBuffer = new byte[decompressedLength];
Assert.True(ZstandardDecoder.TryDecompress(resultBuffer.AsSpan(0, bytesWritten), decompressedBuffer.AsSpan(), out var bytesDecompressed));
Assert.True(decompressedBuffer.AsSpan(0, bytesDecompressed).SequenceEqual(source.AsSpan()));
Compression using dictionaries
byte[] originalData = "Hello, World! This is a test string for Zstandard compression and decompression."u8.ToArray();
byte[] compressedBuffer = new byte[ZstandardEncoder.GetMaxCompressedLength(originalData.Length)];
byte[] decompressedBuffer = new byte[originalData.Length * 2];
int quality = ZstandardCompressionOptions.DefaultQuality; // quality baked into the dictionary
using ZstandardDictionary dictionary = ZstandardDictionary.Create(CreateSampleDictionary(), quality);
int bytesWritten;
int bytesConsumed;
{
using var encoder = new ZstandardEncoder(dictionary, ZstandardCompressionOptions.DefaultWindow);
OperationStatus compressResult = encoder.Compress(originalData, compressedBuffer, out bytesConsumed, out bytesWritten, true);
Assert.Equal(OperationStatus.Done, compressResult);
}
Assert.Equal(originalData.Length, bytesConsumed);
Assert.True(bytesWritten > 0);
int compressedLength = bytesWritten;
{
using var decoder = new ZstandardDecoder(dictionary);
OperationStatus decompressResult = decoder.Decompress(compressedBuffer.AsSpan(0, compressedLength), decompressedBuffer, out bytesConsumed, out bytesWritten);
Assert.Equal(OperationStatus.Done, decompressResult);
}
Assert.Equal(compressedLength, bytesConsumed);
Assert.Equal(originalData.Length, bytesWritten);
Assert.Equal(originalData, decompressedBuffer.AsSpan(0, bytesWritten));
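Training a dictionary from samples
A sketch of the proposed TrainFromSamples API; GetSampleDocuments() is a hypothetical helper returning many small, similarly-shaped payloads.
// concatenate all samples into one buffer and record their lengths,
// matching the layout expected by TrainFromSamples
List<byte[]> samples = GetSampleDocuments();
byte[] concatenated = samples.SelectMany(s => s).ToArray();
int[] lengths = samples.Select(s => s.Length).ToArray();
// zstd suggests dictionaries of at most ~100 kB
using ZstandardDictionary dictionary = ZstandardDictionary.TrainFromSamples(concatenated, lengths, maxDictionarySize: 100 * 1024);
// persist the raw dictionary bytes for later reuse (e.g. by the decompressing side)
File.WriteAllBytes("samples.dict", dictionary.Data.ToArray());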
Opening compressed TAR archives
using FileStream compressedStream = File.OpenRead("/home/dotnet/SourceDirectory/compressed.tar.zst");
using ZstandardStream decompressor = new(compressedStream, CompressionMode.Decompress);
TarFile.ExtractToDirectory(source: decompressor, destinationDirectoryName: "/home/dotnet/DestinationDirectory/", overwriteFiles: false);
Using Encoder/Decoder with Pipes
static async Task CompressPipelineDataAsync(PipeReader reader, PipeWriter writer)
{
// this code is a bit naive, but illustrates the usage
using var encoder = new ZstandardEncoder(quality: 6, window: 22);
while (true)
{
var result = await reader.ReadAsync();
var buffer = result.Buffer;
if (buffer.IsEmpty && result.IsCompleted)
{
var finalMemory = writer.GetMemory(1024);
encoder.Compress(ReadOnlySpan<byte>.Empty, finalMemory.Span, out _, out int finalBytes, isFinalBlock: true);
writer.Advance(finalBytes);
break;
}
foreach (var segment in buffer)
{
var outputMemory = writer.GetMemory(segment.Length * 2);
encoder.Compress(segment.Span, outputMemory.Span, out int consumed, out int written, isFinalBlock: false);
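// NOTE: a robust implementation would loop until `consumed == segment.Length`
// and handle OperationStatus.DestinationTooSmall from Compress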
writer.Advance(written);
}
reader.AdvanceTo(buffer.End);
await writer.FlushAsync();
}
await reader.CompleteAsync();
await writer.CompleteAsync();
}
Using ReferencePrefix to produce binary patches
// this is the "base" file, the starting point for the patch
byte[] fromFileBytes = File.ReadAllBytes(fromFile.FullName);
ZstandardCompressionOptions options = new ZstandardCompressionOptions()
{
Window = (int)Math.Log2(fromFileBytes.Length) + 1, // Allow using entire prefix file as the window
EnableLongDistanceMatching = true, // needed for efficient diffs of large files
};
using ZstandardEncoder encoder = new ZstandardEncoder(options);
encoder.ReferencePrefix(fromFileBytes);
using Stream inputStream = inputFile.OpenRead(); // target file
using Stream outputStream = outputFile.Create(); // patch file
// pass configured encoder to ZstandardStream
using ZstandardStream zstandardStream = new ZstandardStream(outputStream, encoder);
await inputStream.CopyToAsync(zstandardStream);
Applying the patch is similar
byte[] fromFileBytes = File.ReadAllBytes(fromFile.FullName);
int window = (int)Math.Log2(fromFile.Length) + 1;
using ZstandardDecoder decoder = new ZstandardDecoder(window);
decoder.ReferencePrefix(fromFileBytes);
using Stream inputStream = inputFile.OpenRead(); // patch file
using Stream outputStream = outputFile.Create(); // target file
using ZstandardStream zstandardStream = new ZstandardStream(inputStream, decoder);
await zstandardStream.CopyToAsync(outputStream, cancellationToken);
Below are example file sizes when producing a patch between tarballs of Linux kernel source code; the .patch.zst file is a binary patch that can reproduce linux-6.17-rc7.tar from linux-6.16.tar:
-rwxr-xr-x 1 rzikm rzikm 1592842240 Sep 26 14:27 linux-6.16.tar
-rwxr-xr-x 1 rzikm rzikm 1600378880 Sep 26 14:27 linux-6.17-rc7.tar
-rw-r--r-- 1 rzikm rzikm 214944413 Sep 29 13:58 linux-6.17-rc7.tar.zst
-rwxr-xr-x 1 rzikm rzikm 7387096 Sep 26 14:27 linux-6.17-rc7.tar.patch.zst
Alternative Designs
Encoder/Decoder types as class instead of structs
Structs chosen for similarity with Brotli.
Separate Compression and Decompression Dictionary
Rename ZstandardDictionary to ZstandardCompressionDictionary
Add ZstandardDecompressionDictionary
public sealed partial class ZstandardDecompressionDictionary : System.IDisposable
{
internal ZstandardDecompressionDictionary() { }
public void Dispose() { }
public static System.IO.Compression.ZstandardDecompressionDictionary Create(System.ReadOnlySpan<byte> buffer) { throw null; }
// optional:
// `type` is a new enum (not used elsewhere):
// - Raw - buffer is treated as raw data, implies some small
// processing when loading (processed as if it were a prefix
// of the compressed data), any data can be used as raw
// dictionary, will fail only on very small buffers.
// - Serialized - Assumes serialized version of a preprocessed dictionary
// (magic bytes, entropy tables, raw data), can fail if
// structure is compromised
// - Detect - default behavior, checks for presence of magic bytes, then
// as either Raw or Serialized
public static System.IO.Compression.ZstandardDecompressionDictionary Create(System.ReadOnlySpan<byte> buffer, ZstandardDictionaryType type) { throw null; }
// Optional: access to raw dictionary bytes, for symmetry with compression dictionary
public ReadOnlyMemory<byte> Data { get { throw null; } }
}
Duplicate the respective constructors on ZstandardStream, and adjust the ctors on encoder/decoder/options as needed.
Don't copy data passed to ZstandardDictionary
Ctors in the proposal accept ReadOnlySpan<byte> for flexibility, but that means an internal copy of the data needs to be created. This could be avoided by accepting ReadOnlyMemory<byte> instead (the memory would be pinned for the lifetime of the dictionary via ReadOnlyMemory.Pin()). This can save some allocations (the recommended dictionary size is <100 kB) but risks accidentally overwriting the dictionary data after it has been loaded by zstd.
public sealed partial class ZstandardDictionary : System.IDisposable
{
public static System.IO.Compression.ZstandardDictionary Create(System.ReadOnlyMemory<byte> buffer, int quality) { throw null; }
// like above, but uses default quality
public static System.IO.Compression.ZstandardDictionary Create(System.ReadOnlyMemory<byte> buffer) { throw null; }
}