[API Proposal]: Add ZlibEncoder and ZlibDecoder to System.IO.Compression. #62113
Tagging subscribers to this area: @dotnet/area-system-io-compression
Background and motivation
Currently there are no non-stream-based APIs for zlib compression. As such, I feel an encoder/decoder implementation similar to the Brotli one is needed. The Brotli implementation also uses its encoder/decoder internally in its streams; doing the same could improve the implementations of the zlib-based streams (GZipStream, DeflateStream, and ZLibStream).
The proposal is a single ZlibEncoder and ZlibDecoder that take a class of values (ZlibOptions), where ZlibOptions has subclasses named DeflateOptions and GZipOptions in which only the window bits differ.
Dependency issues that will need to be addressed before this:
Implementing this issue should also resolve this one:
I currently have a baseline implementation of this locally (except for the stream changes that would need to be done), and it should be ready by the time .NET 7 goes into API freeze (until .NET 8's development starts).
API Proposal
public enum ZlibOperationStatus
{
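VersionError = -6,   // Z_VERSION_ERROR
BufError,            // Z_BUF_ERROR (-5)
MemError,            // Z_MEM_ERROR (-4)
DataError,           // Z_DATA_ERROR (-3)
StreamError,         // Z_STREAM_ERROR (-2)
ErrorNo,             // Z_ERRNO (-1)
Ok,                  // Z_OK (0)
Done = Ok,           // alias for Ok
StreamEnd,           // Z_STREAM_END (1)
NeedDictionary,      // Z_NEED_DICT (2)
DestinationTooSmall, // managed-only values from here on (not zlib return codes)
OperationNotCompression,
OperationNotDecompression,
Disposed,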
}
public struct ZlibEncoder : System.IDisposable
{
public ZlibEncoder(ZlibOptions options) { }
public bool IsDisposed { get { throw null; } }
public void Dispose() { throw null; }
#pragma warning disable CS3001 // Argument type is not CLS-compliant
public bool TryCompress(System.ReadOnlySpan<byte> source, System.Span<byte> dest, out uint adler32, out uint bytesWritten) { throw null; }
#pragma warning restore CS3001 // Argument type is not CLS-compliant
#pragma warning disable CS3001 // Argument type is not CLS-compliant
public ZlibOperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> dest, out uint adler32, out uint bytesWritten) { throw null; }
#pragma warning restore CS3001 // Argument type is not CLS-compliant
}
public struct ZlibDecoder : System.IDisposable
{
public ZlibDecoder(ZlibOptions options) { }
public bool IsDisposed { get { throw null; } }
public void Dispose() { throw null; }
#pragma warning disable CS3001 // Argument type is not CLS-compliant
public bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> dest, out uint adler32, out uint bytesWritten, out uint avail) { throw null; }
#pragma warning restore CS3001 // Argument type is not CLS-compliant
#pragma warning disable CS3001 // Argument type is not CLS-compliant
public ZlibOperationStatus Decompress(ReadOnlySpan<byte> source, Span<byte> dest, out uint adler32, out uint bytesWritten, out uint avail) { throw null; }
#pragma warning restore CS3001 // Argument type is not CLS-compliant
}
API Usage
// allocate source and dest arrays.
using var zlibEncoder = new ZlibEncoder(new ZlibOptions(ZlibCompressionLevel.Level5));
bool result = zlibEncoder.TryCompress(source, dest, out uint adler, out uint bytesWritten);
if (!result)
{
Console.WriteLine("Compression failed.");
}
else
{
Console.WriteLine($"Adler-32: {adler}, Bytes Compressed: {bytesWritten}.");
}
// allocate new source array to compare against the original one.
using var zlibDecoder = new ZlibDecoder(new ZlibOptions());
result = zlibDecoder.TryDecompress(dest, decSource, out adler, out bytesWritten, out uint avail);
if (!result)
{
Console.WriteLine("Decompression failed.");
}
else
{
Console.WriteLine($"Adler-32: {adler}, Bytes Decompressed: {bytesWritten}, Available: {avail}.");
}
Alternative Designs
I wanted to use the System.Buffers.OperationStatus enum, but I felt it lacked enough members to represent the zlib-specific status codes (which would be used for zlib, deflate, and gzip compression/decompression). Likewise, adler32 and zlib's total_out member are unsigned; I do not know whether the encoder/decoder should output signed values, or keep them unsigned and leave the CLS-compliance warnings suppressed.
Risks
Minimal
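For comparison, the Brotli types this proposal is modeled on already expose span-based one-shot methods, while zlib data can today only be produced through a stream. The sketch below uses only existing APIs (`BrotliEncoder.GetMaxCompressedLength`, `BrotliEncoder.TryCompress`, and `ZLibStream`, which requires .NET 6 or later); the `SpanCompressionDemo` class and its method names are hypothetical helpers for illustration. Note that the Brotli methods report `bytesWritten` as `int`, which sidesteps the CLS-compliance question raised above.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helpers contrasting the APIs that exist today.
public static class SpanCompressionDemo
{
    // Brotli: one-shot and span-based; no Stream wrapper is needed.
    public static byte[] BrotliCompress(ReadOnlySpan<byte> source)
    {
        byte[] dest = new byte[BrotliEncoder.GetMaxCompressedLength(source.Length)];
        if (!BrotliEncoder.TryCompress(source, dest, out int bytesWritten))
            throw new InvalidOperationException("Destination buffer too small.");
        return dest.AsSpan(0, bytesWritten).ToArray();
    }

    // zlib (RFC 1950): the only public route is through a Stream.
    public static byte[] ZlibCompress(ReadOnlySpan<byte> source)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            zlib.Write(source);
        } // disposing the ZLibStream writes the final block and the Adler-32 trailer
        return output.ToArray();
    }

    public static byte[] ZlibDecompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

The zlib path needs two `MemoryStream` instances and a `ZLibStream` just to round-trip a buffer, which is the gap a span-based `ZlibEncoder`/`ZlibDecoder` would close.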
|
I have modified the proposal somewhat to add a few things I missed, to ensure that TryDecompress never throws or encounters exceptions that it would need to catch. |
@AraHaan thank you so much for the detailed proposal. This proposal would partially address both #42820 and #39327. I don't think this PR depends on the first one; we can consider that one the "uber issue", because it was attempting to address the same thing you're addressing, but for all compression stream classes. I see you added your own ZLib-specific operation status enum. Contrary to runtime/src/libraries/System.Private.CoreLib/src/System/Buffers/OperationStatus.cs Line 10 in 57bfe47
Some feedback on the new enum values:
What's preventing us from just reusing |
I guess OperationStatus could add a generic According to the manual, DataError is for when "inflate detects an error in the zlib compressed data format, which means that either the data is not a zlib stream to begin with, or that the data was corrupted somewhere along the way since it was compressed." The same manual says StreamEnd is returned when inflate/deflate reaches the end of the zlib/deflate/gzip stream. I also took some of the details from https://zlib.net/zlib_how.html. As for BufError, that might fall under Edit: I have also removed ZlibOperationStatus in favor of a minor change to System.Buffers.OperationStatus. cc: @carlossanlop I think this might now be ready for review (hopefully). |
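The question here is whether zlib's return codes fit into the existing `System.Buffers.OperationStatus` (`Done`, `DestinationTooSmall`, `NeedMoreData`, `InvalidData`). A minimal sketch of one possible mapping, assuming the caller passes the remaining output space after each `inflate`/`deflate` call; the `ZlibStatusMapper` name, and the choice of which codes become `InvalidData` versus an exception, are assumptions for illustration rather than what the proposal or zlib prescribe. The numeric constants are zlib's `Z_*` return codes from zlib.h.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: translate a native zlib return code into OperationStatus.
public static class ZlibStatusMapper
{
    // Return codes as defined in zlib.h.
    public const int Z_OK = 0, Z_STREAM_END = 1, Z_NEED_DICT = 2,
                     Z_ERRNO = -1, Z_STREAM_ERROR = -2, Z_DATA_ERROR = -3,
                     Z_MEM_ERROR = -4, Z_BUF_ERROR = -5, Z_VERSION_ERROR = -6;

    // availOut: bytes of output space left after the inflate()/deflate() call.
    public static OperationStatus Map(int zlibCode, int availOut)
    {
        switch (zlibCode)
        {
            case Z_STREAM_END:
                return OperationStatus.Done; // end of the zlib/deflate/gzip stream
            case Z_OK:
            case Z_BUF_ERROR: // not fatal per zlib.h: no progress was possible
                // Output space ran out: the caller must drain it and call again.
                // Otherwise the input ran out and more is required.
                return availOut == 0 ? OperationStatus.DestinationTooSmall
                                     : OperationStatus.NeedMoreData;
            case Z_DATA_ERROR:
            case Z_NEED_DICT: // assumption: preset dictionaries are not supported
                return OperationStatus.InvalidData;
            default:
                // Z_STREAM_ERROR, Z_MEM_ERROR, Z_VERSION_ERROR, Z_ERRNO signal misuse or
                // environment failures rather than bad input, so they are thrown here.
                throw new InvalidOperationException($"zlib error {zlibCode}");
        }
    }
}
```

Whether the misuse codes should throw or get dedicated status values is the open design question in this thread; the sketch just picks one option.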
cc: @dotnet/area-system-io-compression I think this is ready for review. |
Anything else stalling this? |
I would like some way to see how many bytes of the source were read to generate the output, so that the next decompress call can continue right there. Currently the implementation only reports results for the output buffer, while the input buffer position is just as useful. This matters especially when other information (e.g. uncompressed data) is written right after the zlib data, as in Git pack files. |
I think that can be accessed with the However, I do agree with another point as well: perhaps there should be an And possibly an |
@AraHaan I also need the LastBytesRead for this case (the NextInIndex / next_in_index, as it is called in most zlib wrappers), to allow continuing right after the input that was processed, not just NextOut*. |
Using streams, we need to scavenge the input buffer for raw processing, as Git packs delimit compressed data with uncompressed headers without specifying the compressed size. See #61405 for a working hack.
public readonly struct OperationResult
{
public readonly OperationStatus Status;
public readonly int Read, Written;
} |
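The existing `BrotliDecoder.Decompress` instance method already reports the consumed-input count requested here, through its `bytesConsumed` out parameter, so a caller can feed input in arbitrary chunks and resume exactly where the decoder stopped. A minimal sketch using that real API; the `ChunkedDecode` name and the chunk and buffer sizes are arbitrary illustrations. The same counter on a zlib decoder is what a Git pack reader would need to locate the uncompressed header that follows each compressed object.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.IO.Compression;

public static class ChunkedDecode
{
    // Decompresses Brotli data while handing the decoder at most `chunkSize` input
    // bytes per call, advancing the input position by exactly `consumed` each time.
    public static byte[] Decompress(byte[] compressed, int chunkSize, int outBufferSize)
    {
        // BrotliDecoder is a mutable struct. A `using var` local is read-only, so method
        // calls on it would run on defensive copies; a plain local keeps one instance.
        var decoder = new BrotliDecoder();
        try
        {
            var output = new MemoryStream();
            byte[] buffer = new byte[outBufferSize];
            int inPos = 0;

            while (true)
            {
                int len = Math.Min(chunkSize, compressed.Length - inPos);
                OperationStatus status = decoder.Decompress(
                    compressed.AsSpan(inPos, len), buffer, out int consumed, out int written);

                inPos += consumed;                // resume right after the consumed input
                output.Write(buffer, 0, written);

                switch (status)
                {
                    case OperationStatus.Done:
                        return output.ToArray(); // inPos now marks the end of the compressed data
                    case OperationStatus.InvalidData:
                        throw new InvalidDataException("Corrupt Brotli data.");
                    case OperationStatus.NeedMoreData when inPos == compressed.Length:
                        throw new InvalidDataException("Truncated Brotli data.");
                    // DestinationTooSmall, or NeedMoreData with input left: call again.
                }
            }
        }
        finally
        {
            decoder.Dispose();
        }
    }
}
```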
Also, we have to consider whether Read/Written should be However, I believe the native implementation uses uints for the sizes ( So supporting files over 2 GB is something to consider as well, although such support is tricky to get right. The implementation will also require that I would love to move ZlibEncoder and ZlibDecoder to System.IO.Compression.ZlibEncoder but it would need |
Both Stream and Span use int as the transfer format. zlib uses a 32-bit value, which overflows in some well-defined cases that third-party code depends on. Crc32 is now also exposed via System.IO.Hashing, so maybe Adler32 belongs there too? |
crc32 is exported from System.IO.Compression.Native as well, and with my changes I also added adler32, which is likewise part of the native zlib implementation. |
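Wherever it ends up being exposed, Adler-32 is small enough to sketch in managed code. This follows the definition in RFC 1950: two sums modulo 65521, with A starting at 1 and B accumulating A after each byte. The `Adler32` class is a hypothetical illustration, not an existing .NET API. The trailer helper relies on RFC 1950's rule that a zlib stream ends with the Adler-32 of the uncompressed data, stored most-significant byte first.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical managed Adler-32 (RFC 1950), for illustration only.
public static class Adler32
{
    private const uint Mod = 65521; // largest prime below 2^16

    // Pass a previous result as `adler` to continue a running checksum.
    public static uint Compute(ReadOnlySpan<byte> data, uint adler = 1)
    {
        uint a = adler & 0xFFFF, b = adler >> 16;
        foreach (byte x in data)
        {
            a = (a + x) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }

    // Reads the Adler-32 trailer from the last four bytes of a zlib (RFC 1950) stream.
    public static uint ReadZlibTrailer(ReadOnlySpan<byte> zlibData) =>
        BinaryPrimitives.ReadUInt32BigEndian(zlibData[^4..]);
}
```

For example, the ASCII bytes of "Wikipedia" give A = 920 (0x398) and B = 4582 (0x11E6), so the checksum is 0x11E60398, which is also what `ZLibStream` writes as the trailer for that input.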
I guess an option could be to replace the proposed
public readonly struct ZlibResult
{
public OperationStatus Status { get; internal init; }
public int Read { get; internal init; }
public int Written { get; internal init; }
}
And then the operation function could return an instance of the struct that contains the information they need. |
That's what I said 8 days ago. |
I am going to work on this a bit more and will consider moving the properties to an |
Alright, I just finished modifying the API locally.
New public types:
- System.IO.Compression.ZlibResult
- System.IO.Compression.ZlibEncoder
- System.IO.Compression.ZlibDecoder
Implements: dotnet#62113.
|
Background and motivation
Currently there are no non-stream-based APIs for zlib compression. As such, I feel an encoder/decoder implementation similar to the Brotli one is needed.
The Brotli implementation also uses its encoder/decoder internally in its streams; doing the same could improve the implementations of the zlib-based streams (GZipStream, DeflateStream, and ZLibStream).
The proposal is a single ZlibEncoder and ZlibDecoder that take a class of values (ZlibOptions), where ZlibOptions has subclasses named DeflateOptions and GZipOptions in which only the window bits differ.
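The three formats do differ only in the `windowBits` argument that zlib's `deflateInit2`/`inflateInit2` take: 8 to 15 selects the zlib wrapper, the negated value selects raw deflate, and adding 16 selects the gzip wrapper. A minimal sketch of the options hierarchy described above, assuming the default 32 KiB window (15); the `WindowBits` property and the class shapes are hypothetical, not the proposal's final API.

```csharp
// Hypothetical sketch: one options type per format, differing only in windowBits.
public class ZlibOptions
{
    // zlib format (RFC 1950): 2-byte header, deflate data, Adler-32 trailer.
    public virtual int WindowBits => 15;
}

public sealed class DeflateOptions : ZlibOptions
{
    // Raw deflate (RFC 1951): a negative value tells zlib to omit header and trailer.
    public override int WindowBits => -15;
}

public sealed class GZipOptions : ZlibOptions
{
    // gzip format (RFC 1952): adding 16 tells zlib to write a gzip header and CRC-32 trailer.
    public override int WindowBits => 15 + 16;
}
```

Because the format collapses to a single integer passed to the native initializer, one `ZlibEncoder` and one `ZlibDecoder` can serve all three formats.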
This issue partially addresses:
Implementing this issue should also resolve this one as well:
I currently have a baseline implementation of this locally (except for the stream changes that would need to be done), and it should be ready by the time .NET 7 goes into API freeze (until .NET 8's development starts).
API Proposal
API Usage
Alternative Designs
I wanted to use the System.Buffers.OperationStatus enum, but I felt it lacked enough members to represent the zlib-specific status codes (which would be used for zlib, deflate, and gzip compression/decompression). Likewise, adler32 and zlib's total_out member are unsigned; I do not know whether the encoder/decoder should output signed values, or keep them unsigned and leave the CLS-compliance warnings suppressed.
Risks
Minimal
Changelog
- CalculateChecksum method.
- Moved the unsigned bytesWritten output parameter of (Try)Compress and (Try)Decompress to a property that stores the number of bytes written by the last (Try)Compress or (Try)Decompress call.
- Changed uint to int.