add deflate implemented from first principles #18923

Merged
merged 12 commits into from
Feb 15, 2024

ianic
Contributor

@ianic ianic commented Feb 13, 2024

Zig deflate compression/decompression implementation. It supports compression and decompression of the gzip, zlib and raw deflate formats.

Fixes #18062.

This PR replaces the current compress/gzip and compress/zlib packages. The deflate package is renamed to flate. Flate is a common name for deflate/inflate, where deflate is compression and inflate is decompression.

There are breaking changes. Method signatures changed because the allocator was removed, and I also unified the API for all three namespaces (flate, gzip, zlib).

For now I put the old packages under a v1 namespace, so they are still available as compress/v1/gzip, compress/v1/zlib and compress/v1/deflate. The idea is to give users of the current API a little time before they have to work out what they need to change. That does raise the question of when it is safe to remove the v1 namespace.

API

Here is the current API in the compress package:

// deflate
    fn compressor(allocator, writer, options) !Compressor(@TypeOf(writer))
    fn Compressor(comptime WriterType) type

    fn decompressor(allocator, reader, null) !Decompressor(@TypeOf(reader))
    fn Decompressor(comptime ReaderType: type) type

// gzip
    fn compress(allocator, writer, options) !Compress(@TypeOf(writer))
    fn Compress(comptime WriterType: type) type

    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// zlib
    fn compressStream(allocator, writer, options) !CompressStream(@TypeOf(writer))
    fn CompressStream(comptime WriterType: type) type

    fn decompressStream(allocator, reader) !DecompressStream(@TypeOf(reader))
    fn DecompressStream(comptime ReaderType: type) type

// xz
    fn decompress(allocator: Allocator, reader: anytype) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma
    fn decompress(allocator, reader) !Decompress(@TypeOf(reader))
    fn Decompress(comptime ReaderType: type) type

// lzma2
    fn decompress(allocator, reader, writer) !void

// zstandard:
    fn DecompressStream(ReaderType, options) type
    fn decompressStream(allocator, reader) DecompressStream(@TypeOf(reader), .{})
    struct decompress

The proposed naming convention:

  • Compressor/Decompressor for functions that return a type, like Reader/Writer/GeneralPurposeAllocator
  • compressor/decompressor for functions that are initializers for that type, like reader/writer/allocator
  • compress/decompress for one-shot operations that accept a reader/writer pair, like read/write/alloc

/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void

/// Create a Compressor which writes compressed data to the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))

/// Compressor type
fn Compressor(comptime WriterType: type) type

/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader))

/// Decompressor type
fn Decompressor(comptime ReaderType: type) type
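To make the naming convention concrete, here is a minimal sketch of the one-shot `compress`/`decompress` pair used identically across the three namespaces. It assumes Zig 0.12-era std, where `std.compress.flate`, `std.compress.gzip` and `std.compress.zlib` expose the proposed signatures and `std.ArrayList` is the managed (allocator-holding) variant; the `roundTrip` helper is hypothetical and exists only for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: compress `input` with one of the unified namespaces
/// (std.compress.flate, .gzip or .zlib), then decompress it again.
/// Caller owns the returned slice.
fn roundTrip(comptime pkg: type, allocator: std.mem.Allocator, input: []const u8) ![]u8 {
    // One-shot compress: plain data from a reader, compressed data to a writer.
    var compressed = std.ArrayList(u8).init(allocator);
    defer compressed.deinit();
    var plain_in = std.io.fixedBufferStream(input);
    try pkg.compress(plain_in.reader(), compressed.writer(), .{});

    // One-shot decompress: compressed data from a reader, plain data to a writer.
    var plain_out = std.ArrayList(u8).init(allocator);
    errdefer plain_out.deinit();
    var compressed_in = std.io.fixedBufferStream(compressed.items);
    try pkg.decompress(compressed_in.reader(), plain_out.writer());

    return try plain_out.toOwnedSlice();
}

test "the same one-shot calls work for flate, gzip and zlib" {
    const allocator = std.testing.allocator;
    const input = "hello flate, hello gzip, hello zlib, hello flate";

    // `pkg` is a comptime-known namespace on each unrolled iteration.
    inline for (.{ std.compress.flate, std.compress.gzip, std.compress.zlib }) |pkg| {
        const output = try roundTrip(pkg, allocator, input);
        defer allocator.free(output);
        try std.testing.expectEqualStrings(input, output);
    }
}
```

Because the three namespaces share one set of names and signatures, generic code like this can take the format as a comptime parameter instead of special-casing each package.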

Benchmark

Comparing this implementation with the one we currently have in Zig's standard library (std): std is roughly 1.2-1.4 times slower in decompression and 1.1-1.2 times slower in compression. Compressed sizes are about the same in both cases. More results are in the https://github.com/ianic/flate repo.

Memory usage

This library uses static allocations for all structures and doesn't require an allocator. That makes the most sense for deflate, where all structures and internal buffers are allocated to their full size anyway. It makes a little less sense for inflate, where the std version uses less memory by not preallocating arrays to their theoretical maximum size, which is usually not fully used.

For deflate, this library allocates 395K while std allocates 779K. For inflate, this library allocates 74.5K while std allocates around 36K.

The inflate difference is because this implementation uses a 64K history buffer instead of the 32K one in std.
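Why the history buffer matters: deflate back-references point at most 32K bytes back into output that has already been decompressed, so the decompressor must keep at least that much history in its own buffer. The sketch below shows a fixed-size, allocator-free window backed by a 64K array, with a byte-by-byte match copy. The `Window` type and its methods are hypothetical, and it deliberately omits the sliding/flushing a real inflater needs once the buffer fills.

```zig
const std = @import("std");

/// Hypothetical allocator-free output window for an inflater.
/// Deflate matches can reach back at most 32K bytes, so the buffer must keep
/// at least that much history; 64K leaves room for new output as well.
/// Sketch only: it never slides or flushes, so it stops at 64K of output.
const Window = struct {
    buf: [64 * 1024]u8 = undefined,
    len: usize = 0, // number of bytes produced so far

    fn writeLiteral(self: *Window, b: u8) void {
        std.debug.assert(self.len < self.buf.len);
        self.buf[self.len] = b;
        self.len += 1;
    }

    /// Copy `length` bytes starting `distance` bytes back. Copying byte by byte
    /// makes overlapping matches (distance < length) repeat earlier output.
    fn writeMatch(self: *Window, length: u16, distance: u16) void {
        std.debug.assert(distance >= 1 and distance <= self.len);
        std.debug.assert(self.len + length <= self.buf.len);
        var i: usize = 0;
        while (i < length) : (i += 1) {
            self.buf[self.len] = self.buf[self.len - distance];
            self.len += 1;
        }
    }

    fn written(self: *const Window) []const u8 {
        return self.buf[0..self.len];
    }
};

test "overlapping back-reference" {
    var w = Window{};
    w.writeLiteral('a');
    w.writeLiteral('b');
    w.writeMatch(6, 2); // length 6, distance 2: repeats "ab" three more times
    try std.testing.expectEqualStrings("abababab", w.written());
}
```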

Upgrade

If this is merged, existing uses of compress gzip/zlib/deflate need some changes. Here is an example, with the necessary changes noted in comments:

const std = @import("std");

// To get this file:
// wget -nc -O war_and_peace.txt https://www.gutenberg.org/ebooks/2600.txt.utf-8
const data = @embedFile("war_and_peace.txt");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    try oldDeflate(allocator);
    try new(std.compress.flate, allocator);

    try oldZlib(allocator);
    try new(std.compress.zlib, allocator);

    try oldGzip(allocator);
    try new(std.compress.gzip, allocator);
}

pub fn new(comptime pkg: type, allocator: std.mem.Allocator) !void {
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    var cmp = try pkg.compressor(buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    var dcp = pkg.decompressor(fbs.reader());

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldDeflate(allocator: std.mem.Allocator) !void {
    const deflate = std.compress.v1.deflate;

    // Compressor
    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();
    // Remove allocator
    // Rename deflate -> flate
    var cmp = try deflate.compressor(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    // Decompressor
    var fbs = std.io.fixedBufferStream(buf.items);
    // Remove allocator and last param
    // Rename deflate -> flate
    // Remove try
    var dcp = try deflate.decompressor(allocator, fbs.reader(), null);
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldZlib(allocator: std.mem.Allocator) !void {
    const zlib = std.compress.v1.zlib;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compressStream => compressor
    // Remove allocator
    var cmp = try zlib.compressStream(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.finish();
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // decompressStream => decompressor
    // Remove allocator
    // Remove try
    var dcp = try zlib.decompressStream(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

pub fn oldGzip(allocator: std.mem.Allocator) !void {
    const gzip = std.compress.v1.gzip;

    var buf = std.ArrayList(u8).init(allocator);
    defer buf.deinit();

    // Compressor
    // Rename compress => compressor
    // Remove allocator
    var cmp = try gzip.compress(allocator, buf.writer(), .{});
    _ = try cmp.write(data);
    try cmp.close(); // Rename to finish
    cmp.deinit(); // Remove

    var fbs = std.io.fixedBufferStream(buf.items);
    // Decompressor
    // Rename decompress => decompressor
    // Remove allocator
    // Remove try
    var dcp = try gzip.decompress(allocator, fbs.reader());
    defer dcp.deinit(); // Remove

    const plain = try dcp.reader().readAllAlloc(allocator, std.math.maxInt(usize));
    defer allocator.free(plain);
    try std.testing.expectEqualSlices(u8, data, plain);
}

@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library. labels Feb 14, 2024
Member

@andrewrk andrewrk left a comment

Excellent work!

Your proposed API and naming conventions are spot-on. Let's proceed immediately.

Thanks also to @squeek502 for helping fuzz test this code.

.@"tar.gz" => try unpackTarballCompressed(f, tmp_directory.handle, resource, std.compress.gzip),
.@"tar.gz" => {
const reader = resource.reader();
var br = std.io.bufferedReaderSize(std.crypto.tls.max_ciphertext_record_len, reader);
Member

Does your implementation actually require an additional buffered reader? This was here because the previous implementation did many small reads (1-8 bytes). If your implementation calls read() with a buffer size closer to 16K (the value of max_ciphertext_record_len) then this is not necessary.

Contributor Author

When I first tested fetching without the buffered reader, it failed with EndOfStream, so I just kept it because it was in the previous implementation. Now I checked why it was necessary and found a bug in how decompression reads the input stream: I was using read instead of readAll and getting fewer bytes than were available in the stream. With that fixed, the buffered reader is not needed any more.

Decompression does many small reads, as in the previous implementation. The internal bit reader has an 8-byte buffer. The decompressor has a 64K output buffer (it needs at least 32K of history), but there is no input buffer: it only reads as many bytes as needed to fill the 8-byte bit buffer.

The compressor is the other way around: it has a 64K input buffer that it fills in 32K chunks, and a small 248-byte output buffer.

For now I didn't remove the buffered reader, reasoning that the behavior is the same as before.
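A minimal sketch of the kind of bit reader described above, assuming deflate's least-significant-bit-first bit order and a 64-bit (8-byte) bit buffer. Unlike the reader in this PR, which fills the whole 8-byte buffer, this hypothetical `BitReader` tops itself up one byte at a time and only as far as the current request needs; it is illustrative, not the code under review.

```zig
const std = @import("std");

/// Hypothetical LSB-first bit reader with a 64-bit (8-byte) bit buffer.
/// Simplified: it refills one byte at a time, only as far as each request needs.
fn BitReader(comptime ReaderType: type) type {
    return struct {
        source: ReaderType,
        bits: u64 = 0, // buffered input; the next bit to consume is bit 0
        nbits: u6 = 0, // number of valid bits in `bits` (stays well below 64)

        const Self = @This();

        /// Pull bytes from the source until at least `need` bits are buffered.
        fn fill(self: *Self, need: u6) !void {
            while (self.nbits < need) {
                // Returns error.EndOfStream if the input runs out.
                const byte = try self.source.readByte();
                self.bits |= @as(u64, byte) << self.nbits;
                self.nbits += 8;
            }
        }

        /// Read `n` bits (0..31), least significant bit first, as deflate does.
        fn readBits(self: *Self, n: u5) !u32 {
            try self.fill(n);
            const mask = (@as(u64, 1) << n) - 1;
            const value: u32 = @intCast(self.bits & mask);
            self.bits >>= n;
            self.nbits -= n;
            return value;
        }
    };
}

test "bits come out least-significant first" {
    // 0xB5 = 0b1011_0101
    var fbs = std.io.fixedBufferStream(&[_]u8{ 0xB5, 0x01 });
    var br = BitReader(@TypeOf(fbs.reader())){ .source = fbs.reader() };
    try std.testing.expectEqual(@as(u32, 0b101), try br.readBits(3)); // low 3 bits of 0xB5
    try std.testing.expectEqual(@as(u32, 0b10110), try br.readBits(5)); // remaining 5 bits
    try std.testing.expectEqual(@as(u32, 1), try br.readBits(1)); // bit 0 of 0x01
    try std.testing.expectError(error.EndOfStream, br.readBits(8)); // only 7 bits left
}
```

Because the buffer only asks the underlying reader for a byte when it actually runs short, the input side needs no separate buffer, which matches the "many small reads" pattern described above.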

Member

Now I checked why it was necessary and found a bug in how decompression reads the input stream: I was using read instead of readAll and getting fewer bytes than were available in the stream. With that fixed, the buffered reader is not needed any more.

Note that calling read() instead of readAll() is appropriate for implementation of a stream. This will block on the underlying read() call only once, allowing users of the stream to possibly do other things with the thread while waiting for more data to be available to read.

However, make sure to notice correctly when the end of stream occurs. A short read does not mean the end of stream.
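To illustrate the distinction: a stream may legitimately return fewer bytes than requested, so a caller that needs a fixed amount (such as the 8-byte bit buffer) has to loop and treat only a 0-byte read as end of stream. This is essentially what `readAll` does. The sketch assumes Zig 0.12-era `std.io.GenericReader`; `TrickleReader` and `fillBuffer` are hypothetical names for illustration.

```zig
const std = @import("std");

/// Test-only reader that hands out at most one byte per read() call,
/// imitating a stream that delivers data in small pieces.
const TrickleReader = struct {
    data: []const u8,
    pos: usize = 0,

    const Error = error{};
    const Reader = std.io.GenericReader(*TrickleReader, Error, read);

    fn read(self: *TrickleReader, dest: []u8) Error!usize {
        if (dest.len == 0 or self.pos == self.data.len) return 0;
        dest[0] = self.data[self.pos];
        self.pos += 1;
        return 1;
    }

    fn reader(self: *TrickleReader) Reader {
        return .{ .context = self };
    }
};

/// Hypothetical helper: fill `buf` as far as the stream allows.
/// A short read is NOT end of stream; only read() returning 0 is.
fn fillBuffer(reader: anytype, buf: []u8) !usize {
    var n: usize = 0;
    while (n < buf.len) {
        const amt = try reader.read(buf[n..]);
        if (amt == 0) break; // genuine end of stream
        n += amt;
    }
    return n;
}

test "short reads do not mean end of stream" {
    var src = TrickleReader{ .data = "0123456789" };
    var buf: [8]u8 = undefined;

    // A full 8-byte buffer, even though each read() returns only 1 byte.
    try std.testing.expectEqual(@as(usize, 8), try fillBuffer(src.reader(), &buf));
    try std.testing.expectEqualStrings("01234567", &buf);

    // Only 2 bytes remain, then read() returns 0 (end of stream).
    try std.testing.expectEqual(@as(usize, 2), try fillBuffer(src.reader(), &buf));
    try std.testing.expectEqualStrings("89", buf[0..2]);
}
```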

Member

After writing the above comment, I looked at the commit you were referring to, and I see that in this case the amount read is very small (8 bytes or less). In this case it is OK to use readAll.

A potentially better solution would be to still make only one call to read() and keep a small internal buffer of those 8 bytes, emitting bytes from read only once the buffer is filled. However, this would require updating the definition of std.io.Reader.read, because it currently says that reading 0 bytes means end of stream.

Comment on lines 12 to 18
// Version 1 interface
pub const v1 = struct {
    pub const deflate = @import("compress/deflate.zig");
    pub const gzip = @import("compress/gzip.zig");
    pub const zlib = @import("compress/zlib.zig");
};

Member

This was very considerate of you to do, however, please go ahead and delete the old code. At this point in time in the zig project, it's the best path forward.

I cannot wait until I don't have to see "Go non-regression test..." flash in front of my eyes every time I run the std lib tests 🙂

Contributor Author

Exactly what I was expecting from you ;-)

@andrewrk
Member

@jacobly0 would you prefer @ianic to work with you on resolving the x86 backend crashes found by this branch (example below), or disable the corresponding tests?

example:

/home/ci/actions-runner5/_work/zig/zig/src/arch/x86_64/CodeGen.zig:379:25: 0x6f7fb46 in offset (zig)
                else => unreachable, // not offsettable
                        ^
/home/ci/actions-runner5/_work/zig/zig/src/arch/x86_64/CodeGen.zig:7279:54: 0x7498064 in packedLoad (zig)
        try self.load(dst_mcv, ptr_ty, ptr_mcv.offset(@intCast(@divExact(ptr_bit_off, 8))));
                                                     ^
/home/ci/actions-runner5/_work/zig/zig/src/arch/x86_64/CodeGen.zig:7432:32: 0x74972da in airLoad (zig)
            try self.packedLoad(dst_mcv, ptr_ty, ptr_mcv);
                               ^
/home/ci/actions-runner5/_work/zig/zig/src/arch/x86_64/CodeGen.zig:2019:49: 0x6f840b0 in genBody (zig)
            .load            => try self.airLoad(inst),
                                                ^

jacobly0 added a commit to ianic/zig that referenced this pull request Feb 14, 2024
@kristoff-it
Member

I've fixed the autodoc-related CI failure in a23ab33

ianic and others added 6 commits February 14, 2024 18:28
Testing on Windows is failing because line endings are changed on some
binary files during git checkout.

By using read instead of readAll, the decompression reader could get fewer
bytes than were available in the stream and then later wrongly fail with
end of stream.

It was useful during development.

From andrewrk's code review comment:
In fact, Zig does not guarantee the @sizeOf of structs, and so these tests are not valid.
@andrewrk
Member

           0: 0xa766b0 - <unknown>!compress.flate.bit_writer.BitWriter(io.GenericWriter(*io.fixed_buffer_stream.FixedBufferStream([]u8),error{NoSpaceLeft},(function 'write'))).init
           1: 0xa76395 - <unknown>!compress.flate.block_writer.BlockWriter(io.GenericWriter(*io.fixed_buffer_stream.FixedBufferStream([]u8),error{NoSpaceLeft},(function 'write'))).init
           2: 0xa76042 - <unknown>!compress.flate.deflate.Deflate(.gzip,io.GenericWriter(*io.fixed_buffer_stream.FixedBufferStream([]u8),error{NoSpaceLeft},(function 'write')),compress.flate.block_writer.BlockWriter(io.GenericWriter(*io.fixed_buffer_stream.FixedBufferStream([]u8),error{NoSpaceLeft},(function 'write')))).init
           3: 0xa771a5 - <unknown>!compress.flate.deflate.compressor__anon_39980
           4: 0xa834ba - <unknown>!compress.flate.deflate.compress__anon_39738
           5: 0xa8366a - <unknown>!compress.flate.gzip.compress__anon_39737
           6: 0xa87d9e - <unknown>!compress.flate.root.testInterface__anon_39595
           7: 0xa9b428 - <unknown>!compress.flate.root.test.flate public interface
           8: 0xa939 - <unknown>!test_runner.mainServer
           9: 0x87cb - <unknown>!test_runner.main
          10: 0x83fd - <unknown>!_start
       note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information
    2: memory fault at wasm address 0xffdaf590 in linear memory of size 0x13af0000
    3: wasm trap: out of bounds memory access

Are you sure that is not a bug? Perhaps running the same test in Valgrind would turn up a similar issue for x86_64-linux.

@squeek502
Collaborator

Note that the WASI error looks very similar to #18885. I ran the test from #18885 through Valgrind and it didn't find anything.

@andrewrk
Member

andrewrk commented Feb 15, 2024

Indeed it does, so I would not consider those disabled tests to be a merge blocker. However, it would be nice to link to the github issue next to where they are being skipped.

@ianic
Contributor Author

ianic commented Feb 15, 2024

    2: memory fault at wasm address 0xffdaf590 in linear memory of size 0x13af0000
    3: wasm trap: out of bounds memory access

Are you sure that is not a bug?

I don't understand the problem here.
This is a minimal test that still fails:

test "flate.wasm" {
    const D = struct {
        lookup: Lookup = .{},
        win: SlidingWindow = .{},
        tokens: Tokens = .{},
        level: LevelArgs,

        const Self = @This();

        pub fn init() !Self {
            return Self{
                .level = LevelArgs.get(.default),
            };
        }
    };
    _ = try D.init();
}

Now any of the following changes makes it pass:

  • changing return Self{ to return .{
  • making init not return error (remove ! and try)
  • putting .level = .{... instead of calling LevelArgs.get
  • removing any struct field: lookup, win...

And the failure is:

$ zig test lib/std/std.zig --zig-lib-dir lib -target wasm32-wasi --test-cmd wasmtime --test-cmd-bin --test-filter flate

60/97 test.flate wasm... Error: failed to run main module `/home/ianic/zig/zig/zig-cache/o/8700c887ca2f62bb556bdf9f18c61564/test.wasm`

Caused by:
    0: failed to invoke command default
    1: error while executing at wasm backtrace:
           0: 0x6c007 - compress.flate.deflate.LevelArgs.get
                           at /home/ianic/zig/zig/lib/std/compress/flate/deflate.zig:41
           1: 0x6be85 - compress.flate.deflate.test.flate wasm.D.init
                           at /home/ianic/zig/zig/lib/std/compress/flate/deflate.zig:589:39
           2: 0x6c14f - compress.flate.deflate.test.flate wasm
                           at /home/ianic/zig/zig/lib/std/compress/flate/deflate.zig:593:19
           3: 0x3d86 - test_runner.mainTerminal
                           at /home/ianic/zig/zig/lib/test_runner.zig:158:25
           4:  0xd01 - test_runner.main
                           at /home/ianic/zig/zig/lib/test_runner.zig:35:28
           5:  0x8f4 - start.callMain
                           at /home/ianic/zig/zig/lib/std/start.zig:501:22              - _start
                           at /home/ianic/zig/zig/lib/std/start.zig:211:42
    2: memory fault at wasm address 0xfffdfc9f in linear memory of size 0x360000
    3: wasm trap: out of bounds memory access
error: the following test command failed with exit code 134:
wasmtime /home/ianic/zig/zig/zig-cache/o/8700c887ca2f62bb556bdf9f18c61564/test.wasm

@andrewrk
Member

Ah, I wonder if it is a stack overflow. Perhaps this option could be illuminating:

        --max-wasm-stack <MAX_WASM_STACK>
            Maximum stack size, in bytes, that wasm is allowed to consume before a stack overflow is
            reported

@andrewrk andrewrk added the release notes This PR should be mentioned in the release notes. label Feb 15, 2024
@andrewrk andrewrk merged commit 57d6f78 into ziglang:master Feb 15, 2024
10 checks passed
@ianic
Contributor Author

ianic commented Feb 15, 2024

Running wasmtime with various options didn't make a difference.

But I got all tests passing by raising the stack size from the default 1MB to 8MB (#12589):

zig test lib/std/std.zig --zig-lib-dir lib -target wasm32-wasi --test-cmd wasmtime --test-cmd-bin --test-filter flate  --stack 8388608

@andrewrk
Member

Ah nice find. I think this is a good reason to bump the default. I see no reason for WASI to not use the same default as the other operating systems.

@ianic
Contributor Author

ianic commented Feb 15, 2024

Thanks @andrewrk for merging this, and thanks to everyone here who helped, especially @squeek502, who found many bugs and did great work with fuzz testing.

@squeek502
Collaborator

squeek502 commented Feb 15, 2024

Great work on this @ianic, thanks for being so quick with the fixes on all the stuff found by fuzzing!

By the way, I'm planning on running the fuzzers continuously for a few days just to have more confidence everything's good, but I'll be away from my normal setup for a little while. Will follow up once I run the fuzzers again.

@squeek502
Collaborator

squeek502 commented Feb 17, 2024

I left the roundtrip fuzzer going while I was away and it's found some stuff (but need to confirm they're real bugs; it was using the code before ianic/flate@902ee48)

EDIT: They were all fixed after updating to the latest implementation 😄, still interested in the answer to this question though:

@ianic where would you like me to report any bugs found? The flate repository or the Zig issue tracker?

@ianic
Contributor Author

ianic commented Feb 17, 2024

If I can choose then Zig issue tracker, because anyway it will result in PR.

@squeek502
Collaborator

91 hours of continuous fuzzing later, nothing new found. That's good enough for me 👍

Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. release notes This PR should be mentioned in the release notes. standard library This issue involves writing Zig code for the standard library.
Development

Successfully merging this pull request may close these issues.

deflate/inflate implementations are ported rather than implemented from first principles
6 participants