Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.tar: copy another project's full test suite and make them all pass #14310

Closed
Tracked by #14265
andrewrk opened this issue Jan 14, 2023 · 5 comments · Fixed by #18261
Closed
Tracked by #14265

std.tar: copy another project's full test suite and make them all pass #14310

andrewrk opened this issue Jan 14, 2023 · 5 comments · Fixed by #18261
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library. zig build system std.Build, the build runner, `zig build` subcommand, package management
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Jan 14, 2023

The implementation is the easy part!

Perhaps https://cs.opensource.google/go/go/+/refs/tags/go1.19.5:src/archive/tar/

Perhaps https://github.com/python/cpython/blob/3.11/Lib/test/test_tarfile.py

Make sure the license is MIT-compatible so we can copy the test data.

Depending on how bulky the test data is, maybe this would be better to go into test/standalone rather than lib/std/

I'm marking this as "bug" since it's extremely likely that adding these tests will find bugs.

Bonus: write a handy benchmark script that we can use to find out if any implementation changes to std.tar improved performance.

Please enjoy this inspirational material, while searching for some other project's data to plunder.

@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior enhancement Solving this issue will likely involve adding new logic or components to the codebase. contributor friendly This issue is limited in scope and/or knowledge of Zig internals. standard library This issue involves writing Zig code for the standard library. zig build system std.Build, the build runner, `zig build` subcommand, package management labels Jan 14, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Jan 14, 2023
@nektro
Copy link
Contributor

nektro commented Jan 14, 2023

The Go team has an extensive set of tar test data and a harness that shows how its used, and is BSD-3 licensed

https://go.dev/src/archive/tar/testdata/

https://go.dev/src/archive/tar/tar_test.go
https://go.dev/src/archive/tar/fuzz_test.go
https://go.dev/src/archive/tar/reader_test.go
https://go.dev/src/archive/tar/writer_test.go

@andrewrk andrewrk mentioned this issue Jan 14, 2023
32 tasks
@jhuntwork
Copy link

Just to offer some feedback and testing related to archives created with libarchive's bsdtar...

Using this sample zig code:

const std = @import("std");

pub fn main() !void {
    var file = try std.fs.cwd().openFile("test.tar", .{});
    defer file.close();
    const outdir = try std.fs.cwd().openDir(
        "out",
        .{},
    );

    try std.tar.pipeToFileSystem(
        outdir,
        file.reader(),
        .{ .strip_components = 1, .mode_mode = .ignore },
    );
}

If test.tar is created with the tar command from busybox, the archive can be extracted successfully. When created with bsdtar with the same content, I receive the following error:

$ ./zig-out/bin/mere-zig
error: InvalidCharacter
/usr/lib/zig/std/fmt.zig:1930:17: 0x232a77 in charToDigit (mere-zig)
        else => return error.InvalidCharacter,
                ^
/usr/lib/zig/std/fmt.zig:1858:23: 0x2124ea in parseWithSign__anon_4202 (mere-zig)
        const digit = try charToDigit(c, buf_radix);
                      ^
/usr/lib/zig/std/fmt.zig:1757:5: 0x20c3bc in parseInt__anon_3712 (mere-zig)
    return parseWithSign(T, buf, radix, .Pos);
    ^
/usr/lib/zig/std/tar.zig:40:9: 0x20bc80 in fileSize (mere-zig)
        return std.fmt.parseInt(u64, rtrimmed, 8);
        ^
/usr/lib/zig/std/tar.zig:117:27: 0x20d7af in pipeToFileSystem__anon_3316 (mere-zig)
        const file_size = try header.fileSize();
                          ^
/home/jeremy/mere-zig/src/main.zig:11:5: 0x20ec1c in main (mere-zig)
    try std.tar.pipeToFileSystem(
    ^

busybox tar:

$ tar -cf test.tar zig-out
$ file test.tar
test.tar: POSIX tar archive (GNU)

libarchive bsdtar:

$ bsdtar -cf test.tar zig-out
$ file test.tar
test.tar: POSIX tar archive

@ianic
Copy link
Contributor

ianic commented Dec 1, 2023

I have implementation which passes all relevant Go tests. Handling sparse files is not implemented so skipping that cases.

New functionality needed to pass tests:

  • reading link_name from pax attribute
  • reading file size from pax attribute
  • handling sizes greater than 8GB in gnu format
  • gnu extended headers for name, link_name; L and K header types
  • calculating and checking header checksum
  • few pax attribute checks: no null in key, break value on first null, attribute ending with newline

I had to reorganize code to make it more testable. pipeToFileSystem is the same but some functionality moved from it. I'm using zig source code tarball to test that previous and new implementation have same results.

I didn't move test cases to the Zig repo. Need some advice of where to put them.
There are 28 test files, cumulative size 92250 bytes.

Currently location of the Go test cases is passed by environment variable.
I'm running tests by some variant of:

GO_TAR_TESTDATA_PATH=/usr/local/go/src/archive/tar/testdata zig test --test-filter tar lib/std/std.zig  --zig-lib-dir lib --main-mod-path lib/std

@FnControlOption
Copy link
Contributor

I didn't move test cases to the Zig repo. Need some advice of where to put them.

Maybe std/tar/testdata/? std.compress has some directories like this, such as std/compress/xz/testdata/ used in std/compress/xz/test.zig. Similarly, you could move relevant tests to a test.zig file in std/tar/. Just don’t forget to add to tar.zig:

test {
    _ = @import("tar/test.zig");
}

@ianic
Copy link
Contributor

ianic commented Dec 11, 2023

Thanks @FnControlOption

I saw your comment just a few seconds after I make pr. In the pr I decide to put files in test/cases/tar. But this makes sense, somehow I didn't notice compress/testdata, compress/xz/testdata although I was searching for examples.

ianic added a commit to ianic/zig that referenced this issue Dec 18, 2023
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu exteneded sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O
/tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
ianic added a commit to ianic/zig that referenced this issue Dec 19, 2023
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu exteneded sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O
/tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
ianic added a commit to ianic/zig that referenced this issue Dec 20, 2023
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu exteneded sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O
/tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
andrewrk pushed a commit to ianic/zig that referenced this issue Jan 14, 2024
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu exteneded sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O
/tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
bilaliscarioth pushed a commit to bilaliscarioth/zig that referenced this issue Jan 27, 2024
Using Python testtar file (mentioned in ziglang#14310) to test diagnostic
reporting.
Added computing checksum by using both unsigned and signed header bytes
values.
Added skipping gnu exteneded sparse headers while reporting unsupported
header in diagnostic.

Note on testing:

wget https://github.com/python/cpython/raw/3.11/Lib/test/testtar.tar -O
/tmp/testtar.tar

```
test "Python testtar.tar file" {
    const file_name = "testtar.tar";

    var file = try std.fs.cwd().openFile("/tmp/" ++ file_name, .{});
    defer file.close();

    var diag = Options.Diagnostics{ .allocator = std.testing.allocator };
    defer diag.deinit();

    var iter = iterator(file.reader(), &diag);
    while (try iter.next()) |f| {
        std.debug.print("supported: {} {s} {d}\n", .{ f.kind, f.name, f.size });
        try f.skip();
    }
    for (diag.errors.items) |e| {
        switch (e) {
            .unsupported_file_type => |u| {
                std.debug.print("unsupported: {} {s}\n", .{ u.file_type, u.file_name });
            },
            else => unreachable,
        }
    }
}
```
@mlugg mlugg moved this to Quality Assurance in Package Manager Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior contributor friendly This issue is limited in scope and/or knowledge of Zig internals. enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library. zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants