std.tar: add writer #19603
Can you share the performance difference before/after?
Generating sources.tar 10 times in a loop:
before:
after:
The difference is because I used a smaller buffer when writing file content: 512 bytes instead of the 4000 used in any().writeFile.
Let me do something about that.
Fixing performance problem: ziglang#19603 (comment)
Comparing the previous version and this one when creating sources.tar.
Code used for benchmark:

prev

```zig
const std = @import("std");

pub fn main() !void {
    var gpa_instance = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa_instance.deinit() == .ok);
    const gpa = gpa_instance.allocator();

    var lib_dir = try std.fs.cwd().openDir("/home/ianic/Code/zig/lib", .{});
    var out_dir = try std.fs.cwd().openDir("/home/ianic/Code/tmp", .{});

    // previous version
    for (0..10) |_| {
        var out_file = try out_dir.createFile("sources.tar", .{});
        defer out_file.close();

        var w = out_file.writer();

        var std_dir = try lib_dir.openDir("std", .{ .iterate = true });
        defer std_dir.close();

        var walker = try std_dir.walk(gpa);
        defer walker.deinit();

        while (try walker.next()) |entry| {
            // Only non-test .zig files go into the tarball.
            switch (entry.kind) {
                .file => {
                    if (!std.mem.endsWith(u8, entry.basename, ".zig"))
                        continue;
                    if (std.mem.endsWith(u8, entry.basename, "test.zig"))
                        continue;
                },
                else => continue,
            }

            var file = try std_dir.openFile(entry.path, .{});
            defer file.close();

            const stat = try file.stat();
            // File content is padded with zeros up to a 512-byte block boundary.
            const padding = p: {
                const remainder = stat.size % 512;
                break :p if (remainder > 0) 512 - remainder else 0;
            };

            var file_header = std.tar.output.Header.init();
            file_header.typeflag = .regular;
            try file_header.setPath("std", entry.path);
            try file_header.setSize(stat.size);
            try file_header.updateChecksum();
            try w.writeAll(std.mem.asBytes(&file_header));
            try w.any().writeFile(file);
            try w.writeByteNTimes(0, padding);
        }

        {
            // Since this command is JIT compiled, the builtin module available in
            // this source file corresponds to the user's host system.
            const builtin_zig = @embedFile("builtin");

            var file_header = std.tar.output.Header.init();
            file_header.typeflag = .regular;
            try file_header.setPath("builtin", "builtin.zig");
            try file_header.setSize(builtin_zig.len);
            try file_header.updateChecksum();
            try w.writeAll(std.mem.asBytes(&file_header));
            try w.writeAll(builtin_zig);
            const padding = p: {
                const remainder = builtin_zig.len % 512;
                break :p if (remainder > 0) 512 - remainder else 0;
            };
            try w.writeByteNTimes(0, padding);
        }
    }
}
```

this

```zig
const std = @import("std");

pub fn main() !void {
    var gpa_instance = std.heap.GeneralPurposeAllocator(.{}){};
    defer std.debug.assert(gpa_instance.deinit() == .ok);
    const gpa = gpa_instance.allocator();

    var lib_dir = try std.fs.cwd().openDir("/home/ianic/Code/zig/lib", .{});
    var out_dir = try std.fs.cwd().openDir("/home/ianic/Code/tmp", .{});

    for (0..10) |_| {
        var out_file = try out_dir.createFile("sources_new.tar", .{});
        defer out_file.close();

        // The tar writer takes care of headers, checksums and padding.
        var w = std.tar.writer(out_file.writer().any());
        try w.setRoot("std");

        var std_dir = try lib_dir.openDir("std", .{ .iterate = true });
        defer std_dir.close();

        var walker = try std_dir.walk(gpa);
        defer walker.deinit();

        while (try walker.next()) |entry| {
            // Only non-test .zig files go into the tarball.
            switch (entry.kind) {
                .file => {
                    if (!std.mem.endsWith(u8, entry.basename, ".zig"))
                        continue;
                    if (std.mem.endsWith(u8, entry.basename, "test.zig"))
                        continue;
                },
                else => continue,
            }

            var file = try entry.dir.openFile(entry.basename, .{});
            defer file.close();
            try w.writeFile(entry.path, file);
        }

        {
            // Since this command is JIT compiled, the builtin module available in
            // this source file corresponds to the user's host system.
            const builtin_zig = @embedFile("builtin");
            w.prefix = "builtin";
            try w.writeFileBytes("builtin.zig", builtin_zig, .{});
        }
    }
}
```

There were two use cases which pushed me into creating tar.writer:
I started modifying output.Header to support tarballs without a prefix. Then, while creating a tarball of the Zig source, I found a file whose name is too long, and realized that we need to support pax headers or GNU long names. I also wanted an API where the user doesn't need to think about adding the checksum to the header or padding after the file content. So instead of modifying output.Header, I moved all of that into tar.writer.
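
To illustrate the long-name problem described above, here is a minimal sketch of how a path can be checked against the ustar header limits (a 100-byte name field and a 155-byte prefix field). It is not the code from this PR: `Split` and `splitPath` are hypothetical names made up for the example. When no split fits, that is the case where a pax header or a GNU long name entry would be needed.

```zig
const std = @import("std");

// Field sizes of the ustar header format.
const name_len = 100;
const prefix_len = 155;

// Hypothetical helper type for this sketch, not part of std.tar.
const Split = struct {
    prefix: []const u8,
    name: []const u8,
};

/// Tries to fit `path` into the ustar `prefix` and `name` fields.
/// Returns null when no split works; such a path needs a pax header
/// or a GNU long name entry instead.
fn splitPath(path: []const u8) ?Split {
    // Short paths go into the name field as they are.
    if (path.len <= name_len) return Split{ .prefix = "", .name = path };

    // Search for a '/' from the right, so the prefix is as long as allowed.
    // Starting at path.len - 2 keeps the name at least one byte long.
    var i: usize = @min(prefix_len, path.len - 2);
    while (i > 0) : (i -= 1) {
        if (path[i] == '/' and path.len - i - 1 <= name_len)
            return Split{ .prefix = path[0..i], .name = path[i + 1 ..] };
    }
    return null;
}
```

A caller would store `prefix` and `name` in the header when `splitPath` succeeds, and fall back to one of the long-name extensions when it returns null.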
The computer is doing a lot more work than before. I have a strong suspicion that a different API could result in equivalent performance, while still providing the desired abstraction.
This version also sets mode and mtime for tar files, which is not needed in this case. I removed that and did some small optimizations.
Fixing performance problem: ziglang#19603 (comment)
I did a few optimizations.
Thanks! Would you mind doing the rebase?
Current `setPath` doesn't handle non prefix cases.
So we can write filenames (and links) of arbitrary length.
Move writer to tar/writer.zig
Compare output of writer with current version.
Replaced with writer.
/home/ci/actions-runner1/_work/zig/zig/lib/std/tar/writer.zig:112:54: error: expected type 'usize', found 'u64'
/home/ci/actions-runner1/_work/zig/zig/lib/std/tar/writer.zig:112:54: note: unsigned 32-bit int cannot represent all possible unsigned 64-bit values
/home/ci/actions-runner1/_work/zig/zig/lib/std/tar/writer.zig:54:65: note: parameter type declared here
Fixing performance problem: ziglang#19603 (comment)
Use AnyWriter.writeFile if file system file is provided to the writer.
Init Header with file defaults. Writing file is most common case. Conversion to octal without bufPrint. Checksum calculation without branching.
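
The last two optimizations in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `writeOctal` and `headerChecksum` are hypothetical names, and "without branching" is read here as "no per-byte check for whether the index falls inside the checksum field". The checksum field position (bytes 148 to 155 of the 512-byte header) and the rule that it counts as eight spaces come from the tar header format.

```zig
const std = @import("std");

/// Writes `value` into `buf` as zero-padded octal ASCII digits, using only
/// shifts and masks instead of a formatting call such as bufPrint.
/// The caller passes the field without its terminator byte.
fn writeOctal(buf: []u8, value: u64) error{Overflow}!void {
    var remaining = value;
    var i = buf.len;
    while (i > 0) {
        i -= 1;
        // The low three bits are one octal digit.
        buf[i] = '0' + @as(u8, @intCast(remaining & 7));
        remaining >>= 3;
    }
    // Bits left over mean the value does not fit into the field.
    if (remaining != 0) return error.Overflow;
}

/// Sums all header bytes, counting the checksum field as eight spaces.
/// The summing loop has no per-byte condition: the checksum field's own
/// bytes are subtracted afterwards and the eight spaces are added back.
fn headerChecksum(header: *const [512]u8) u32 {
    var sum: u32 = 0;
    for (header) |byte| sum += byte;
    for (header[148..156]) |byte| sum -= byte;
    return sum + 8 * ' ';
}
```

The result of `headerChecksum` would then be written into the checksum field with `writeOctal`.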
Thanks for the rebase :)
Simplifies code in docs creation where we used `std.tar.output.Header`. Writer uses that Header internally and provides higher level interface. Updates checksum on write, handles long file names, allows setting mtime and file permission mode. Provides handy interface for passing `Dir.WalkerEntry`.
Tested that `zig std` and `zig test -femit-docs` are creating `sources.tar` as before this change.