-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type-erased writer and GenericWriter #17634
Conversation
One optimisation that could be done is for Type erasing the writer should eliminate one of these instantiations. Pro is hopefully better compile times (the zig compiler calls |
Thanks, that's a great idea! To get an idea of the benefit of such a change, I applied the following patch (#17458 is relevant here: the diff --git a/lib/std/fmt.zig b/lib/std/fmt.zig
index 611737161..31ce66b7c 100644
--- a/lib/std/fmt.zig
+++ b/lib/std/fmt.zig
@@ -1993,7 +1993,10 @@ pub const BufPrintError = error{
/// Returns a slice of the bytes printed to.
pub fn bufPrint(buf: []u8, comptime fmt: []const u8, args: anytype) BufPrintError![]u8 {
var fbs = std.io.fixedBufferStream(buf);
- try format(fbs.writer(), fmt, args);
+ format(fbs.writer().any(), fmt, args) catch |e| switch (e) {
+ error.NoSpaceLeft => return error.NoSpaceLeft,
+ else => unreachable,
+ };
return fbs.getWritten();
}
@@ -2005,7 +2008,7 @@ pub fn bufPrintZ(buf: []u8, comptime fmt: []const u8, args: anytype) BufPrintErr
/// Count the characters needed for format. Useful for preallocating memory
pub fn count(comptime fmt: []const u8, args: anytype) u64 {
var counting_writer = std.io.countingWriter(std.io.null_writer);
- format(counting_writer.writer(), fmt, args) catch |err| switch (err) {};
+ format(counting_writer.writer().any(), fmt, args) catch unreachable;
return counting_writer.bytes_written;
}
The results are certainly positive: it seems like this patch is able to make up for much of the compilation performance regression noted in the PR. Some benchmark results:
tetris zig build - master vs unpatched PR
tetris zig build - master vs patched PR
stage4 compiler build - master vs unpatched PR
stage4 compiler build - master vs patched PRWhat I'd really like to do is erase the writer type for pub fn format(writer: anytype, comptime fmt: []const u8, args: anytype) @TypeOf(writer).Error!void {
return @errorCast(formatImpl(writer.any(), fmt, args));
}
fn formatImpl(writer: io.AnyWriter, comptime fmt: []const u8, args: anytype) anyerror!void {
// Original format implementation goes here
} But with this solution, it is safety-checked illegal behavior for any user-defined format function to return any error outside the error set of the writer, which is a regression from the current behavior of allowing this (and one which would not be caught at compile time). I personally don't like this, and feel like it would be more worthwhile to invest time in a better version of |
Maybe I'm missing something obvious but couldn't you make the format function continue to have an inferred error set like status quo? Edit: oh, I see the problem now - the inferred error set is |
I bet the C backend would benefit from a type-erased writer too because there are a lot of functions like this: Lines 1771 to 1779 in 22a6a5d
And they are currently being instantiated 2x. |
f387f5a
to
0483cdb
Compare
After thinking for a while about how to integrate a type-erased writer into the C backend without making the error handling unmanageable ( Types of writers passed to
After:
Compared to current master, this does appear to have some benefit (this benchmark is for an
I also added the Unfortunately, one of the
This is the only failing test; if I comment it out, Edit: bisected to 405ba26 (applying the first commit in this PR at each step), though this doesn't particularly help me reduce the issue yet. |
I have a strong suspicion that a type erased writer is a good idea and we just need to find the handful problematic places where some code is reading 1 byte at a time through a function pointer |
I agree. Based on the benchmarks I did in the PR description, the bulk of the performance hit from the initial change was in the build runner, and invoking |
0483cdb
to
9e1d304
Compare
I rebased on master to resolve conflicts with the latest changes to Current benchmark results (including all commits in this PR so far) don't seem to show a significant performance difference vs master:
Tetris zig build
Compiler zig buildI tried using callgrind to look for performance hotspots related to writer usage, to see if there was anything being done inefficiently with writers when performing Lines 869 to 876 in f24ceec
This call to Lines 818 to 839 in f24ceec
|
Could you also share the code sizes of the generated executables? This is an important benchmarking size here as well. You can query them with the It might make sense to decide into either always use generic reader/writer in print() for ReleaseSmall, but use several instantiations in ReleaseFast |
9e1d304
to
324e59c
Compare
Sorry for the delay. I rebased on the latest master (to resolve a minor conflict) and took some size measurements of the results. The differences look fairly small, although the type-erased writer for some reason seems to lead to slightly larger sizes than the generic version: Tetris (comparing
Zig (comparing
I also made a very simple implementation of I included a comparison of the disassembly for one of the It's also interesting to note how many functions depend on writer functionality even though it might not seem like it at first glance, such as As a final observation on size comparisons, it might be relevant to point out that the helper functions in the |
This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.
Since `bufPrint` and `count` both control the writers used internally, they can leverage type-erased writers while maintaining correct error handling. This reduces generic instantiations when using `allocPrint`, which calls both `count` and `bufPrint` internally.
This reduces generic instantiations of several write functions. Before: ``` @as(type, io.writer.Writer(*array_list.ArrayListAligned(u8,null),error{OutOfMemory},(function 'appendWrite'))) @as(type, io.writer.Writer(*codegen.c.IndentWriter(io.writer.Writer(*array_list.ArrayListAligned(u8,null),error{OutOfMemory},(function 'appendWrite'))),error{OutOfMemory},(function 'write'))) ``` After: ``` @as(type, io.GenericWriter(io.Writer,error{OutOfMemory},(function 'write'))) ```
324e59c
to
2b2c9c5
Compare
This is now unblocked thanks to #18729 🚀 |
Thanks for all the investigations into this. Let's move forward with it. I think it's nice to have the option to provide a non-generic API. For example, I think I want to use it in |
This is a companion to #17344 to apply the same change to the
std.io.Writer
interface.Latest benchmarks here: #17634 (comment) Old benchmarks follow:
Benchmarks
Building tetris
I tried three different variations of this, shown below in the wrapper script I used. The first variation yields similar results as the compiler benchmark I tried in my comment linked above, while subsequent variations do not show as much difference (implying that the main performance hit here might be coming from the build runner and not the actual compiler itself).
zig build
zig build-exe
zig build-exe -fno-emit-bin
Artificial benchmarks
I'd be happy to try out other benchmark ideas here as well. Both of these were compiled with
ReleaseFast
.Write 1GB to a heap buffer one byte at a time
In my first attempt at this one, I was just writing
'A'
bytes, but there wasn't a noticeable difference there (from what I could tell from the disassembly of that first version, LLVM had optimized both versions to the equivalent of amemset
, with the "old" version being what appeared to be an inlined vectorizedmemset
and the "new" version being literally a call tomemset
).write_bytes
Write a 1MB file one byte at a time
Most of the impact the writer would have is likely dwarfed by the cost of the write syscalls. I'm just including this because it's something I initially thought might yield some useful results.
write_file