std: add type-erased reader; base GenericReader on it #17344
Before merging this, I think we should do some perf analysis. I'm concerned that the loss of implementation-specific inlining here will result in certain reader functions becoming slower, particularly those which read data one byte at a time. Also, if we're making this change to |
While this idea is clever, I think a proliferation of I will be interested to see if continued, exclusive use of |
I agree, let's definitely understand the performance characteristics of this before merging (and quantify the effect on bloat).
Yes, I just wanted to get a proof-of-concept up for discussion. I probably would have kept it local until I got more work done but I got stuck on #17343, so tossing it up here until that yak is shaven.
I'm interested to see that as well. Some related things: This code: zig/lib/std/crypto/tls/Client.zig Line 142 in 9a001e1
zig/lib/std/crypto/tls/Client.zig Line 920 in 9a001e1
This is some low level API that is already suffering from bloat as it stands and probably would want to take advantage of a type-erased reader. |
Related #13808 (comment) ? :
|
Yes exactly the same idea as @Vexu! |
Force-pushed from 01ccc73 to 5405444.
Force-pushed from 5405444 to a6e915c.
std lib tests are passing for me locally, so, the current status of this PR is data collection:
Note that these data points above do not include actually taking advantage of the type erased reader; everything is still using the GenericReader wrapper. |
Force-pushed from a6e915c to 2c2a5e6.
The idea here is to avoid code bloat by having only one actual io.Reader implementation, which is type erased, and then implement a GenericReader that preserves type information on top of that as thin glue code. The strategy here is for that glue code to `@errSetCast` the result of the type-erased reader functions, however, while trying to do that I ran into #17343.
Notably, this contains bug fixes related to `@errorCast` which are required by the changes to `std.io.Reader` in this branch, and the compiler source code has a dependency on `std.io.Reader`.
Force-pushed from 2c2a5e6 to 9a09651.
self-hosted compiler building itself (insignificant perf difference):
|
Regarding concerns about reading one byte at a time, here are some measurements from one of my projects which relies on that extensively (XML parser internally implemented as a state machine). The short version is that there doesn't seem to be any noticeable difference in performance (wall time) with this change. Repository: https://github.com/ianprime0509/zig-xml (commit
token_reader
reader |
I'm going to merge this because I think that is the best way to evaluate it. Plus, I want to start playing with it in side projects. So far it looks like it doesn't really affect perf or bloat for existing codebases (please share if you find a counter example), and we don't really know how beneficial If this causes immediate issues for anyone, please mention it and this can be reverted (or fixed). |
This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.
I was interested in the corresponding change for However, I was rather surprised by the negative performance results, which is why I haven't opened a PR for this:
For completeness, my script:

    #!/bin/sh
    cd "$1"/build
    rm -rf stage4 ../zig-cache
    stage3/bin/zig build -p stage4 -Dno-lib -Denable-llvm

I tried the benchmark several times, including in the other order (to make sure the results were inverted), with pretty consistent results. It's surprising to me that |
Could it perhaps be that there is some logic writing 1 byte at a time, and should be switched to a buffered writer? I'm curious where all the writers are in the zig compiler; I can't imagine there being that many in the hot path. Have you tried a micro benchmark of the new code? If you're up for it, I'd be interested in a PR where we can dissect it even if it doesn't end up getting merged. |
I opened #17634 with some more benchmark information in the PR description. In addition to trying some build variations on another project, I added a couple micro-benchmarks I tried out. |
The idea here is to avoid code bloat by having only one actual io.Reader implementation, which is type erased, and then implement a GenericReader that preserves type information on top of that as thin glue code. API users which want to continue using generic context and error set types can continue to use `reader: anytype`; however, API users now have an additional option, which is to use TypeErasedReader, which can be accepted as a parameter of a non-generic function.

The strategy here is for that glue code to `@errorCast` the result of the type-erased reader functions.
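
For readers who want to see the shape of this design, here is a minimal, self-contained sketch. It is not the code from this branch: the struct layout, the `any()` helper, `SliceSource`, and `countBytes` are all assumptions made for illustration. The names `TypeErasedReader` and `GenericReader` follow the description above. The shared logic (`readAll`) is compiled once on the type-erased struct, and the generic wrapper only forwards to it and narrows the error set with `@errorCast`.

```zig
const std = @import("std");

// Hypothetical type-erased reader: one concrete type, no comptime parameters.
const TypeErasedReader = struct {
    context: *const anyopaque,
    readFn: *const fn (context: *const anyopaque, buffer: []u8) anyerror!usize,

    pub fn read(self: TypeErasedReader, buffer: []u8) anyerror!usize {
        return self.readFn(self.context, buffer);
    }

    // Shared logic lives here, so it is instantiated only once.
    pub fn readAll(self: TypeErasedReader, buffer: []u8) anyerror!usize {
        var index: usize = 0;
        while (index < buffer.len) {
            const n = try self.read(buffer[index..]);
            if (n == 0) break;
            index += n;
        }
        return index;
    }
};

// Thin glue that keeps the concrete context and error set types.
fn GenericReader(
    comptime Context: type,
    comptime ReadError: type,
    comptime readFn: fn (context: Context, buffer: []u8) ReadError!usize,
) type {
    return struct {
        context: Context,

        const Self = @This();
        pub const Error = ReadError;

        pub fn read(self: Self, buffer: []u8) Error!usize {
            return readFn(self.context, buffer);
        }

        // Forward to the shared implementation, then narrow anyerror to Error.
        pub fn readAll(self: Self, buffer: []u8) Error!usize {
            return @errorCast(self.any().readAll(buffer));
        }

        // The returned reader points at self.context, so it must not outlive self.
        pub fn any(self: *const Self) TypeErasedReader {
            return .{ .context = @ptrCast(&self.context), .readFn = typeErasedReadFn };
        }

        fn typeErasedReadFn(context: *const anyopaque, buffer: []u8) anyerror!usize {
            const ptr: *const Context = @alignCast(@ptrCast(context));
            return readFn(ptr.*, buffer);
        }
    };
}

// Hypothetical in-memory source used only to exercise the sketch.
const SliceSource = struct {
    data: []const u8,
    pos: usize = 0,

    fn read(self: *SliceSource, buffer: []u8) error{}!usize {
        const n = @min(buffer.len, self.data.len - self.pos);
        @memcpy(buffer[0..n], self.data[self.pos..][0..n]);
        self.pos += n;
        return n;
    }

    const Reader = GenericReader(*SliceSource, error{}, read);

    fn reader(self: *SliceSource) Reader {
        return .{ .context = self };
    }
};

// A non-generic function: it accepts any reader without being re-instantiated.
fn countBytes(r: TypeErasedReader) anyerror!usize {
    var buf: [4]u8 = undefined;
    var total: usize = 0;
    while (true) {
        const n = try r.read(&buf);
        if (n == 0) return total;
        total += n;
    }
}

test "generic wrapper forwards to the type-erased reader" {
    var src = SliceSource{ .data = "hello" };
    const r = src.reader();
    var buf: [8]u8 = undefined;
    const n = try r.readAll(&buf);
    try std.testing.expectEqualStrings("hello", buf[0..n]);
}
```

The trade-off discussed in this thread is visible here: every call through `TypeErasedReader` goes through a function pointer, which is what raised the concern about byte-at-a-time reads, while functions such as `countBytes` no longer need one copy per reader type.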