std: add type-erased reader; base GenericReader on it #17344

andrewrk · 2023-10-01T01:13:11Z

The idea here is to avoid code bloat by having only one actual io.Reader implementation, which is type erased, and then implement a GenericReader that preserves type information on top of that as thin glue code. API users which want to continue using generic context and error set types can continue to use reader: anytype, however, API users now have an additional option which is to use TypeErasedReader which can be accepted as a parameter of a non-generic function.

The strategy here is for that glue code to @errorCast the result of the type-erased reader functions.

mlugg · 2023-10-01T02:04:39Z

Before merging this, I think we should do some perf analysis. I'm concerned that the loss of implementation-specific inlining here will result in certain reader functions becoming slower, particularly those which read data one byte at a time.

Also, if we're making this change to Reader, should we not make an analogous change to Writer at the same time?

jayschwa · 2023-10-01T02:47:40Z

While this idea is clever, I think a proliferation of GenericFoo vs TypeErasedFoo across the standard library would be unfortunate. Having to choose between precise error-handling or "not being bloated" feels like a bad compromise. I want to have my cake and eat it too. 👼🏻 🍰

I will be interested to see if continued, exclusive use of [Generic]Reader still benefits from "de-bloating" after this changes. It sounds like it would if it's only a thin wrapper now.

andrewrk · 2023-10-01T03:00:09Z

Before merging this, I think we should do some perf analysis. I'm concerned that the loss of implementation-specific inlining here will result in certain reader functions becoming slower, particularly those which read data one byte at a time.

I agree, let's definitely understand the performance characteristics of this before merging (and quantify the effect on bloat).

Also, if we're making this change to Reader, should we not make an analogous change to Writer at the same time?

Yes, I just wanted to get a proof-of-concept up for discussion. I probably would have kept it local until I got more work done but I got stuck on #17343, so tossing it up here until that yak is shaven.

I will be interested to see if continued, exclusive use of [Generic]Reader still benefits from "de-bloating" after this changes. It sounds like it would if it's only a thin wrapper now.

I'm interested to see that as well. Some related things:

In/OutStream design flaw introduced with new error set changes. #764

This code:

zig/lib/std/crypto/tls/Client.zig

Line 142 in 9a001e1

    
           pub fn init(stream: anytype, ca_bundle: Certificate.Bundle, host: []const u8) InitError(@TypeOf(stream))!Client {

zig/lib/std/crypto/tls/Client.zig

Line 920 in 9a001e1

    
           pub fn readvAdvanced(c: *Client, stream: anytype, iovecs: []const std.os.iovec) !usize {

This is some low level API that is already suffering from bloat as it stands and probably would want to take advantage of a type-erased reader.

BratishkaErik · 2023-10-01T06:31:39Z

The strategy here is for that glue code to @errSetCast the result of the type-erased reader functions, however, while trying to do that I ran into #17343.

Related #13808 (comment) ? :

You could further reduce redundant generic instantiations by implementing base reader and writer types and then using @errSetCast and semantic inlining:

This would be much nicer if @errSetCast worked on error unions.

andrewrk · 2023-10-01T07:03:55Z

Yes exactly the same idea as @Vexu!

andrewrk · 2023-10-01T20:14:12Z

Thanks @Vexu for the initial implementation of #17343. This PR is now blocking on #17355.

andrewrk · 2023-10-03T20:30:59Z

std lib tests are passing for me locally, so, the current status of this PR is data collection:

does this help with bloat?
- data point: std lib tests with -OReleaseSmall: no significant difference
how is the runtime performance affected?
- data point: self-hosted compiler building itself: no significant difference
is the type-erased reader API useful to use?
- I for one want to use it in a few places

Note that these data points above do not include actually taking advantage of the type erased reader; everything is still using the GenericReader wrapper.

The idea here is to avoid code bloat by having only one actual io.Reader implementation, which is type erased, and then implement a GenericReader that preserves type information on top of that as thin glue code. The strategy here is for that glue code to `@errSetCast` the result of the type-erased reader functions, however, while trying to do that I ran into #17343.

Notably, this contains bug fixes related to `@errorCast` which are required by the changes to `std.io.Reader` in this branch, and the compiler source code has a dependency on `std.io.Reader`.

andrewrk · 2023-10-03T22:29:06Z

self-hosted compiler building itself (insignificant perf difference):

Benchmark 1 (3 runs): build-exe before
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          68.9s  ± 1.85s     66.8s  … 70.5s           0 ( 0%)        0%
  peak_rss           4.53GB ±  356KB    4.53GB … 4.53GB          0 ( 0%)        0%
  cpu_cycles          275G  ±  579M      275G  …  276G           0 ( 0%)        0%
  instructions        387G  ±  127M      387G  …  387G           0 ( 0%)        0%
  cache_references   17.0G  ± 45.0M     16.9G  … 17.0G           0 ( 0%)        0%
  cache_misses       1.35G  ± 5.26M     1.35G  … 1.36G           0 ( 0%)        0%
  branch_misses      1.90G  ± 2.89M     1.90G  … 1.91G           0 ( 0%)        0%
Benchmark 2 (3 runs): build-exe after
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          69.6s  ± 2.75s     66.7s  … 72.1s           0 ( 0%)          +  1.1% ±  7.7%
  peak_rss           4.54GB ±  432KB    4.54GB … 4.54GB          0 ( 0%)          +  0.1% ±  0.0%
  cpu_cycles          277G  ± 2.01G      276G  …  279G           0 ( 0%)          +  0.8% ±  1.2%
  instructions        389G  ± 67.0M      389G  …  389G           0 ( 0%)          +  0.5% ±  0.1%
  cache_references   17.0G  ± 38.0M     17.0G  … 17.1G           0 ( 0%)          +  0.6% ±  0.6%
  cache_misses       1.36G  ± 14.0M     1.35G  … 1.37G           0 ( 0%)          +  0.4% ±  1.8%
  branch_misses      1.90G  ± 2.92M     1.90G  … 1.90G           0 ( 0%)          -  0.1% ±  0.3%

ianprime0509 · 2023-10-04T00:31:45Z

Regarding concerns about reading one byte at a time, here are some measurements from one of my projects which relies on that extensively (XML parser internally implemented as a state machine). The short version is that there doesn't seem to be any noticeable difference in performance (wall time) with this change.

Repository: https://github.com/ianprime0509/zig-xml (commit dfdc044f3271641c7d428dc8ec8cd46423d8b8b6, see the bench directory)
Test file: https://github.com/ianprime0509/gir-files/blob/dabd4d36bf2d6008754d90b392cc007a683cde0c/Gtk-4.0.gir (5.7MiB XML file)
The benchmark is compiled using ReleaseFast and reads the entire file into memory before using std.io.fixedBufferStream to read it into the parser. Two implementations are compared below (reader does more work than token_reader):

Benchmark 1 (44 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           115ms ± 16.1ms     101ms …  153ms          0 ( 0%)        0%
  peak_rss           7.29MB ± 64.6KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          409M  ± 12.9M      400M  …  463M           3 ( 7%)        0%
  instructions       1.01G  ± 27.5      1.01G  … 1.01G           3 ( 7%)        0%
  cache_references    262K  ± 66.9K      226K  …  618K           4 ( 9%)        0%
  cache_misses       11.1K  ± 6.80K     7.69K  … 41.7K           5 (11%)        0%
  branch_misses       753K  ± 6.95K      750K  …  797K           3 ( 7%)        0%
Benchmark 2 (44 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           115ms ± 14.2ms     101ms …  136ms          0 ( 0%)          +  0.0% ±  5.6%
  peak_rss           7.30MB ± 61.8KB    7.21MB … 7.34MB          0 ( 0%)          +  0.1% ±  0.4%
  cpu_cycles          409M  ± 6.95M      400M  …  424M           0 ( 0%)          +  0.2% ±  1.1%
  instructions       1.01G  ± 30.4      1.01G  … 1.01G           0 ( 0%)          -  0.0% ±  0.0%
  cache_references    258K  ± 23.9K      229K  …  321K           3 ( 7%)          -  1.6% ±  8.1%
  cache_misses       12.1K  ± 3.79K     7.85K  … 21.7K           0 ( 0%)          +  8.7% ± 21.0%
  branch_misses       842K  ±  751       841K  …  844K           0 ( 0%)        💩+ 11.9% ±  0.3%

token_reader

Benchmark 1 (35 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           146ms ± 14.9ms     129ms …  165ms          0 ( 0%)        0%
  peak_rss           7.26MB ± 90.4KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          531M  ± 8.13M      520M  …  546M           0 ( 0%)        0%
  instructions       1.23G  ± 32.7      1.23G  … 1.23G           0 ( 0%)        0%
  cache_references    577K  ±  271K      335K  … 1.89M           1 ( 3%)        0%
  cache_misses       21.4K  ± 7.81K     12.6K  … 39.1K           0 ( 0%)        0%
  branch_misses      1.10M  ± 2.13K     1.09M  … 1.10M           0 ( 0%)        0%
Benchmark 2 (34 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           149ms ± 14.3ms     129ms …  166ms          0 ( 0%)          +  1.7% ±  4.8%
  peak_rss           7.25MB ± 82.5KB    7.08MB … 7.34MB          0 ( 0%)          -  0.1% ±  0.6%
  cpu_cycles          531M  ± 8.07M      517M  …  543M           0 ( 0%)          -  0.1% ±  0.7%
  instructions       1.23G  ± 30.9      1.23G  … 1.23G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    592K  ±  263K      368K  … 1.52M           2 ( 6%)          +  2.5% ± 22.3%
  cache_misses       22.6K  ± 10.3K     12.3K  … 57.0K           1 ( 3%)          +  5.9% ± 20.6%
  branch_misses       995K  ± 2.83K      992K  … 1.00M           0 ( 0%)        ⚡-  9.2% ±  0.1%

reader

andrewrk · 2023-10-04T18:13:34Z

I'm going to merge this because I think that is the best way to evaluate it. Plus, I want to start playing with it in side projects.

So far it looks like it doesn't really affect perf or bloat for existing codebases (please share if you find a counter example), and we don't really know how beneficial AnyReader is because nobody is using it yet.

If this causes immediate issues for anyone, please mention it and this can be reverted (or fixed).

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 · 2023-10-14T03:43:31Z

I was interested in the corresponding change for std.io.Writer, so I tried implementing it here, following the same strategy as this PR: https://github.com/ianprime0509/zig/tree/type-erased-writer

However, I was rather surprised by the negative performance results, which is why I haven't opened a PR for this:

poop './bench-zig /home/ian/src/zig' './bench-zig /home/ian/src/zig-worktrees/type-erased-writer'
Benchmark 1 (3 runs): ./bench-zig /home/ian/src/zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          86.2s  ±  164ms    86.0s  … 86.4s           0 ( 0%)        0%
  peak_rss           5.31GB ±  628KB    5.31GB … 5.31GB          0 ( 0%)        0%
  cpu_cycles          340G  ±  545M      340G  …  341G           0 ( 0%)        0%
  instructions        436G  ± 76.1M      435G  …  436G           0 ( 0%)        0%
  cache_references   33.6G  ± 76.5M     33.5G  … 33.6G           0 ( 0%)        0%
  cache_misses       7.98G  ± 20.5M     7.96G  … 8.00G           0 ( 0%)        0%
  branch_misses      2.12G  ± 9.77M     2.11G  … 2.13G           0 ( 0%)        0%
Benchmark 2 (3 runs): ./bench-zig /home/ian/src/zig-worktrees/type-erased-writer
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          91.3s  ±  302ms    91.0s  … 91.6s           0 ( 0%)        💩+  6.0% ±  0.6%
  peak_rss           5.53GB ±  308KB    5.53GB … 5.53GB          0 ( 0%)        💩+  4.1% ±  0.0%
  cpu_cycles          360G  ± 1.12G      359G  …  362G           0 ( 0%)        💩+  5.9% ±  0.6%
  instructions        462G  ± 56.1M      462G  …  462G           0 ( 0%)        💩+  6.0% ±  0.0%
  cache_references   35.4G  ± 28.7M     35.4G  … 35.5G           0 ( 0%)        💩+  5.6% ±  0.4%
  cache_misses       8.41G  ± 20.7M     8.38G  … 8.42G           0 ( 0%)        💩+  5.3% ±  0.6%
  branch_misses      2.27G  ± 9.01M     2.26G  … 2.28G           0 ( 0%)        💩+  7.1% ±  1.0%

For completeness, my bench-zig script is as follows:

#!/bin/sh
cd "$1"/build
rm -rf stage4 ../zig-cache
stage3/bin/zig build -p stage4 -Dno-lib -Denable-llvm

I tried the benchmark several times, including in the other order (to make sure the results were inverted), with pretty consistent results. It's surprising to me that Writer has these results while Reader wasn't impacted at all, but I also don't want to discourage anyone else from trying this out, since it's possible I'm doing something wrong here.

andrewrk · 2023-10-18T23:55:31Z

Could it perhaps be that there is some logic writing 1 byte at a time, and should be switched to a buffered writer?

I'm curious where all the writers are in the zig compiler; I can't imagine there being that many in the hot path.

Have you tried a micro benchmark of the new code?

If you're up for it, I'd be interested in a PR where we can dissect it even if it doesn't end up getting merged.

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 · 2023-10-20T02:13:09Z

I opened #17634 with some more benchmark information in the PR description. In addition to trying some build variations on another project, I added a couple micro-benchmarks I tried out.

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

andrewrk mentioned this pull request Oct 1, 2023

Support fetching dependencies over git+http(s) #17277

Merged

andrewrk force-pushed the type-erased-reader branch from 01ccc73 to 5405444 Compare October 1, 2023 20:13

andrewrk force-pushed the type-erased-reader branch from 5405444 to a6e915c Compare October 3, 2023 20:28

andrewrk force-pushed the type-erased-reader branch from a6e915c to 2c2a5e6 Compare October 3, 2023 20:31

andrewrk added 3 commits October 3, 2023 14:55

std.macho: remove alignment from LoadCommandIterator

8ebebbd

update zig1.wasm

9a09651

Notably, this contains bug fixes related to `@errorCast` which are required by the changes to `std.io.Reader` in this branch, and the compiler source code has a dependency on `std.io.Reader`.

andrewrk force-pushed the type-erased-reader branch from 2c2a5e6 to 9a09651 Compare October 3, 2023 22:15

andrewrk mentioned this pull request Oct 3, 2023

Draft: Have Reader/Writer be a runtime instead of comptime interface #13808

Closed

andrewrk merged commit a306bfc into master Oct 4, 2023

andrewrk deleted the type-erased-reader branch October 4, 2023 18:21

andrewrk mentioned this pull request Oct 4, 2023

In/OutStream design flaw introduced with new error set changes. #764

Closed

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023

Add type-erased writer and GenericWriter

2784091

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023

Add type-erased writer and GenericWriter

18afcce

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023

Add type-erased writer and GenericWriter

f2f1a07

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 14, 2023

Add type-erased writer and GenericWriter

6d10444

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 20, 2023

Add type-erased writer and GenericWriter

f387f5a

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 mentioned this pull request Oct 20, 2023

Add type-erased writer and GenericWriter #17634

Merged

travisstaloch mentioned this pull request Oct 26, 2023

new generic reader hides error.EndOfStream, resulting in 'panic: invalid error code' #17733

Closed

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 28, 2023

Add type-erased writer and GenericWriter

9846918

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Nov 5, 2023

Add type-erased writer and GenericWriter

c824478

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

mlugg mentioned this pull request Nov 13, 2023

Performance penalty of using std.io.Reader #17985

Open

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Dec 2, 2023

Add type-erased writer and GenericWriter

fc40a07

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Jan 30, 2024

Add type-erased writer and GenericWriter

e392c1a

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Feb 8, 2024

Add type-erased writer and GenericWriter

03ac8ba

This is a companion to ziglang#17344 to apply the same change to the `std.io.Writer` interface.

tomc1998 mentioned this pull request Dec 26, 2024

The type erased reader changes massively impact zig's IO performance for buffered readers, fixed buffer streams, etc. #22322

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

std: add type-erased reader; base GenericReader on it #17344

std: add type-erased reader; base GenericReader on it #17344

andrewrk commented Oct 1, 2023 •

edited

Loading

mlugg commented Oct 1, 2023

jayschwa commented Oct 1, 2023 •

edited

Loading

andrewrk commented Oct 1, 2023

BratishkaErik commented Oct 1, 2023

andrewrk commented Oct 1, 2023

andrewrk commented Oct 1, 2023

andrewrk commented Oct 3, 2023 •

edited

Loading

andrewrk commented Oct 3, 2023

ianprime0509 commented Oct 4, 2023

andrewrk commented Oct 4, 2023

ianprime0509 commented Oct 14, 2023

andrewrk commented Oct 18, 2023 •

edited

Loading

ianprime0509 commented Oct 20, 2023

std: add type-erased reader; base GenericReader on it #17344

std: add type-erased reader; base GenericReader on it #17344

Conversation

andrewrk commented Oct 1, 2023 • edited Loading

mlugg commented Oct 1, 2023

jayschwa commented Oct 1, 2023 • edited Loading

andrewrk commented Oct 1, 2023

BratishkaErik commented Oct 1, 2023

andrewrk commented Oct 1, 2023

andrewrk commented Oct 1, 2023

andrewrk commented Oct 3, 2023 • edited Loading

andrewrk commented Oct 3, 2023

ianprime0509 commented Oct 4, 2023

andrewrk commented Oct 4, 2023

ianprime0509 commented Oct 14, 2023

andrewrk commented Oct 18, 2023 • edited Loading

ianprime0509 commented Oct 20, 2023

andrewrk commented Oct 1, 2023 •

edited

Loading

jayschwa commented Oct 1, 2023 •

edited

Loading

andrewrk commented Oct 3, 2023 •

edited

Loading

andrewrk commented Oct 18, 2023 •

edited

Loading