Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: add type-erased reader; base GenericReader on it #17344

Merged
merged 3 commits into from
Oct 4, 2023

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Oct 1, 2023

The idea here is to avoid code bloat by having only one actual io.Reader implementation, which is type erased, and then implement a GenericReader that preserves type information on top of that as thin glue code. API users which want to continue using generic context and error set types can continue to use reader: anytype, however, API users now have an additional option which is to use TypeErasedReader which can be accepted as a parameter of a non-generic function.

The strategy here is for that glue code to @errorCast the result of the type-erased reader functions.

@mlugg
Copy link
Member

mlugg commented Oct 1, 2023

Before merging this, I think we should do some perf analysis. I'm concerned that the loss of implementation-specific inlining here will result in certain reader functions becoming slower, particularly those which read data one byte at a time.

Also, if we're making this change to Reader, should we not make an analogous change to Writer at the same time?

@jayschwa
Copy link
Contributor

jayschwa commented Oct 1, 2023

While this idea is clever, I think a proliferation of GenericFoo vs TypeErasedFoo across the standard library would be unfortunate. Having to choose between precise error-handling or "not being bloated" feels like a bad compromise. I want to have my cake and eat it too. 👼🏻 🍰

I will be interested to see if continued, exclusive use of [Generic]Reader still benefits from "de-bloating" after this changes. It sounds like it would if it's only a thin wrapper now.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 1, 2023

Before merging this, I think we should do some perf analysis. I'm concerned that the loss of implementation-specific inlining here will result in certain reader functions becoming slower, particularly those which read data one byte at a time.

I agree, let's definitely understand the performance characteristics of this before merging (and quantify the effect on bloat).

Also, if we're making this change to Reader, should we not make an analogous change to Writer at the same time?

Yes, I just wanted to get a proof-of-concept up for discussion. I probably would have kept it local until I got more work done but I got stuck on #17343, so tossing it up here until that yak is shaven.

I will be interested to see if continued, exclusive use of [Generic]Reader still benefits from "de-bloating" after this changes. It sounds like it would if it's only a thin wrapper now.

I'm interested to see that as well. Some related things:

This code:

pub fn init(stream: anytype, ca_bundle: Certificate.Bundle, host: []const u8) InitError(@TypeOf(stream))!Client {

pub fn readvAdvanced(c: *Client, stream: anytype, iovecs: []const std.os.iovec) !usize {

This is some low level API that is already suffering from bloat as it stands and probably would want to take advantage of a type-erased reader.

@BratishkaErik
Copy link
Contributor

The strategy here is for that glue code to @errSetCast the result of the type-erased reader functions, however, while trying to do that I ran into #17343.

Related #13808 (comment) ? :

You could further reduce redundant generic instantiations by implementing base reader and writer types and then using @errSetCast and semantic inlining:

This would be much nicer if @errSetCast worked on error unions.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 1, 2023

Yes exactly the same idea as @Vexu!

@andrewrk andrewrk force-pushed the type-erased-reader branch from 01ccc73 to 5405444 Compare October 1, 2023 20:13
@andrewrk
Copy link
Member Author

andrewrk commented Oct 1, 2023

Thanks @Vexu for the initial implementation of #17343. This PR is now blocking on #17355.

@andrewrk andrewrk force-pushed the type-erased-reader branch from 5405444 to a6e915c Compare October 3, 2023 20:28
@andrewrk
Copy link
Member Author

andrewrk commented Oct 3, 2023

std lib tests are passing for me locally, so, the current status of this PR is data collection:

  • does this help with bloat?
    • data point: std lib tests with -OReleaseSmall: no significant difference
  • how is the runtime performance affected?
    • data point: self-hosted compiler building itself: no significant difference
  • is the type-erased reader API useful to use?
    • I for one want to use it in a few places

Note that these data points above do not include actually taking advantage of the type erased reader; everything is still using the GenericReader wrapper.

@andrewrk andrewrk force-pushed the type-erased-reader branch from a6e915c to 2c2a5e6 Compare October 3, 2023 20:31
The idea here is to avoid code bloat by having only one actual io.Reader
implementation, which is type erased, and then implement a GenericReader
that preserves type information on top of that as thin glue code.

The strategy here is for that glue code to `@errSetCast` the result of
the type-erased reader functions, however, while trying to do that I
ran into #17343.
Notably, this contains bug fixes related to `@errorCast` which are
required by the changes to `std.io.Reader` in this branch, and the
compiler source code has a dependency on `std.io.Reader`.
@andrewrk andrewrk force-pushed the type-erased-reader branch from 2c2a5e6 to 9a09651 Compare October 3, 2023 22:15
@andrewrk
Copy link
Member Author

andrewrk commented Oct 3, 2023

self-hosted compiler building itself (insignificant perf difference):

Benchmark 1 (3 runs): build-exe before
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          68.9s  ± 1.85s     66.8s  … 70.5s           0 ( 0%)        0%
  peak_rss           4.53GB ±  356KB    4.53GB … 4.53GB          0 ( 0%)        0%
  cpu_cycles          275G  ±  579M      275G  …  276G           0 ( 0%)        0%
  instructions        387G  ±  127M      387G  …  387G           0 ( 0%)        0%
  cache_references   17.0G  ± 45.0M     16.9G  … 17.0G           0 ( 0%)        0%
  cache_misses       1.35G  ± 5.26M     1.35G  … 1.36G           0 ( 0%)        0%
  branch_misses      1.90G  ± 2.89M     1.90G  … 1.91G           0 ( 0%)        0%
Benchmark 2 (3 runs): build-exe after
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          69.6s  ± 2.75s     66.7s  … 72.1s           0 ( 0%)          +  1.1% ±  7.7%
  peak_rss           4.54GB ±  432KB    4.54GB … 4.54GB          0 ( 0%)          +  0.1% ±  0.0%
  cpu_cycles          277G  ± 2.01G      276G  …  279G           0 ( 0%)          +  0.8% ±  1.2%
  instructions        389G  ± 67.0M      389G  …  389G           0 ( 0%)          +  0.5% ±  0.1%
  cache_references   17.0G  ± 38.0M     17.0G  … 17.1G           0 ( 0%)          +  0.6% ±  0.6%
  cache_misses       1.36G  ± 14.0M     1.35G  … 1.37G           0 ( 0%)          +  0.4% ±  1.8%
  branch_misses      1.90G  ± 2.92M     1.90G  … 1.90G           0 ( 0%)          -  0.1% ±  0.3%

@ianprime0509
Copy link
Contributor

Regarding concerns about reading one byte at a time, here are some measurements from one of my projects which relies on that extensively (XML parser internally implemented as a state machine). The short version is that there doesn't seem to be any noticeable difference in performance (wall time) with this change.

Repository: https://github.com/ianprime0509/zig-xml (commit dfdc044f3271641c7d428dc8ec8cd46423d8b8b6, see the bench directory)
Test file: https://github.com/ianprime0509/gir-files/blob/dabd4d36bf2d6008754d90b392cc007a683cde0c/Gtk-4.0.gir (5.7MiB XML file)
The benchmark is compiled using ReleaseFast and reads the entire file into memory before using std.io.fixedBufferStream to read it into the parser. Two implementations are compared below (reader does more work than token_reader):

Benchmark 1 (44 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           115ms ± 16.1ms     101ms …  153ms          0 ( 0%)        0%
  peak_rss           7.29MB ± 64.6KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          409M  ± 12.9M      400M  …  463M           3 ( 7%)        0%
  instructions       1.01G  ± 27.5      1.01G  … 1.01G           3 ( 7%)        0%
  cache_references    262K  ± 66.9K      226K  …  618K           4 ( 9%)        0%
  cache_misses       11.1K  ± 6.80K     7.69K  … 41.7K           5 (11%)        0%
  branch_misses       753K  ± 6.95K      750K  …  797K           3 ( 7%)        0%
Benchmark 2 (44 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           115ms ± 14.2ms     101ms …  136ms          0 ( 0%)          +  0.0% ±  5.6%
  peak_rss           7.30MB ± 61.8KB    7.21MB … 7.34MB          0 ( 0%)          +  0.1% ±  0.4%
  cpu_cycles          409M  ± 6.95M      400M  …  424M           0 ( 0%)          +  0.2% ±  1.1%
  instructions       1.01G  ± 30.4      1.01G  … 1.01G           0 ( 0%)          -  0.0% ±  0.0%
  cache_references    258K  ± 23.9K      229K  …  321K           3 ( 7%)          -  1.6% ±  8.1%
  cache_misses       12.1K  ± 3.79K     7.85K  … 21.7K           0 ( 0%)          +  8.7% ± 21.0%
  branch_misses       842K  ±  751       841K  …  844K           0 ( 0%)        💩+ 11.9% ±  0.3%
token_reader
Benchmark 1 (35 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           146ms ± 14.9ms     129ms …  165ms          0 ( 0%)        0%
  peak_rss           7.26MB ± 90.4KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          531M  ± 8.13M      520M  …  546M           0 ( 0%)        0%
  instructions       1.23G  ± 32.7      1.23G  … 1.23G           0 ( 0%)        0%
  cache_references    577K  ±  271K      335K  … 1.89M           1 ( 3%)        0%
  cache_misses       21.4K  ± 7.81K     12.6K  … 39.1K           0 ( 0%)        0%
  branch_misses      1.10M  ± 2.13K     1.09M  … 1.10M           0 ( 0%)        0%
Benchmark 2 (34 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           149ms ± 14.3ms     129ms …  166ms          0 ( 0%)          +  1.7% ±  4.8%
  peak_rss           7.25MB ± 82.5KB    7.08MB … 7.34MB          0 ( 0%)          -  0.1% ±  0.6%
  cpu_cycles          531M  ± 8.07M      517M  …  543M           0 ( 0%)          -  0.1% ±  0.7%
  instructions       1.23G  ± 30.9      1.23G  … 1.23G           0 ( 0%)          +  0.0% ±  0.0%
  cache_references    592K  ±  263K      368K  … 1.52M           2 ( 6%)          +  2.5% ± 22.3%
  cache_misses       22.6K  ± 10.3K     12.3K  … 57.0K           1 ( 3%)          +  5.9% ± 20.6%
  branch_misses       995K  ± 2.83K      992K  … 1.00M           0 ( 0%)        ⚡-  9.2% ±  0.1%
reader

@andrewrk
Copy link
Member Author

andrewrk commented Oct 4, 2023

I'm going to merge this because I think that is the best way to evaluate it. Plus, I want to start playing with it in side projects.

So far it looks like it doesn't really affect perf or bloat for existing codebases (please share if you find a counter example), and we don't really know how beneficial AnyReader is because nobody is using it yet.

If this causes immediate issues for anyone, please mention it and this can be reverted (or fixed).

@andrewrk andrewrk merged commit a306bfc into master Oct 4, 2023
@andrewrk andrewrk deleted the type-erased-reader branch October 4, 2023 18:21
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 9, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 14, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
@ianprime0509
Copy link
Contributor

I was interested in the corresponding change for std.io.Writer, so I tried implementing it here, following the same strategy as this PR: https://github.com/ianprime0509/zig/tree/type-erased-writer

However, I was rather surprised by the negative performance results, which is why I haven't opened a PR for this:

poop './bench-zig /home/ian/src/zig' './bench-zig /home/ian/src/zig-worktrees/type-erased-writer'
Benchmark 1 (3 runs): ./bench-zig /home/ian/src/zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          86.2s  ±  164ms    86.0s  … 86.4s           0 ( 0%)        0%
  peak_rss           5.31GB ±  628KB    5.31GB … 5.31GB          0 ( 0%)        0%
  cpu_cycles          340G  ±  545M      340G  …  341G           0 ( 0%)        0%
  instructions        436G  ± 76.1M      435G  …  436G           0 ( 0%)        0%
  cache_references   33.6G  ± 76.5M     33.5G  … 33.6G           0 ( 0%)        0%
  cache_misses       7.98G  ± 20.5M     7.96G  … 8.00G           0 ( 0%)        0%
  branch_misses      2.12G  ± 9.77M     2.11G  … 2.13G           0 ( 0%)        0%
Benchmark 2 (3 runs): ./bench-zig /home/ian/src/zig-worktrees/type-erased-writer
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          91.3s  ±  302ms    91.0s  … 91.6s           0 ( 0%)        💩+  6.0% ±  0.6%
  peak_rss           5.53GB ±  308KB    5.53GB … 5.53GB          0 ( 0%)        💩+  4.1% ±  0.0%
  cpu_cycles          360G  ± 1.12G      359G  …  362G           0 ( 0%)        💩+  5.9% ±  0.6%
  instructions        462G  ± 56.1M      462G  …  462G           0 ( 0%)        💩+  6.0% ±  0.0%
  cache_references   35.4G  ± 28.7M     35.4G  … 35.5G           0 ( 0%)        💩+  5.6% ±  0.4%
  cache_misses       8.41G  ± 20.7M     8.38G  … 8.42G           0 ( 0%)        💩+  5.3% ±  0.6%
  branch_misses      2.27G  ± 9.01M     2.26G  … 2.28G           0 ( 0%)        💩+  7.1% ±  1.0%

For completeness, my bench-zig script is as follows:

#!/bin/sh
cd "$1"/build
rm -rf stage4 ../zig-cache
stage3/bin/zig build -p stage4 -Dno-lib -Denable-llvm

I tried the benchmark several times, including in the other order (to make sure the results were inverted), with pretty consistent results. It's surprising to me that Writer has these results while Reader wasn't impacted at all, but I also don't want to discourage anyone else from trying this out, since it's possible I'm doing something wrong here.

@andrewrk
Copy link
Member Author

andrewrk commented Oct 18, 2023

Could it perhaps be that there is some logic writing 1 byte at a time, and should be switched to a buffered writer?

I'm curious where all the writers are in the zig compiler; I can't imagine there being that many in the hot path.

Have you tried a micro benchmark of the new code?

If you're up for it, I'd be interested in a PR where we can dissect it even if it doesn't end up getting merged.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 20, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
@ianprime0509
Copy link
Contributor

I opened #17634 with some more benchmark information in the PR description. In addition to trying some build variations on another project, I added a couple micro-benchmarks I tried out.

ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Oct 28, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Nov 5, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Dec 2, 2023
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
ianprime0509 added a commit to ianprime0509/zig that referenced this pull request Jan 30, 2024
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
RossComputerGuy pushed a commit to ExpidusOS-archive/zig that referenced this pull request Feb 8, 2024
This is a companion to ziglang#17344 to apply the same change to the
`std.io.Writer` interface.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants