Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary/stream/reader: Fast-path offsetReader skips #519

Merged
merged 2 commits into from
Jul 8, 2021

Conversation

abhinav
Copy link
Contributor

@abhinav abhinav commented Jul 8, 2021

Before streaming, when we were working off an in-memory buffer, skipping
bytes was very cheap: offset += n. The streaming implementation relies
on io.CopyN(ioutil.Discard) to skip bytes.

This is comparatively inefficient because:

  1. We have to copy bytes for them to be discarded
  2. io.CopyN makes a new buffer for each call

This change addresses (1) for the common case of using the non-streaming
reader: if the io.Reader that we're streaming from is an
offsetReader, we optimize our skips by simply incrementing its offset.

Naively, we might do this by adding two implementations of discard on
StreamReader and picking between them at initialization.

type StreamReader struct {
    // ...
    discard func(int64) error
}

func NewStreamReader(r io.Reader) *StreamReader {
    sr := StreamReader{...}
    if condition {
        sr.discard = sr.discardStream
    } else {
        sr.discard = sr.discardOffset
    }
    // ...
}

Except that alone isn't enough. The sr.discard = sr.discardStream
assignment causes an allocation because we're referencing a bound method
of an instance of the object. It's roughly equivalent to allocating a
closure:

sr.discard = func(n int64) error { return sr.discardStream(n) }

To do this more efficiently, we'll hold onto the pointers to the bound
methods discardStream and discardOffset on the pooled StreamReader
object. This is the same technique employed in binary.Writer to avoid
similar allocations for writeValue and writeMapItem.

name                                                  old time/op    new time/op    delta
RoundTrip/PrimitiveOptionalStruct/Decode-4              2.91µs ± 6%    2.90µs ± 1%     ~     (p=0.497 n=10+9)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4    1.85µs ± 4%    1.84µs ± 2%     ~     (p=0.691 n=9+8)
RoundTrip/Graph/Decode-4                                9.64µs ± 1%    6.36µs ± 2%  -34.02%  (p=0.000 n=10+9)
RoundTrip/Graph/Streaming_Decode-4                      2.69µs ± 4%    2.69µs ± 4%     ~     (p=0.495 n=8+9)
RoundTrip/ContainersOfContainers/Decode-4               79.1µs ±27%    56.6µs ± 1%  -28.50%  (p=0.000 n=10+10)
RoundTrip/ContainersOfContainers/Streaming_Decode-4     30.3µs ± 2%    30.5µs ± 5%     ~     (p=0.696 n=8+10)

name                                                  old alloc/op   new alloc/op   delta
RoundTrip/PrimitiveOptionalStruct/Decode-4              1.40kB ± 0%    1.40kB ± 0%     ~     (all equal)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4     56.0B ± 0%     56.0B ± 0%     ~     (all equal)
RoundTrip/Graph/Decode-4                                3.52kB ± 0%    2.80kB ± 0%  -20.54%  (p=0.000 n=7+10)
RoundTrip/Graph/Streaming_Decode-4                        168B ± 0%      168B ± 0%     ~     (all equal)
RoundTrip/ContainersOfContainers/Decode-4               15.7kB ± 0%    13.2kB ± 0%  -16.28%  (p=0.000 n=10+8)
RoundTrip/ContainersOfContainers/Streaming_Decode-4     10.1kB ± 0%    10.1kB ± 0%     ~     (p=0.059 n=8+10)

name                                                  old allocs/op  new allocs/op  delta
RoundTrip/PrimitiveOptionalStruct/Decode-4                14.0 ± 0%      14.0 ± 0%     ~     (all equal)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4      10.0 ± 0%      10.0 ± 0%     ~     (all equal)
RoundTrip/Graph/Decode-4                                  63.0 ± 0%      33.0 ± 0%  -47.62%  (p=0.000 n=10+10)
RoundTrip/Graph/Streaming_Decode-4                        10.0 ± 0%      10.0 ± 0%     ~     (all equal)
RoundTrip/ContainersOfContainers/Decode-4                  306 ± 0%       200 ± 0%  -34.64%  (p=0.000 n=10+9)
RoundTrip/ContainersOfContainers/Streaming_Decode-4        146 ± 0%       146 ± 0%     ~     (p=1.000 n=10+10)

@abhinav abhinav requested review from witriew and usmyth July 8, 2021 01:51
@abhinav abhinav force-pushed the abg/skip-collections-faster branch from 839b255 to fae4d5f Compare July 8, 2021 02:26
@abhinav abhinav force-pushed the abg/offset-skips branch from 8961254 to f6fbe7f Compare July 8, 2021 02:26
protocol/binary/stream_reader.go Outdated Show resolved Hide resolved
protocol/binary/stream_reader.go Outdated Show resolved Hide resolved
Base automatically changed from abg/skip-collections-faster to streamdev July 8, 2021 02:45
Before streaming, when we were working off an in-memory buffer, skipping
bytes was very cheap: `offset += n`. The streaming implementation relies
on `io.CopyN(ioutil.Discard)` to skip bytes.

This is comparatively inefficient because:

1. We have to copy bytes for them to be discarded
2. io.CopyN makes a new buffer for each call

This change addresses (1) for the common case of using the non-streaming
reader: if the `io.Reader` that we're streaming from is an
`offsetReader`, we optimize our skips by simply incrementing its offset.

Naively, we might do this by adding two implementations of `discard` on
`StreamReader` and picking between them at initialization.

    type StreamReader struct {
        // ...
        discard func(int64) error
    }

    func NewStreamReader(r io.Reader) *StreamReader {
        sr := StreamReader{...}
        if condition {
            sr.discard = sr.discardStream
        } else {
            sr.discard = sr.discardOffset
        }
        // ...
    }

Except that alone isn't enough. The `sr.discard = sr.discardStream`
assignment causes an allocation because we're referencing a bound method
of an instance of the object. It's roughly equivalent to allocating a
closure:

    sr.discard = func(n int64) error { return sr.discardStream(n) }

To do this more efficiently, we'll hold onto the pointers to the bound
methods discardStream and discardOffset on the pooled StreamReader
object. This is the same technique employed in `binary.Writer` to avoid
similar allocations for writeValue and writeMapItem.

```
name                                                  old time/op    new time/op    delta
RoundTrip/PrimitiveOptionalStruct/Decode-4              2.91µs ± 6%    2.90µs ± 1%     ~     (p=0.497 n=10+9)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4    1.85µs ± 4%    1.84µs ± 2%     ~     (p=0.691 n=9+8)
RoundTrip/Graph/Decode-4                                9.64µs ± 1%    6.36µs ± 2%  -34.02%  (p=0.000 n=10+9)
RoundTrip/Graph/Streaming_Decode-4                      2.69µs ± 4%    2.69µs ± 4%     ~     (p=0.495 n=8+9)
RoundTrip/ContainersOfContainers/Decode-4               79.1µs ±27%    56.6µs ± 1%  -28.50%  (p=0.000 n=10+10)
RoundTrip/ContainersOfContainers/Streaming_Decode-4     30.3µs ± 2%    30.5µs ± 5%     ~     (p=0.696 n=8+10)

name                                                  old alloc/op   new alloc/op   delta
RoundTrip/PrimitiveOptionalStruct/Decode-4              1.40kB ± 0%    1.40kB ± 0%     ~     (all equal)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4     56.0B ± 0%     56.0B ± 0%     ~     (all equal)
RoundTrip/Graph/Decode-4                                3.52kB ± 0%    2.80kB ± 0%  -20.54%  (p=0.000 n=7+10)
RoundTrip/Graph/Streaming_Decode-4                        168B ± 0%      168B ± 0%     ~     (all equal)
RoundTrip/ContainersOfContainers/Decode-4               15.7kB ± 0%    13.2kB ± 0%  -16.28%  (p=0.000 n=10+8)
RoundTrip/ContainersOfContainers/Streaming_Decode-4     10.1kB ± 0%    10.1kB ± 0%     ~     (p=0.059 n=8+10)

name                                                  old allocs/op  new allocs/op  delta
RoundTrip/PrimitiveOptionalStruct/Decode-4                14.0 ± 0%      14.0 ± 0%     ~     (all equal)
RoundTrip/PrimitiveOptionalStruct/Streaming_Decode-4      10.0 ± 0%      10.0 ± 0%     ~     (all equal)
RoundTrip/Graph/Decode-4                                  63.0 ± 0%      33.0 ± 0%  -47.62%  (p=0.000 n=10+10)
RoundTrip/Graph/Streaming_Decode-4                        10.0 ± 0%      10.0 ± 0%     ~     (all equal)
RoundTrip/ContainersOfContainers/Decode-4                  306 ± 0%       200 ± 0%  -34.64%  (p=0.000 n=10+9)
RoundTrip/ContainersOfContainers/Streaming_Decode-4        146 ± 0%       146 ± 0%     ~     (p=1.000 n=10+10)
```
@abhinav abhinav force-pushed the abg/offset-skips branch from f6fbe7f to 0abdb94 Compare July 8, 2021 02:45
Co-authored-by: Wit Riewrangboonya <wit@uber.com>
@codecov
Copy link

codecov bot commented Jul 8, 2021

Codecov Report

Merging #519 (933dc23) into streamdev (ce4445d) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##           streamdev     #519      +/-   ##
=============================================
- Coverage      68.83%   68.83%   -0.01%     
=============================================
  Files            130      130              
  Lines          22755    22767      +12     
=============================================
+ Hits           15663    15671       +8     
- Misses          4069     4071       +2     
- Partials        3023     3025       +2     
Impacted Files Coverage Δ
protocol/binary/stream_reader.go 100.00% <100.00%> (ø)
protocol/binary/reader.go 82.14% <0.00%> (-3.58%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce4445d...933dc23. Read the comment docs.

@abhinav abhinav merged commit ebaad6e into streamdev Jul 8, 2021
@abhinav abhinav deleted the abg/offset-skips branch July 8, 2021 02:51
@abhinav abhinav mentioned this pull request Aug 30, 2021
abhinav added a commit that referenced this pull request Aug 30, 2021
# Commits

The following comments are included in this release. Some of these
cherry-picked and released in v1.28.0, but they appear again in the
list above.

- protocol: Add streaming interfaces (#485)
- Move Stream-based interfaces into their own package
- Make Streaming interfaces private to allow for safe experimentation (#488)
- idl: Return structured ParseError from idl.Parse() (#492)
- Add CHANGELOG entry for #492 (#494)
- Support "<" in the templating language (#499)
- idl: add a Position struct to wrap reported lines (#497)
- Add streamwriter implementation (#490)
- Add a "StreamReader" which implements "stream.Reader"
- Use the "stream.Reader" in the "binary.Reader"
- Add code generation for all wire types for stream encoding (#500)
- Generate "Decode" for "enums" that will directly decode (#495)
- Provide "decode" code generation for the streaming variants for all other types (#496)
- idl: record document positions on constant nodes (#503)
- ast: move idl.Position to the ast package (#504)
- idl: replace internal.Position with ast.Position (#505)
- Expose stream protocol method to close Writer (#506)
- idl: add column numbers to parse error positions (#507)
- idl: record full positions for constants (#508)
- Mark assertParseCases() as a test helper (#509)
- protocol/stream: Define enveloping interfaces (#511)
- protocol/stream: Declare interface for encoding envelopes (#513)
- binary/StreamWriter: Borrow => New; unexport Return (#515)
- stream: add Close method, pool binary reader (#514)
- binary/reader: Return to pool after ReadValue (#517)
- binary/reader: Skip fixed width collections faster (#518)
- binary/stream/reader: Fast-path offsetReader skips (#519)
- binary: Move Responders and Protocol into package (#516)
- benchmark: Refactor into a suite (#520)
- Upgrade to Ragel version 6.10 (from 6.9) (#523)
- Responder: Deduplicate interface (#524)
- gen/quick_test: Add missing types (#525)
- enum/json: Support rejecting unknown values (#502)
- Back to development
- Upgrade to golang.org/x/tools version 0.1.5 (#529)
- ast: add column values to the AST nodes (#522)
- stream: Implement Request and Response handling with Enveloping (#526)
- offsetReader: Implement io.Seeker
- binary/ReadRequest: Use io.Seeker if available
- StreamReader: Use Seeker instead of offsetReader
- protocol/stream: Unembed stream.Protocol from stream.RequestReader (#532)
- thrifttest: Add mocks for streaming interfaces (#527)
- streaming: Unembed iface.Private in streaming-based interfaces (#533)
- Regenerate files for tests after merging `streamdev`
- ast: formally declare CppInclude as a Node (#536)
- ast: add Annotations(Node) []*Annotations (#537)
- Preparing release v1.29.0

# API changes

I ran apidiff on all packages in v1.28.0 and compared it with this
release. Removing changes to gen/internal/tests, the result is:

```
--- go.uber.org/thriftrw/ast ---
Compatible changes:
- Annotation.Column: added
- Annotations: added
- BaseType.Column: added
- Constant.Column: added
- ConstantList.Column: added
- ConstantMap.Column: added
- ConstantMapItem.Column: added
- ConstantReference.Column: added
- CppInclude.Column: added
- DefinitionInfo.Column: added
- Enum.Column: added
- EnumItem.Column: added
- Field.Column: added
- Function.Column: added
- Include.Column: added
- ListType.Column: added
- MapType.Column: added
- Namespace.Column: added
- Position.Column: added
- Position.String: added
- Service.Column: added
- ServiceReference.Column: added
- SetType.Column: added
- Struct.Column: added
- TypeReference.Column: added
- Typedef.Column: added

--- go.uber.org/thriftrw/envelope/stream ---
NEW PACKAGE

--- go.uber.org/thriftrw/gen ---
Compatible changes:
- StreamGenerator: added

--- go.uber.org/thriftrw/internal/envelope/exception ---
Compatible changes:
- (*ExceptionType).Decode: added
- (*TApplicationException).Decode: added
- (*TApplicationException).Encode: added
- ExceptionType.Encode: added

--- go.uber.org/thriftrw/plugin/api ---
Compatible changes:
- (*Argument).Decode: added
- (*Argument).Encode: added
- (*Feature).Decode: added
- (*Function).Decode: added
- (*Function).Encode: added
- (*GenerateServiceRequest).Decode: added
- (*GenerateServiceRequest).Encode: added
- (*GenerateServiceResponse).Decode: added
- (*GenerateServiceResponse).Encode: added
- (*HandshakeRequest).Decode: added
- (*HandshakeRequest).Encode: added
- (*HandshakeResponse).Decode: added
- (*HandshakeResponse).Encode: added
- (*Module).Decode: added
- (*Module).Encode: added
- (*ModuleID).Decode: added
- (*Plugin_Goodbye_Args).Decode: added
- (*Plugin_Goodbye_Args).Encode: added
- (*Plugin_Goodbye_Result).Decode: added
- (*Plugin_Goodbye_Result).Encode: added
- (*Plugin_Handshake_Args).Decode: added
- (*Plugin_Handshake_Args).Encode: added
- (*Plugin_Handshake_Result).Decode: added
- (*Plugin_Handshake_Result).Encode: added
- (*Service).Decode: added
- (*Service).Encode: added
- (*ServiceGenerator_Generate_Args).Decode: added
- (*ServiceGenerator_Generate_Args).Encode: added
- (*ServiceGenerator_Generate_Result).Decode: added
- (*ServiceGenerator_Generate_Result).Encode: added
- (*ServiceID).Decode: added
- (*SimpleType).Decode: added
- (*Type).Decode: added
- (*Type).Encode: added
- (*TypePair).Decode: added
- (*TypePair).Encode: added
- (*TypeReference).Decode: added
- (*TypeReference).Encode: added
- Feature.Encode: added
- ModuleID.Encode: added
- ServiceID.Encode: added
- SimpleType.Encode: added

--- go.uber.org/thriftrw/protocol ---
Compatible changes:
- BinaryStreamer: added

--- go.uber.org/thriftrw/protocol/binary ---
Compatible changes:
- Default: added
- EnvelopeV0Responder: added
- EnvelopeV1Responder: added
- NewStreamReader: added
- NewStreamWriter: added
- NoEnvelopeResponder: added
- Protocol: added
- Responder: added
- StreamReader: added
- StreamWriter: added

--- go.uber.org/thriftrw/protocol/envelope ---
NEW PACKAGE

--- go.uber.org/thriftrw/protocol/stream ---
NEW PACKAGE

--- go.uber.org/thriftrw/thrifttest/streamtest ---
NEW PACKAGE

--- go.uber.org/thriftrw/version ---
Incompatible changes:
- Version: value changed from "1.28.0" to "1.29.0"
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants