forked from tokio-rs/prost
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: Zero-copy String Types #6
Draft
ParkMyCar
wants to merge
3
commits into
master
Choose a base branch
from
str/zero-copy-string-type
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ParkMyCar
force-pushed
the
str/zero-copy-string-type
branch
from
February 2, 2024 21:42
c6c0d9b
to
7b1bb93
Compare
ParkMyCar
force-pushed
the
str/zero-copy-string-type
branch
from
February 2, 2024 22:38
7b1bb93
to
080397e
Compare
danhhz
added a commit
to danhhz/materialize
that referenced
this pull request
Feb 21, 2024
In the (presumably) common case of Rows made entirely of stack-allocated `Datum` (`Bool`, `Int64`, etc) this reduces the allocations per Row from 2 to 0. `Bytes` could be the same with careful use of prost's `bytes::Bytes` feature. `String` would need something like Parker's MaterializeInc/prost#6. For more complex types (`List`, `AclItem`, etc) this PR is probably a rounding error. This is accomplished by adding a new `decode_into` variant of the "legacy" `Codec::decode` method which enables amortizing allocations across calls in two ways. First, instead of returning `Self`, it takes `&mut Self` as a parameter, allowing for reuse of the internal `Row/SourceData` allocations. Second, it adds a `type Storage` to `Codec` and also passes it in as `&mut Option<Self::Storage>`, allowing for reuse for ProtoRow/ProtoSourceData allocations. `decode_into` gets a default implementation for `Codec`s that can't take advantage of this. This has no effect on the "structured" `Codec` path that we'll hopefully soon be moving to. I'm generally hesitant to spend too much time optimizing the "legacy" path, but this is simple enough and a big enough win that it felt worth it to me. We then take advantage of this in the `Codec` impls for `Row` (entirely unused and could be remove) and `SourceData` (the main one used in production). To reuse the `Vec<ProtoDatum>` in the `ProtoRow` in the `Ok` variant of `ProtoSourceData`, we take advantage of prost's `Message::merge`, which decodes an encoded proto into an existing message using proto's [merge] semantics. We do this because `Message::merge` seems to be careful and smart about reusing existing allocations when it can. To make this work (we don't actually want "merge" semantics), we first clear the message, so we're merging into an empty one. Unfortunately, the generated code for `ProtoSourceData::clear` is not able to keep the allocations around, so we have to manually "clear" it to the equivalent of an `Ok` with an empty `Row`. [merge]: https://protobuf.dev/programming-guides/encoding/#last-one-wins To reuse the internal allocs of the `Row` inside the `SourceData`, we optimize for the (definitely) common case of `Ok`. This is done manually with some match statements and by exposing a new `Row::decode_from` method. The benchmarks should be no-ops but are not. I have no explanation for this, but it seems to reproduce. SourceData roundtrip_int64_decode_legacy time: [1.7787 ms 1.7798 ms 1.7810 ms] change: [-2.2133% -1.8816% -1.5910%] (p = 0.00 < 0.05) Performance has improved. roundtrip_bytes_decode_legacy time: [2.7741 ms 2.7763 ms 2.7800 ms] change: [-0.3923% -0.2916% -0.1545%] (p = 0.00 < 0.05) Change within noise threshold. roundtrip_string_decode_legacy time: [2.4841 ms 2.4855 ms 2.4872 ms] change: [+2.0101% +2.1176% +2.2234%] (p = 0.00 < 0.05) Performance has regressed. Row roundtrip_decode_legacy time: [1.8818 ms 1.8834 ms 1.8851 ms] change: [-6.4160% -5.9947% -5.6271%] (p = 0.00 < 0.05) Performance has improved.
5 tasks
danhhz
added a commit
to danhhz/materialize
that referenced
this pull request
Feb 21, 2024
In the (presumably) common case of Rows made entirely of stack-allocated `Datum` (`Bool`, `Int64`, etc) this reduces the allocations per Row from 2 to 0. `Bytes` could be the same with careful use of prost's `bytes::Bytes` feature. `String` would need something like Parker's MaterializeInc/prost#6. For more complex types (`List`, `AclItem`, etc) this PR is probably a rounding error. This is accomplished by adding a new `decode_into` variant of the "legacy" `Codec::decode` method which enables amortizing allocations across calls in two ways. First, instead of returning `Self`, it takes `&mut Self` as a parameter, allowing for reuse of the internal `Row/SourceData` allocations. Second, it adds a `type Storage` to `Codec` and also passes it in as `&mut Option<Self::Storage>`, allowing for reuse for ProtoRow/ProtoSourceData allocations. `decode_into` gets a default implementation for `Codec`s that can't take advantage of this. This has no effect on the "structured" `Codec` path that we'll hopefully soon be moving to. I'm generally hesitant to spend too much time optimizing the "legacy" path, but this is simple enough and a big enough win that it felt worth it to me. We then take advantage of this in the `Codec` impls for `Row` (entirely unused and could be remove) and `SourceData` (the main one used in production). To reuse the `Vec<ProtoDatum>` in the `ProtoRow` in the `Ok` variant of `ProtoSourceData`, we take advantage of prost's `Message::merge`, which decodes an encoded proto into an existing message using proto's [merge] semantics. We do this because `Message::merge` seems to be careful and smart about reusing existing allocations when it can. To make this work (we don't actually want "merge" semantics), we first clear the message, so we're merging into an empty one. Unfortunately, the generated code for `ProtoSourceData::clear` is not able to keep the allocations around, so we have to manually "clear" it to the equivalent of an `Ok` with an empty `Row`. [merge]: https://protobuf.dev/programming-guides/encoding/#last-one-wins To reuse the internal allocs of the `Row` inside the `SourceData`, we optimize for the (definitely) common case of `Ok`. This is done manually with some match statements and by exposing a new `Row::decode_from` method. The benchmarks should be no-ops but are not. I have no explanation for this, but it seems to reproduce. SourceData roundtrip_int64_decode_legacy time: [1.7787 ms 1.7798 ms 1.7810 ms] change: [-2.2133% -1.8816% -1.5910%] (p = 0.00 < 0.05) Performance has improved. roundtrip_bytes_decode_legacy time: [2.7741 ms 2.7763 ms 2.7800 ms] change: [-0.3923% -0.2916% -0.1545%] (p = 0.00 < 0.05) Change within noise threshold. roundtrip_string_decode_legacy time: [2.4841 ms 2.4855 ms 2.4872 ms] change: [+2.0101% +2.1176% +2.2234%] (p = 0.00 < 0.05) Performance has regressed. Row roundtrip_decode_legacy time: [1.8818 ms 1.8834 ms 1.8851 ms] change: [-6.4160% -5.9947% -5.6271%] (p = 0.00 < 0.05) Performance has improved.
ParkMyCar
pushed a commit
to MaterializeInc/materialize
that referenced
this pull request
Feb 22, 2024
In the (presumably) common case of Rows made entirely of stack-allocated `Datum` (`Bool`, `Int64`, etc) this reduces the allocations per Row from 2 to 0. `Bytes` could be the same with careful use of prost's `bytes::Bytes` feature. `String` would need something like Parker's MaterializeInc/prost#6. For more complex types (`List`, `AclItem`, etc) this PR is probably a rounding error. This is accomplished by adding a new `decode_into` variant of the "legacy" `Codec::decode` method which enables amortizing allocations across calls in two ways. First, instead of returning `Self`, it takes `&mut Self` as a parameter, allowing for reuse of the internal `Row/SourceData` allocations. Second, it adds a `type Storage` to `Codec` and also passes it in as `&mut Option<Self::Storage>`, allowing for reuse of ProtoRow/ProtoSourceData allocations. `decode_into` gets a default implementation for `Codec`s that can't take advantage of this. This has no effect on the "structured" `Codec` path that we'll hopefully soon be moving to. I'm generally hesitant to spend too much time optimizing the "legacy" path, but this is simple enough and a big enough win that it felt worth it to me. We then take advantage of this in the `Codec` impls for `Row` (entirely unused and could be remove) and `SourceData` (the main one used in production). To reuse the `Vec<ProtoDatum>` in the `ProtoRow` in the `Ok` variant of `ProtoSourceData`, we take advantage of prost's `Message::merge`, which decodes an encoded proto into an existing message using proto's [merge] semantics. We do this because `Message::merge` seems to be careful and smart about reusing existing allocations when it can. To make this work (we don't actually want "merge" semantics), we first clear the message, so we're merging into an empty one. Unfortunately, the generated code for `ProtoSourceData::clear` is not able to keep the allocations around, so we have to manually "clear" it to the equivalent of an `Ok` with an empty `Row`. [merge]: https://protobuf.dev/programming-guides/encoding/#last-one-wins To reuse the internal allocs of the `Row` inside the `SourceData`, we optimize for the (definitely) common case of `Ok`. This is done manually with some match statements and by exposing a new `Row::decode_from` method. Allocs before (for Row of all Int64s): <img width="855" alt="Screenshot 2024-02-21 at 9 40 30 AM" src="https://github.com/MaterializeInc/materialize/assets/52528/406af726-3237-4a5e-8d18-b71b8bac3f47"> Allocs after: <img width="858" alt="Screenshot 2024-02-21 at 9 40 45 AM" src="https://github.com/MaterializeInc/materialize/assets/52528/6cb16bf3-bede-45e2-954b-a06956182c29"> Benchmarks: roundtrip_int64_decode_legacy time: [1.4475 ms 1.4484 ms 1.4493 ms] change: [-20.305% -20.018% -19.760%] (p = 0.00 < 0.05) Performance has improved. roundtrip_bytes_decode_legacy time: [2.2823 ms 2.2851 ms 2.2888 ms] change: [-18.031% -17.935% -17.793%] (p = 0.00 < 0.05) Performance has improved. roundtrip_string_decode_legacy time: [1.9071 ms 1.9086 ms 1.9104 ms] change: [-21.660% -21.548% -21.397%] (p = 0.00 < 0.05) Performance has improved. ### Motivation * This PR adds a feature that has not yet been specified. ### Tips for reviewer I haven't yet actually hooked this up in the main prod codepath. Will do that now and push it as a commit if it's easy, or a separate PR if it's not. ### Checklist - [ ] This PR has adequate test coverage / QA involvement has been duly considered. - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [ ] This PR includes the following [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note): - <!-- Add release notes here or explicitly state that there are no user-facing behavior changes. -->
chuck-alt-delete
pushed a commit
to MaterializeInc/materialize
that referenced
this pull request
Mar 1, 2024
In the (presumably) common case of Rows made entirely of stack-allocated `Datum` (`Bool`, `Int64`, etc) this reduces the allocations per Row from 2 to 0. `Bytes` could be the same with careful use of prost's `bytes::Bytes` feature. `String` would need something like Parker's MaterializeInc/prost#6. For more complex types (`List`, `AclItem`, etc) this PR is probably a rounding error. This is accomplished by adding a new `decode_into` variant of the "legacy" `Codec::decode` method which enables amortizing allocations across calls in two ways. First, instead of returning `Self`, it takes `&mut Self` as a parameter, allowing for reuse of the internal `Row/SourceData` allocations. Second, it adds a `type Storage` to `Codec` and also passes it in as `&mut Option<Self::Storage>`, allowing for reuse of ProtoRow/ProtoSourceData allocations. `decode_into` gets a default implementation for `Codec`s that can't take advantage of this. This has no effect on the "structured" `Codec` path that we'll hopefully soon be moving to. I'm generally hesitant to spend too much time optimizing the "legacy" path, but this is simple enough and a big enough win that it felt worth it to me. We then take advantage of this in the `Codec` impls for `Row` (entirely unused and could be remove) and `SourceData` (the main one used in production). To reuse the `Vec<ProtoDatum>` in the `ProtoRow` in the `Ok` variant of `ProtoSourceData`, we take advantage of prost's `Message::merge`, which decodes an encoded proto into an existing message using proto's [merge] semantics. We do this because `Message::merge` seems to be careful and smart about reusing existing allocations when it can. To make this work (we don't actually want "merge" semantics), we first clear the message, so we're merging into an empty one. Unfortunately, the generated code for `ProtoSourceData::clear` is not able to keep the allocations around, so we have to manually "clear" it to the equivalent of an `Ok` with an empty `Row`. [merge]: https://protobuf.dev/programming-guides/encoding/#last-one-wins To reuse the internal allocs of the `Row` inside the `SourceData`, we optimize for the (definitely) common case of `Ok`. This is done manually with some match statements and by exposing a new `Row::decode_from` method. Allocs before (for Row of all Int64s): <img width="855" alt="Screenshot 2024-02-21 at 9 40 30 AM" src="https://github.com/MaterializeInc/materialize/assets/52528/406af726-3237-4a5e-8d18-b71b8bac3f47"> Allocs after: <img width="858" alt="Screenshot 2024-02-21 at 9 40 45 AM" src="https://github.com/MaterializeInc/materialize/assets/52528/6cb16bf3-bede-45e2-954b-a06956182c29"> Benchmarks: roundtrip_int64_decode_legacy time: [1.4475 ms 1.4484 ms 1.4493 ms] change: [-20.305% -20.018% -19.760%] (p = 0.00 < 0.05) Performance has improved. roundtrip_bytes_decode_legacy time: [2.2823 ms 2.2851 ms 2.2888 ms] change: [-18.031% -17.935% -17.793%] (p = 0.00 < 0.05) Performance has improved. roundtrip_string_decode_legacy time: [1.9071 ms 1.9086 ms 1.9104 ms] change: [-21.660% -21.548% -21.397%] (p = 0.00 < 0.05) Performance has improved. ### Motivation * This PR adds a feature that has not yet been specified. ### Tips for reviewer I haven't yet actually hooked this up in the main prod codepath. Will do that now and push it as a commit if it's easy, or a separate PR if it's not. ### Checklist - [ ] This PR has adequate test coverage / QA involvement has been duly considered. - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](https://github.com/MaterializeInc/cloud/pull/5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [ ] This PR includes the following [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note): - <!-- Add release notes here or explicitly state that there are no user-facing behavior changes. -->
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
work in progress