Shrank EncodedLevel to speed up step_in/step_out. #113
Description of changes:
The `EncodedLevel` struct is just large enough that variations of it (like `Option<EncodedLevel>` and `Result<EncodedLevel, _>`) can cross LLVM's threshold for using `memcpy` to move values around. This shows up in profiling as substantial overhead in the `step_in` and `step_out` functions, which move instances of `EncodedLevel` to and from a `Vec` of container levels.

In a particularly deeply nested test file (a single struct with ~800 levels of nested child structs), this caused the reader to be painfully slow. A 15MB file took ~230ms to read through with `next()`/`step_in()`/`step_out()`, loading each scalar value encountered.

This PR shrinks the `EncodedLevel` struct by:
- Replacing `usize` offsets (8 bytes apiece on `x86_64`) with `u8` lengths from which offsets can be calculated if necessary.
- Replacing the `Vec` of annotations on each `EncodedLevel` with a common `Vec` that lives on `CursorState`. Each `EncodedLevel` now tracks the number of annotations it has pushed onto that communal `Vec`, allowing the reader to use a single `Vec`/allocation across the entire stream. This dropped the size of `EncodedLevel` by a further 23 bytes.

Performance test
15MB binary Ion test file containing a single struct with 773 levels of nested values.
Before: 230ms
After: 125ms (-45.65%)
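For readers unfamiliar with the shared-`Vec` approach described in the list of changes, here is a minimal sketch of the idea. The `Level` and `Cursor` types, their fields, and the `usize` annotation IDs are hypothetical simplifications and are not the actual `EncodedLevel` or `CursorState` definitions from this PR.

```rust
// Hypothetical, simplified stand-ins for the types touched by this PR.
struct Level {
    // How many entries this level pushed onto the shared annotations Vec.
    num_annotations: u8,
}

struct Cursor {
    // One annotations buffer shared by every level in the stream.
    annotations: Vec<usize>,
    // Stack of container levels the cursor has stepped into.
    levels: Vec<Level>,
}

impl Cursor {
    fn new() -> Self {
        Cursor { annotations: Vec::new(), levels: Vec::new() }
    }

    // Push a level and append its annotations to the shared buffer.
    fn step_in(&mut self, annotation_ids: &[usize]) {
        let count = u8::try_from(annotation_ids.len())
            .expect("sketch assumes at most 255 annotations per level");
        self.annotations.extend_from_slice(annotation_ids);
        self.levels.push(Level { num_annotations: count });
    }

    // Pop a level and drop exactly the annotations it contributed.
    // Returns false if there was no level to step out of.
    fn step_out(&mut self) -> bool {
        match self.levels.pop() {
            Some(level) => {
                let new_len = self.annotations.len() - level.num_annotations as usize;
                self.annotations.truncate(new_len);
                true
            }
            None => false,
        }
    }

    // The annotations of the innermost level are the tail of the shared buffer.
    fn current_annotations(&self) -> &[usize] {
        match self.levels.last() {
            Some(level) => {
                let start = self.annotations.len() - level.num_annotations as usize;
                &self.annotations[start..]
            }
            None => &[],
        }
    }
}

fn main() {
    let mut cursor = Cursor::new();
    cursor.step_in(&[10, 11]);
    cursor.step_in(&[12]);
    println!("inner: {:?}", cursor.current_annotations());
    cursor.step_out();
    println!("outer: {:?}", cursor.current_annotations());
}
```

Because `truncate` keeps the buffer's capacity, stepping in and out repeatedly reuses one allocation, and each level only has to carry a one-byte count instead of its own `Vec`.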
Memory layout
Before
Note that, depending on layout/alignment, a size of 120 bytes means that `Option<EncodedValue>` and `Result<EncodedValue, _>` can take 128 bytes even though they only add a single discriminator byte.
After
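One way to check this kind of size change locally is `std::mem::size_of`. The sketch below compares two hypothetical struct shapes, one with machine-word offsets and its own `Vec`, and one with `u8` lengths and an annotation count. The field names and field counts are invented for illustration and do not reproduce the real layouts from this PR; `#[repr(C)]` is used only to make the sketch's sizes predictable.

```rust
use std::mem::size_of;

// Hypothetical "before" shape: word-sized offsets plus a per-level Vec.
#[allow(dead_code)]
#[repr(C)]
struct WideLevel {
    header_offset: usize,
    length_offset: usize,
    value_offset: usize,
    value_end: usize,
    annotations: Vec<usize>,
}

// Hypothetical "after" shape: one base offset, u8 lengths, and a count of
// annotations that are stored somewhere else.
#[allow(dead_code)]
#[repr(C)]
struct NarrowLevel {
    header_offset: usize,
    value_length: usize,
    header_length: u8,
    length_length: u8,
    num_annotations: u8,
}

// Returns the sizes of T, Option<T>, and Result<T, ()> in bytes.
fn sizes<T>() -> (usize, usize, usize) {
    (
        size_of::<T>(),
        size_of::<Option<T>>(),
        size_of::<Result<T, ()>>(),
    )
}

fn main() {
    let (w, w_opt, w_res) = sizes::<WideLevel>();
    let (n, n_opt, n_res) = sizes::<NarrowLevel>();
    println!("WideLevel:   {} bytes, Option: {}, Result: {}", w, w_opt, w_res);
    println!("NarrowLevel: {} bytes, Option: {}, Result: {}", n, n_opt, n_res);
}
```

The wrapper sizes printed for `Option` and `Result` depend on the compiler's layout decisions, such as whether a spare bit pattern is available for the tag, so they are best measured rather than assumed.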
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.