@etseidl etseidl commented Oct 30, 2025

Which issue does this PR close?

Rationale for this change

Following the recent improvements in Thrift decoding, the percentage of time spent decoding LEB128 encoded integers has increased.

What changes are included in this PR?

This PR modifies the varint decoder to first test for integers that can be encoded in a single byte (using zig-zag encoding, the maximum int that can be encoded is 63). Many of the fields in the Parquet footer (including all enum values) will be in this range, so optimizing for this frequent occurrence makes sense.
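As a rough illustration of the fast path described above, here is a minimal sketch. It is not the code in this PR: the function names `read_vlq` and `zigzag_decode` and the `Option<(u64, usize)>` return shape are assumptions made for the example. The decoder checks the continuation bit of the first byte and returns immediately when it is clear, and only otherwise falls into the general loop.

```rust
/// Hypothetical helper: decode an unsigned LEB128 varint from the start of `buf`.
/// Returns the decoded value and the number of bytes consumed, or `None` if the
/// input is truncated or longer than a u64 can hold.
fn read_vlq(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;

    // Fast path: high bit clear means the whole value fits in this one byte.
    if first & 0x80 == 0 {
        return Some((u64::from(first), 1));
    }

    // Slow path: accumulate 7 bits per byte until a byte without the
    // continuation bit is found.
    let mut value = u64::from(first & 0x7F);
    let mut shift = 7u32;
    for (i, &byte) in buf.iter().enumerate().skip(1) {
        if shift >= 64 {
            return None; // more bytes than a u64 varint can use
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
    }
    None // ran out of input before the terminating byte
}

/// Hypothetical helper: undo zig-zag encoding (0, 1, 2, 3, ... -> 0, -1, 1, -2, ...).
fn zigzag_decode(n: u64) -> i64 {
    ((n >> 1) as i64) ^ -((n & 1) as i64)
}
```

A single byte carries 7 payload bits, so the fast path covers raw values 0 through 127. After zig-zag decoding that is the range -64 through 63, which matches the maximum of 63 mentioned above.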

Are these changes tested?

Should be covered by existing tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 30, 2025

etseidl commented Oct 30, 2025

bench on intel i7-12700K

group                             57_0_0                                 vlq
-----                             ------                                 ---
decode parquet metadata           1.04      4.9±0.04µs        ? ?/sec    1.00      4.7±0.06µs        ? ?/sec
decode parquet metadata (wide)    1.06     17.8±0.20ms        ? ?/sec    1.00     16.8±0.25ms        ? ?/sec
open(default)                     1.04      5.2±0.07µs        ? ?/sec    1.00      5.0±0.05µs        ? ?/sec
open(page index)                  1.12    104.9±0.80µs        ? ?/sec    1.00     93.8±1.07µs        ? ?/sec

etseidl commented Oct 30, 2025

Hmm... GitHub lost a comment last night.

bench on intel macbook

group                             57_0                                   vlq
-----                             ----                                   ---
decode parquet metadata           1.03     14.9±0.43µs        ? ?/sec    1.00     14.5±0.30µs        ? ?/sec
decode parquet metadata (wide)    1.05     52.0±1.26ms        ? ?/sec    1.00     49.5±1.58ms        ? ?/sec
open(default)                     1.03     15.4±0.47µs        ? ?/sec    1.00     15.0±0.25µs        ? ?/sec
open(page index)                  1.13    242.2±9.01µs        ? ?/sec    1.00    215.1±5.72µs        ? ?/sec

@etseidl etseidl marked this pull request as ready for review October 30, 2025 14:34

@alamb alamb left a comment

Makes sense to me. Thank you @etseidl -- I queued up a benchmark to confirm

Another thing we could try if we wanted to get all crazy is manually unrolling the loop (at least for the first 4 or 8 bytes) to remove the back branch 🤔
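The unrolling idea might look something like the sketch below. This is only an illustration under assumptions, not code from this PR or the parquet crate: the names `read_vlq_unrolled` and `read_vlq_loop` are hypothetical, and the choice to unroll exactly four bytes is arbitrary. The first four bytes are handled with straight-line code, and anything shorter or longer falls back to the ordinary loop.

```rust
/// Hypothetical general decoder: plain loop over the bytes of an unsigned varint.
fn read_vlq_loop(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in buf.iter().enumerate() {
        if shift >= 64 {
            return None; // too many bytes for a u64
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift += 7;
    }
    None // truncated input
}

/// Hypothetical unrolled decoder: the first four bytes are decoded without a
/// loop, so the common short encodings never take a backward branch.
fn read_vlq_unrolled(buf: &[u8]) -> Option<(u64, usize)> {
    if buf.len() >= 4 {
        let mut v = u64::from(buf[0] & 0x7F);
        if buf[0] & 0x80 == 0 {
            return Some((v, 1));
        }
        v |= u64::from(buf[1] & 0x7F) << 7;
        if buf[1] & 0x80 == 0 {
            return Some((v, 2));
        }
        v |= u64::from(buf[2] & 0x7F) << 14;
        if buf[2] & 0x80 == 0 {
            return Some((v, 3));
        }
        v |= u64::from(buf[3] & 0x7F) << 21;
        if buf[3] & 0x80 == 0 {
            return Some((v, 4));
        }
    }
    // Short buffers and encodings longer than four bytes use the general loop.
    read_vlq_loop(buf)
}
```

Whether this beats the single-byte check alone would need to be measured with the same `metadata` benchmark, since the extra code size can offset the saved branch.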

alamb commented Oct 30, 2025

BTW it was nice to read https://en.wikipedia.org/wiki/Variable-length_quantity to understand this better

alamb commented Oct 30, 2025

> Many of the fields in the Parquet footer (including all enum values) will be in this range, so optimizing for this frequent occurrence makes sense.

This is a great observation btw

etseidl commented Oct 30, 2025

> > Many of the fields in the Parquet footer (including all enum values) will be in this range, so optimizing for this frequent occurrence makes sense.
>
> This is a great observation btw

There's actually prior art in the Rust compiler: rust-lang/rust#92604

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing vlq_speedup (e78c56d) to 1c8eac1 diff
BENCH_NAME=metadata
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench metadata
BENCH_FILTER=
BENCH_BRANCH_NAME=vlq_speedup
Results will be posted here when complete

alamb commented Oct 30, 2025

🤖: Benchmark completed

group                             main                                   vlq_speedup
-----                             ----                                   -----------
decode parquet metadata           1.01      9.6±0.07µs        ? ?/sec    1.00      9.5±0.04µs        ? ?/sec
decode parquet metadata (wide)    1.00     43.8±0.71ms        ? ?/sec    1.00     43.9±1.57ms        ? ?/sec
open(default)                     1.00      9.6±0.04µs        ? ?/sec    1.02      9.8±0.04µs        ? ?/sec
open(page index)                  1.10    194.0±1.39µs        ? ?/sec    1.00    176.0±2.40µs        ? ?/sec

alamb commented Oct 31, 2025

The benchmark results look consistent with an improvement to me -- great work @etseidl

@alamb alamb merged commit bac0cb5 into apache:main Oct 31, 2025
16 checks passed

alamb commented Oct 31, 2025

> Another thing we could try if we wanted to get all crazy is manually unrolling the loop (at least for the first 4 or 8 bytes) to remove the back branch 🤔

I got crazy and gave it a try:

Labels

parquet Changes to the parquet crate performance
