Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jun 2, 2025

Which issue does this PR close?

Rationale for this change

@Dandandan noted on https://github.com/apache/arrow-rs/pull/7513/files#r2119136807:

Looking at the code - this doesn't actually really use the TrustedLen optimization.
I think it could use the same optimization as MutableBuffer::from_trusted_len

What changes are included in this PR?

Use the trusted len iter optimization for extendig PrimitiveBuffers

Are there any user-facing changes?

The only place I can see this used now is take with fixed size arrays. Thus this may make that somewhat faster (I will run some benchmarks)

I think the real value will be in the APIs sketched out in #7513 to append to StringViewBuilder faster

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 2, 2025
@alamb alamb force-pushed the alamb/from_trusted_len branch 2 times, most recently from 15e7ba9 to 94b18ac Compare June 2, 2025 13:50
@alamb alamb force-pushed the alamb/from_trusted_len branch from 94b18ac to 91d1abb Compare June 2, 2025 13:58
@alamb alamb changed the title Optimize PrimitiveBuilder::append_trusted_len_iter Optimize PrimitiveBuilder::append_trusted_len_iter Jun 2, 2025
@alamb alamb marked this pull request as ready for review June 2, 2025 14:27
alamb added a commit that referenced this pull request Jun 2, 2025
# Which issue does this PR close?

- part of #7591

# Rationale for this change

I would like a benchmark that shows improvements in
#7590

`take` with FixedSizeBinary  is the only one I know of now

# What changes are included in this PR?

Add `fsb` to the take_kernel benchmarks


# Are there any user-facing changes?

No, this is a benchmark
@alamb
Copy link
Contributor Author

alamb commented Jun 2, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/from_trusted_len (64c6cdd) to 3681540 diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_from_trusted_len
Results will be posted here when complete

@alamb
Copy link
Contributor Author

alamb commented Jun 2, 2025

🤖: Benchmark completed

Details

group                                                                     alamb_from_trusted_len                 main
-----                                                                     ----------------------                 ----
take bool 1024                                                            1.00   1396.5±2.16ns        ? ?/sec    1.00   1396.5±2.56ns        ? ?/sec
take bool 512                                                             1.00    725.6±0.77ns        ? ?/sec    1.00    726.5±1.06ns        ? ?/sec
take bool null indices 1024                                               1.00      2.5±0.05µs        ? ?/sec    1.08      2.7±0.03µs        ? ?/sec
take bool null values 1024                                                1.06      3.0±0.00µs        ? ?/sec    1.00      2.8±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.00      5.5±0.07µs        ? ?/sec    1.06      5.9±0.04µs        ? ?/sec
take check bounds i32 1024                                                1.00   1159.0±2.79ns        ? ?/sec    1.01  1172.3±14.19ns        ? ?/sec
take check bounds i32 512                                                 1.06    632.1±1.44ns        ? ?/sec    1.00    597.0±7.56ns        ? ?/sec
take i32 1024                                                             1.00    580.2±1.32ns        ? ?/sec    1.00    579.9±0.74ns        ? ?/sec
take i32 512                                                              1.00    380.3±0.53ns        ? ?/sec    1.00    378.4±0.39ns        ? ?/sec
take i32 null indices 1024                                                1.00    788.6±0.98ns        ? ?/sec    1.01    795.1±2.44ns        ? ?/sec
take i32 null values 1024                                                 1.03      2.1±0.00µs        ? ?/sec    1.00      2.1±0.00µs        ? ?/sec
take i32 null values null indices 1024                                    1.00      3.2±0.04µs        ? ?/sec    1.07      3.4±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      8.0±0.01µs        ? ?/sec    1.00      8.0±0.05µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.05     10.0±0.07µs        ? ?/sec    1.00      9.5±0.12µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.02     20.6±0.10µs        ? ?/sec    1.00     20.2±0.08µs        ? ?/sec
take str 1024                                                             1.01     10.8±0.05µs        ? ?/sec    1.00     10.7±0.04µs        ? ?/sec
take str 512                                                              1.03      5.5±0.02µs        ? ?/sec    1.00      5.3±0.01µs        ? ?/sec
take str null indices 1024                                                1.00      9.2±0.03µs        ? ?/sec    1.23     11.2±0.04µs        ? ?/sec
take str null indices 512                                                 1.00      3.5±0.02µs        ? ?/sec    1.43      5.0±0.03µs        ? ?/sec
take str null values 1024                                                 1.00     10.1±0.04µs        ? ?/sec    1.11     11.2±0.06µs        ? ?/sec
take str null values null indices 1024                                    1.00      9.1±0.06µs        ? ?/sec    1.04      9.5±0.05µs        ? ?/sec
take stringview 1024                                                      1.12    854.0±0.92ns        ? ?/sec    1.00    765.4±2.47ns        ? ?/sec
take stringview 512                                                       1.00    473.2±1.04ns        ? ?/sec    1.02    483.8±3.55ns        ? ?/sec
take stringview null indices 1024                                         1.02  1390.9±23.41ns        ? ?/sec    1.00   1359.2±7.85ns        ? ?/sec
take stringview null indices 512                                          1.04    689.5±1.62ns        ? ?/sec    1.00    662.0±1.13ns        ? ?/sec
take stringview null values 1024                                          1.01      2.1±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take stringview null values null indices 1024                             1.00      3.7±0.06µs        ? ?/sec    1.05      3.8±0.07µs        ? ?/sec

@alamb alamb marked this pull request as draft June 2, 2025 17:43
@alamb
Copy link
Contributor Author

alamb commented Jun 2, 2025

I need to spend some time making sure it is actually faster

@alamb alamb closed this Jun 9, 2025
@alamb
Copy link
Contributor Author

alamb commented Jun 9, 2025

I think we are going to go with a different approach (use a Vec instead)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize PrimitiveBuilder::append_trusted_len_iter

1 participant