Use initial capacity for interner hashmap #2272

Merged: 1 commit, Aug 1, 2022

@Dandandan Dandandan commented Aug 1, 2022

Which issue does this PR close?

Closes #2273.

Rationale for this change

This avoids repeatedly rehashing and reshuffling the hashmap as it grows when there are many distinct values.

This reduces write_batch time by roughly 23% and 36% in the two benchmarks below:

Benchmarking write_batch primitive/4096 values primitive:
                        time:   [1.0495 ms 1.0520 ms 1.0550 ms]
                        thrpt:  [167.22 MiB/s 167.70 MiB/s 168.09 MiB/s]
                 change:
                        time:   [-24.025% -22.929% -21.823%] (p = 0.00 < 0.05)
                        thrpt:  [+27.916% +29.751% +31.622%]
                        Performance has improved.
Benchmarking write_batch primitive/4096 values primitive non-null:     
                        time:   [826.87 µs 827.46 µs 828.10 µs]
                        thrpt:  [208.91 MiB/s 209.07 MiB/s 209.22 MiB/s]
                 change:
                        time:   [-36.274% -36.156% -36.056%] (p = 0.00 < 0.05)
                        thrpt:  [+56.388% +56.633% +56.922%]
                        Performance has improved.
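
The rehashing cost mentioned in the rationale can be made visible with a small standalone sketch. It uses `std::collections::HashMap` rather than the map type used inside the parquet crate, and `count_resizes` is a hypothetical helper written for this illustration. It counts how often the map's capacity changes while inserting distinct keys, as a proxy for resize events.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inserts `n` distinct keys and returns how many times
/// the map's capacity changed, a proxy for the number of resize events.
fn count_resizes(mut map: HashMap<u64, u64>, n: u64) -> usize {
    let mut resizes = 0;
    let mut cap = map.capacity();
    for i in 0..n {
        map.insert(i, i);
        if map.capacity() != cap {
            // The table was reallocated and its entries moved.
            cap = map.capacity();
            resizes += 1;
        }
    }
    resizes
}

fn main() {
    let n = 4096;
    let grown = count_resizes(HashMap::new(), n);
    let preallocated = count_resizes(HashMap::with_capacity(n as usize), n);
    println!("resizes without initial capacity: {grown}");
    println!("resizes with initial capacity:    {preallocated}");
}
```

A map created with `with_capacity(n)` can hold at least `n` entries without reallocating, so the second count is zero, while the map created with `new()` has to grow several times on the way to 4096 entries.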

What changes are included in this PR?

Adds an initial capacity for the hashmap.
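
As a rough illustration of the change, here is a simplified interner that pre-sizes its lookup map. This is a sketch under assumptions, not the code in parquet/src/util/interner.rs: the `Interner` struct, its fields, and the `INITIAL_CAPACITY` value are all hypothetical, and it uses the standard library `HashMap` for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical value for illustration; the capacity chosen in the PR may differ.
const INITIAL_CAPACITY: usize = 4096;

/// Simplified interner: assigns each distinct value a dense index.
struct Interner {
    lookup: HashMap<String, usize>,
    values: Vec<String>,
}

impl Interner {
    fn new() -> Self {
        Self {
            // Pre-size the map so that the first INITIAL_CAPACITY distinct
            // values can be inserted without triggering a resize.
            lookup: HashMap::with_capacity(INITIAL_CAPACITY),
            values: Vec::new(),
        }
    }

    /// Returns the index of `value`, storing it first if it has not been seen.
    fn intern(&mut self, value: &str) -> usize {
        if let Some(&idx) = self.lookup.get(value) {
            return idx;
        }
        let idx = self.values.len();
        self.values.push(value.to_owned());
        self.lookup.insert(value.to_owned(), idx);
        idx
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("a");
    let b = interner.intern("b");
    let a_again = interner.intern("a");
    println!("{a} {b} {a_again}");
}
```

The trade-off is a larger up-front allocation even when only a few distinct values are interned, in exchange for avoiding repeated growth when there are many.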

Are there any user-facing changes?

@Dandandan Dandandan requested a review from tustvold August 1, 2022 22:17
@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 1, 2022

@tustvold tustvold left a comment

Nice 👍

@codecov-commenter

Codecov Report

Merging #2272 (a2c8f0a) into master (d4f038a) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2272      +/-   ##
==========================================
- Coverage   82.28%   82.28%   -0.01%     
==========================================
  Files         245      245              
  Lines       62688    62688              
==========================================
- Hits        51583    51582       -1     
- Misses      11105    11106       +1     
Impacted Files                                          Coverage Δ
parquet/src/util/interner.rs                            91.66% <100.00%> (ø)
...rquet/src/arrow/record_reader/definition_levels.rs   87.34% <0.00%> (-1.69%) ⬇️
parquet/src/arrow/schema.rs                             96.76% <0.00%> (-0.18%) ⬇️
arrow/src/datatypes/datatype.rs                         62.61% <0.00%> (+0.31%) ⬆️
parquet_derive/src/parquet_field.rs                     66.21% <0.00%> (+0.68%) ⬆️

@tustvold tustvold merged commit b4fa47d into apache:master Aug 1, 2022
@ursabot ursabot commented Aug 1, 2022

Benchmark runs are scheduled for baseline = d4f038a and contender = b4fa47d. b4fa47d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
