-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a config to force using string view in benchmark #11514
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
github-actions
bot
added
documentation
Improvements or additions to documentation
core
Core DataFusion crate
sqllogictest
SQL Logic Tests (.slt)
labels
Jul 17, 2024
XiangpengHao
changed the title
add a config to force using string view in benchmark
Add a config to force using string view in benchmark
Jul 17, 2024
alamb
approved these changes
Jul 18, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @XiangpengHao
Since this is targeting a branch, I think it is fine to merge as is and we can improve it later. I did leave some suggestions on how to rename the config option
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb
approved these changes
Jul 18, 2024
alamb
added a commit
that referenced
this pull request
Jul 29, 2024
… some ClickBench queries (not on by default) (#11667) * Pin to pre-release version of arrow 52.2.0 * Update for deprecated method * Add a config to force using string view in benchmark (#11514) * add a knob to force string view in benchmark * fix sql logic test * update doc * fix ci * fix ci only test * Update benchmarks/src/util/options.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Update datafusion/common/src/config.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * update tests --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Add String view helper functions (#11517) * add functions * add tests for hash util * Add ArrowBytesViewMap and ArrowBytesViewSet (#11515) * Update `string-view` branch to arrow-rs main (#10966) * Pin to arrow main * Fix clippy with latest arrow * Uncomment test that needs new arrow-rs to work * Update datafusion-cli Cargo.lock * Update Cargo.lock * tapelo * merge * update cast * consistent dep * fix ci * add more tests * make doc happy * update new implementation * fix bug * avoid unused dep * update dep * update * fix cargo check * update doc * pick up the comments change again --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Enable `GroupValueBytesView` for aggregation with StringView types (#11519) * add functions * Update `string-view` branch to arrow-rs main (#10966) * Pin to arrow main * Fix clippy with latest arrow * Uncomment test that needs new arrow-rs to work * Update datafusion-cli Cargo.lock * Update Cargo.lock * tapelo * merge * update cast * consistent dep * fix ci * avoid unused dep * update dep * update * fix cargo check * better group value view aggregation * update --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Initial support for regex_replace on `StringViewArray` (#11556) * initial support for string view regex * update tests * Add support for Utf8View for date/temporal codepaths (#11518) * Add StringView support for date_part and make_date funcs * run cargo update in datafusion-cli * cargo fmt --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * GC `StringViewArray` in `CoalesceBatchesStream` (#11587) * gc string view when appropriate * make clippy happy * address comments * make doc happy * update style * Add comments and tests for gc_string_view_batch * better herustic * update test * Update datafusion/physical-plan/src/coalesce_batches.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * [Bug] fix bug in return type inference of `utf8_to_int_type` (#11662) * fix bug in return type inference * update doc * add tests --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Fix clippy * Increase ByteViewMap block size to 2MB (#11674) * better default block size * fix related test * Change `--string-view` to only apply to parquet formats (#11663) * use inferenced schema, don't load schema again * move config to parquet-only * update * update * better format * format * update * Implement native support StringView for character length (#11676) * native support for character length * Update datafusion/functions/src/unicode/character_length.rs --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Remove uneeded patches * cargo fmt --------- Co-authored-by: Xiangpeng Hao <haoxiangpeng123@gmail.com> Co-authored-by: Xiangpeng Hao <me@haoxp.xyz> Co-authored-by: Andrew Duffy <a10y@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
core
Core DataFusion crate
documentation
Improvements or additions to documentation
sqllogictest
SQL Logic Tests (.slt)
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
(targets the string-view2 branch)
Which issue does this PR close?
Part of #10918
Rationale for this change
Add a config knob to force using string view in benchmark.
Note that current support for StringView is very limited, and I'm filing many subsequent PRs to improve performance.
So this knob is used for testing/debugging/profiling purpose only, you don't want to enable this unless you also work on string view integration.
You'll see bad performance numbers for almost every benchmark query, those are not final numbers.
To run a benchmark with string view:
To run without string view, remove the
--string-view
arg.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?