Update ClickBench benchmarks with DataFusion 43.0.0 #13099

Open
1 task done
alamb opened this issue Oct 24, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Oct 24, 2024

Is your feature request related to a problem or challenge?

Like #11567

Requires

Once DataFusion 43.0.0 is released, it would be great to update ClickBench (https://benchmark.clickhouse.com/) with runs from the latest version. It looks like we are still reporting numbers for DataFusion 40, and there have been significant improvements since then. See for more details:

Describe the solution you'd like

Perhaps we can follow the model of ClickHouse/ClickBench#210 (thanks @pmcgleenon)

We will also need to update the DataFusion runner to apply the new binary_as_string option added by @goldmedal in #12816. TL;DR: we need to update the CREATE EXTERNAL TABLE statements to include the OPTIONS ('binary_as_string' 'true') clause

https://github.com/ClickHouse/ClickBench/blob/main/datafusion/create_partitioned.sql

CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'partitioned'
OPTIONS ('binary_as_string' 'true');

Note that this is the same as what the DuckDB runner does, as explained in #12788
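
One way to sanity-check the change is to inspect the table after running the CREATE EXTERNAL TABLE statement above. This is only a minimal sketch: it assumes the string columns in the ClickBench Parquet files (for example URL) are stored as binary without a string annotation, and the exact type names reported will depend on the DataFusion version.

```sql
-- Sketch only: run in datafusion-cli after the CREATE EXTERNAL TABLE statement above.
-- List the column names and data types DataFusion resolved for the table.
DESCRIBE hits;

-- With binary_as_string enabled, string predicates should apply directly to
-- columns such as "URL" (column name assumed from the ClickBench schema).
SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
```

If the option is not applied, such columns would be expected to show up with a binary type in the DESCRIBE output rather than a string type.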

@alamb alamb added the enhancement New feature or request label Oct 24, 2024
@alamb alamb changed the title Update ClickBench benchmarks with DataFusion 43 Update ClickBench benchmarks with DataFusion `43.0.0 Oct 24, 2024
@alamb alamb changed the title Update ClickBench benchmarks with DataFusion `43.0.0 Update ClickBench benchmarks with DataFusion 43.0.0 Oct 24, 2024
@pmcgleenon
Contributor

Hi @alamb are we good to proceed with this now that 43.0 has been released (#13254)?

@alamb
Contributor Author

alamb commented Nov 13, 2024

Hi @pmcgleenon

I just double checked and this should be good to go now -- the clickbench runner uses datafusion-cli rather than the python bindings (which haven't yet released a version 43.0.0)

Thank you so much

@pmcgleenon
Contributor

take
