Include datafusion in the benchmark #5
Comments
Hi Kevin, thanks for the suggestion! I currently don't have a lot of bandwidth to add a whole new solution to the benchmark, but if you want to open a PR that adds the necessary setup-datafusion.sh, ver-datafusion.sh, upg-datafusion.sh, groupby-datafusion.rs, and join-datafusion.rs, then I'd be happy to review. Take a look at the files in the other solution folders; that should give you a good idea of what is necessary. It may require more steps, though, since datafusion doesn't have any R or Python APIs, so you may also need to add or modify some files in _launcher and _helpers. See repro.sh for steps to run the benchmark either locally or on an AWS instance. If no errors are thrown for the 0.5GB and 5GB datasets, I'd be happy to merge your PR and re-run the benchmark to include results for datafusion.
There is actually a Python API, though it's not documented well: If I have time I'll try to port the benchmarks to it.
Looks like almost all of this work is done already: https://github.com/apache/arrow-datafusion/tree/main/benchmarks/db-benchmark Would you like to open the PR, @kszlim, or would you like me to take a stab?
Go ahead, I don't have the time!
@MrPowers Was just looking at this again. Looks like the db benchmark for datafusion is here now?
@kszlim I'm interested in including datafusion in the upcoming benchmark run (#13 (comment)). Does it support streaming (the larger-than-memory data scenario)?
The Rust library does; I'm not sure if the Python bindings expose it.
Closed by #18
Thanks!
Datafusion is another stateless query engine/dataframe library I'd be interested in seeing results for.
https://github.com/apache/arrow-datafusion