[DNM] benchmarks against object storage #1472

Closed
wants to merge 5 commits into from

a10y commented Nov 25, 2024

I'm using this PR as a space to collect information about running the TPC-H queries against object storage. The goal is to compare two file formats:

  • Vortex
  • Parquet

against the following storage backends:

  • S3
  • S3 Express One Zone
  • Google Cloud Storage

Changes

This PR adds a new binary that runs every TPC-H query while logging each I/O issued by our object store reader, so we can examine both request sizes and request counts per query.
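
As a rough illustration of the idea (not the code in this PR), request logging can be sketched as a wrapper around a reader that records the size of every range request. The `RangeReader` trait, `InMemoryReader`, and `LoggingReader` below are hypothetical names invented for this sketch; the real reader in the PR talks to an object store and emits TRACE-level logs instead of keeping counters in memory.

```rust
use std::ops::Range;
use std::sync::Mutex;

/// Hypothetical stand-in for an object store reader: fetches a byte range.
pub trait RangeReader {
    fn read_range(&self, range: Range<u64>) -> Vec<u8>;
}

/// In-memory reader, used only to keep the sketch self-contained.
pub struct InMemoryReader {
    pub data: Vec<u8>,
}

impl RangeReader for InMemoryReader {
    fn read_range(&self, range: Range<u64>) -> Vec<u8> {
        // Clamp to the buffer so out-of-range requests return a short read.
        let len = self.data.len() as u64;
        let start = range.start.min(len) as usize;
        let end = range.end.min(len) as usize;
        self.data[start..end.max(start)].to_vec()
    }
}

/// Wraps any reader and records the requested size of every read.
pub struct LoggingReader<R> {
    inner: R,
    request_sizes: Mutex<Vec<u64>>,
}

impl<R: RangeReader> LoggingReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, request_sizes: Mutex::new(Vec::new()) }
    }

    /// Number of requests issued so far.
    pub fn request_count(&self) -> usize {
        self.request_sizes.lock().unwrap().len()
    }

    /// Total number of bytes requested so far.
    pub fn total_bytes(&self) -> u64 {
        self.request_sizes.lock().unwrap().iter().sum()
    }
}

impl<R: RangeReader> RangeReader for LoggingReader<R> {
    fn read_range(&self, range: Range<u64>) -> Vec<u8> {
        // Record the requested size before delegating to the wrapped reader.
        let size = range.end.saturating_sub(range.start);
        self.request_sizes.lock().unwrap().push(size);
        self.inner.read_range(range)
    }
}
```

Wrapping the reader rather than instrumenting each call site keeps the accounting in one place, which is presumably why a per-query request count and size breakdown is easy to extract afterwards.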

Parquet and Vortex are each selectable, and the bucket is also configurable.

To run the test that uses S3 Express One Zone, set AWS_S3_EXPRESS=true either in your .env file or directly in your shell environment.
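
For illustration only, a flag like this could be read as follows. This is a minimal sketch using just the standard library; `parse_flag` and `s3_express_enabled` are hypothetical helpers, and the binary in this PR may parse the variable differently (for example, it may load the .env file first or accept other values).

```rust
use std::env;

/// Interprets a raw flag value; only a case-insensitive "true" enables it.
/// Assumption: the PR's binary may accept other spellings.
pub fn parse_flag(value: Option<&str>) -> bool {
    matches!(value, Some(v) if v.trim().eq_ignore_ascii_case("true"))
}

/// Reads AWS_S3_EXPRESS from the process environment.
pub fn s3_express_enabled() -> bool {
    parse_flag(env::var("AWS_S3_EXPRESS").ok().as_deref())
}
```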

a10y commented Nov 25, 2024

Initial results

Attaching two zips of TRACE-level logs, one per S3 backend, from executing all TPC-H queries (except q15) using the Vortex DataFusion provider.

s3express_vortex.zip
s3_vortex.zip

Some interesting bits:

Total number of I/Os performed for each query:

image

Total time to execute each query (not including table registration):

S3 Standard:

image

S3 Express One Zone:

image

lwwmanning added the "do not merge" label Nov 27, 2024
lwwmanning commented

Going to close this for now; we can reopen after #1676 is done.

lwwmanning closed this Dec 17, 2024
robert3005 deleted the aduffy/tpch-objectstore branch January 29, 2025 22:26