feat(query): Support using Parquet format when spilling #16612
Signed-off-by: coldWater <forsaken628@gmail.com>
PR Summary
Benchmark: dataset TPC-H SF100.
Compared with Arrow IPC, Parquet's smaller file size comes mainly from dictionary encoding, while Parquet's CPU usage is considerably higher. For highly discrete (high-cardinality) data, Parquet shows no significant advantage.
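One way to gauge whether a column will benefit from Parquet's dictionary encoding is to compare its number of distinct values with its row count. The queries below are an illustration only, not part of the PR's benchmark, and they assume the TPC-H `lineitem` table is loaded under its standard name.

```sql
-- Low-cardinality column: the TPC-H specification defines 7 ship modes,
-- so a dictionary for this column stays tiny
SELECT COUNT(DISTINCT l_shipmode) AS distinct_values, COUNT(*) AS total_rows
FROM lineitem;

-- High-cardinality column: many distinct order keys, so a dictionary
-- typically gains little over plain encoding
SELECT COUNT(DISTINCT l_orderkey) AS distinct_values, COUNT(*) AS total_rows
FROM lineitem;
```

A ratio of distinct values to rows close to zero suggests dictionary encoding will shrink the data; a ratio far from zero suggests the extra CPU spent on Parquet encoding buys little, which is consistent with the observation above about highly discrete data.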
LGTM!
LGTM, needs a rebase.
Force-pushed from cd8304a to 22fca28.
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Support using Parquet format when spilling. You can switch to Arrow IPC via `set spilling_file_format = 'arrow'`.
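A minimal session sketch of the switch described above. It uses the `spilling_file_format` setting named in this PR; `'arrow'` is the only value the PR text names explicitly, and checking the result with `SHOW SETTINGS LIKE` is an assumption about how the setting is exposed.

```sql
-- Spill intermediate data as Arrow IPC instead of Parquet for the current session
SET spilling_file_format = 'arrow';

-- Check the current value of the setting
SHOW SETTINGS LIKE 'spilling_file_format';
```

Given the benchmark notes, Arrow IPC may be the better choice when spilled data is high-cardinality and CPU is the bottleneck, since Parquet's extra encoding cost then buys little size reduction.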
Tests
Type of change