[EPIC] Improve performance of TPC-H queries #391
Comments
We don't need to support |
Also |
Thanks. I saw those from the |
I will take a look at q8 and see why it is not enabled there. |
The error |
Please disable |
I ran TPC-H locally and profiled the sole executor with 4 CPU cores allocated to it. One thing I noticed is that I want to look at the granularity that these operations occur at, and see if we can coalesce metrics on the native side and maybe ship more at once to reduce the JNI overhead. I want to add more metrics to Comet to understand where we're spending time, but the overhead is going to add up.
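To make the coalescing idea above concrete, here is a minimal Rust sketch. It is not Comet's actual metrics code: `MetricSink`, `MetricBuffer`, and `CountingSink` are hypothetical names, and the sink only stands in for whatever call crosses the JNI boundary. The point it illustrates is that many native-side updates can be summed locally and shipped in a single call.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the JVM side. Each `update` call models
/// one crossing of the JNI boundary (an assumption for illustration).
pub trait MetricSink {
    fn update(&mut self, batch: &[(&'static str, u64)]);
}

/// Accumulates metric deltas on the native side so that many small
/// updates collapse into one entry per metric name.
#[derive(Default)]
pub struct MetricBuffer {
    pending: BTreeMap<&'static str, u64>,
}

impl MetricBuffer {
    /// Record a delta locally; no boundary crossing happens here.
    pub fn add(&mut self, name: &'static str, delta: u64) {
        *self.pending.entry(name).or_insert(0) += delta;
    }

    /// Ship everything accumulated so far in a single sink call.
    /// Does nothing if there is nothing pending.
    pub fn flush(&mut self, sink: &mut dyn MetricSink) {
        if self.pending.is_empty() {
            return;
        }
        let batch: Vec<(&'static str, u64)> =
            std::mem::take(&mut self.pending).into_iter().collect();
        sink.update(&batch);
    }
}

/// Test double that counts how many boundary crossings occurred
/// and keeps running totals per metric.
#[derive(Default)]
pub struct CountingSink {
    pub calls: usize,
    pub totals: BTreeMap<&'static str, u64>,
}

impl MetricSink for CountingSink {
    fn update(&mut self, batch: &[(&'static str, u64)]) {
        self.calls += 1;
        for (name, delta) in batch {
            *self.totals.entry(*name).or_insert(0) += *delta;
        }
    }
}
```

With this shape, an operator could call `add` once per batch and `flush` once per task (or every N batches), so the number of boundary crossings depends on the flush frequency rather than on the number of metric updates. How often to flush is a trade-off between metric freshness and overhead, and would need to be measured.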
What is the problem the feature request solves?
This epic is for tracking progress on improving performance of Comet with our benchmarks derived from TPC-H.
Current status (September 2024)
Features needed to support all queries natively
We do not run all queries fully natively yet due to these missing features:
Planned features that could help in general
Issues that affect multiple queries
Per-Query Tracking
Most of these queries are already faster with Comet enabled. Here are notes on areas where performance could potentially be improved.
lineitem scans take 2x longer in Comet than in Spark, but this is offset by avoiding an expensive columnar-to-row (C2R) conversion. The time for native decoding in Comet is longer than the entire scan in Spark.