-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2885 +/- ##
==========================================
+ Coverage 85.43% 85.62% +0.19%
==========================================
Files 275 279 +4
Lines 49739 50872 +1133
==========================================
+ Hits 42495 43561 +1066
- Misses 7244 7311 +67
|
1dc96ba to
62afff7
Compare
df214b4 to
9f802d0
Compare
367cb56 to
440a4ae
Compare
|
I think this is converging on a version I believe would be a good start on decorrelating sub-queries (it doesn't have to support all cases, as long as it fails instead of returning invalid plans). I've spent about 3 weeks on it so far, and I really would appreciate a thorough review before going too much farther. The longer I stare at it without external input, the more blind I will be as to what I missed or could do differently. @andygrove @Dandandan @alamb I'd really appreciate your input (especially failing test case suggestions and readability improvements, but also thoughts on general approach, relationship with other work, and what should be in/out of scope). |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
9fedbb7 to
ee91f24
Compare
|
Any last thoughts @Dandandan @andygrove ? |
|
Thanks for the implementation @avantgardnerio awesome addition to DataFusion 😎 |
|
I think all the items listed as critical are now resolved. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there are any other comments, I'll plan to merge this PR tomorrow
datafusion/optimizer/src/utils.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
Thanks again @avantgardnerio |
|
Benchmark runs are scheduled for baseline = 117df4d and contender = 7b0f2f8. 7b0f2f8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #159
Closes #160
Closes #163
Closes #168
Closes #171
Closes #172
Closes #175
Rationale for this change
Getting the TPC-H queries to work allows Datafusion users and potential users to compare standard benchmarks to decide if Datafusion is the correct choice for their use case.
What changes are included in this PR?
3 new query optimizers:
pros
and 21 (error in filter push down)cons
the code is verbose,and duplicated - update: it's definitely less verbose, and less duplicated, thoughts on the new methods?variables names are hard to understandfilter_to_joinAre there any user-facing changes?
Correlated subqueries should work
, but likely some will break or return incorrect results (thus the "draft" status).