-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885
Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2885 +/- ##
==========================================
+ Coverage 85.43% 85.62% +0.19%
==========================================
Files 275 279 +4
Lines 49739 50872 +1133
==========================================
+ Hits 42495 43561 +1066
- Misses 7244 7311 +67
|
1dc96ba
to
62afff7
Compare
df214b4
to
9f802d0
Compare
367cb56
to
440a4ae
Compare
I think this is converging on a version I believe would be a good start on decorrelating sub-queries (it doesn't have to support all cases, as long as it fails instead of returning invalid plans). I've spent about 3 weeks on it so far, and I really would appreciate a thorough review before going too much farther. The longer I stare at it without external input, the more blind I will be as to what I missed or could do differently. @andygrove @Dandandan @alamb I'd really appreciate your input (especially failing test case suggestions and readability improvements, but also thoughts on general approach, relationship with other work, and what should be in/out of scope). |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
9fedbb7
to
ee91f24
Compare
Any last thoughts @Dandandan @andygrove ? |
Thanks for the implementation @avantgardnerio awesome addition to DataFusion 😎 |
I think all the items listed as critical are now resolved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there are any other comments, I'll plan to merge this PR tomorrow
pub fn verify_not_disjunction(predicates: &[&Expr]) -> Result<()> { | ||
struct DisjunctionVisitor {} | ||
|
||
impl ExpressionVisitor for DisjunctionVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks again @avantgardnerio |
Benchmark runs are scheduled for baseline = 117df4d and contender = 7b0f2f8. 7b0f2f8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #159
Closes #160
Closes #163
Closes #168
Closes #171
Closes #172
Closes #175
Rationale for this change
Getting the TPC-H queries to work allows Datafusion users and potential users to compare standard benchmarks to decide if Datafusion is the correct choice for their use case.
What changes are included in this PR?
3 new query optimizers:
pros
and 21 (error in filter push down)cons
the code is verbose,and duplicated - update: it's definitely less verbose, and less duplicated, thoughts on the new methods?variables names are hard to understandfilter_to_join
Are there any user-facing changes?
Correlated subqueries should work
, but likely some will break or return incorrect results (thus the "draft" status).