-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support LATERAL joins #3713
Conversation
038a614
to
c01af79
Compare
Ok, this is ready for a look. Might make sense to sit down together @frankmcsherry so I can walk you through the approach. We pass all the PostgreSQL lateral join tests that I managed to port, save for two that cause "arrangement alarmingly absent" panics that I am hopeful are not my fault. Getting these tests to pass required fixing various unrelated bugs in the SQL planner, which is why there are so many commits in this stack. The performance of many of these queries is abysmal, so we'll just have to warn folks to be really careful with these joins until our optimizer gets better (if ever; lateral joins are a great way to write n^2^2^2^2 queries). |
Some debugging info from one of the panicky queries: https://gist.github.com/benesch/288674bae2fcd6376ceaa2714547ed03 |
test/sqllogictest/table_func.slt
Outdated
| Get materialize.public.x (u5) | ||
| ArrangeBy (#0) | ||
|
||
%1 = | ||
| Get materialize.public.x (u5) | ||
| FlatMap generate_series(1, #0) | ||
| | demand = (#0..#2) | ||
| | demand = (#0, #2) | ||
|
||
%2 = | ||
| Join %0 %1 (= #0 #2) | ||
| | implementation = Differential %1 %0.(#0) | ||
| | demand = (#0, #1, #4) | ||
| Project (#0, #1, #4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably a beefy regression of a class we should sort out, somehow. I expect this falls under the general heading of "branch doesn't always need to join the left side again". We should at least decide who's job we want to make doing that, though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the subsequent EXPLAIN
diffs; certainly going to be perf regressions there, pretty substantial to for something like json decoding (FlatMap
is currently stateless; these joins will break that and have us keep the whole input collection around).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, everything looks sane here except for some serious planning regressions. We should sort out if we can live with these briefly, or figure out how to fix them first. I think they are nearly fatal for certain workloads (e.g. using TVF for decoding JSON), though I don't think the actual regressions demonstrate that case.
8e872dc
to
fb64d76
Compare
src/sql/src/plan/decorrelate.rs
Outdated
| RelationExpr::Threshold { .. } | ||
| RelationExpr::Negate { .. } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure that these two need to be in here. Do you have a sense for if they need to be, or if they just weren't obviously safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter! Anyway, reworked, see incoming comment.
These are MSSQL-specific and therefore irrelevant to us.
These are MSSQL-specific and therefore irrelevant to us.
It is confusing to use TableFactor::Table for both FROM foo and: FROM foo (a, b, c) With this patch, the latter gets a separate TableFactor::Function variant.
PostgreSQL permits table functions to be preceded by the optional noise word LATERAL, as in: SELECT * FROM foo, LATERAL bar() This does not change the semantics of the query, as table functions are always implicitly planned as if LATERAL was specified. This commit teaches the parser to accept this syntax, but not to retain the distinction.
This allows creating a new nested "scope" in SQL for lateral joins, as required for e.g. FROM a, LATERAL (...) b without going through a scalar expression scope first. (Most forms of correlation in SQL occur in scalar position.)
This function is very much not-generalizable, and gets quite tricky with LATERAL joins. Just inline it in its one call site.
Using the visit_mut helper makes it difficult (impossible?) to increase `depth` for certain RelationExprs, which will be important for LATERAL joins. Just manually recur rather than using the visit_mut helper. There are intended to be no semantic changes in this commit.
A LATERAL join allows the right-hand side of the join to access columns defined on the left-hand side of the join. A simple example from the PostgreSQL docs is the query SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss which is equivalent to: SELECT * FROM foo, bar WHERE bar.id = foo.bar_id; The hope is that LATERAL joins will be useful for expressing "top-k within a group", as in: SELECT * FROM (SELECT DISTINCT cat FROM foo) grp, JOIN LATERAL (SELECT * FROM foo WHERE foo.cat = grp.cat ORDER BY foo.val LIMIT $k)
Ok, I think this is in a place where this can merge. I adjusted the new branch planning to be very, very targeted; see here: d37fe57#diff-ce359fcdca08c6161d027c3fd75d8b39R633-R690 With this new approach, there are (actually!) zero planning regressions—at least as far as our tests can tell. |
Looks good to me! |
This is still pretty panicky on some of the Postgres test cases, but it mostly sorta works, so putting up in case anyone is curious.A LATERAL join allows the right-hand side of the join to access columns
defined on the left-hand side of the join. A simple example from the
PostgreSQL docs is the query
which is equivalent to:
The hope is that LATERAL joins will be useful for expressing "top-k
within a group", as in:
Fixes MaterializeInc/database-issues#1020.
Fixes MaterializeInc/database-issues#1158.
This change is