Introduce the ability to express a correlated subquery #119

jacques-n · 2022-01-04T19:45:10Z

We should introduce a way to express a correlated subquery. We need to come up with the best way to represent this. Some prior art:

Calcite - A filter relation defines a dynamic variable over its data, and a new expression type containing a subtree uses that variable. The subquery subtree references the fields of this dynamic variable to connect the subquery and main trees ([dynamic variable].[field name], e.g. $cor0.P_PARTKEY).
Trino - A relational node type that has a special set of symbols from the outer tree, which are then referenced via special expression assignments from the inner tree. (I think; this is based on just a quick review of the code.)
Spark - Looks like an expression that contains a subtree, a set of outer expressions, and an exprId (which I believe is used inside the inner subtree). This looks/feels a bit like Calcite, although Calcite also has the agg functions & groupings.

Are there other examples that people think should provide inspiration?

jacques-n · 2022-01-04T21:38:53Z

For reference, the following TPC-H queries require a subquery (or decorrelation): 2, 4, 11, 15, 16, 17, 18, 20, 21.

cpcloud · 2022-01-24T16:47:51Z

After looking into this a bit, there are a number of things to address:

What is the expression type of a subquery (correlated or not) used in, for example, a `WHERE` clause?

I'm not entirely sure whether a subquery should be a distinct relation, or whether it's a special kind of Expression that has a Rel as input.

This has meaningful consequences for the proto implementation. A subquery expression would have to reference a relation, which requires the expression definitions to import the relation definitions (relations already reference expressions, so the import would be circular), and this isn't allowed in protobuf (protocolbuffers/protobuf#5504). We would have to introduce a level of indirection, using something like a relation id, that would allow a subquery Expression to refer to a relation.
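
To make the indirection idea concrete, here is a minimal Python sketch (not proto, and not a proposed schema). All names in it (`Plan`, `Rel`, `SubqueryExpr`, `rel_id`) are hypothetical. The point is only that the expression type holds an integer id, so it never needs to reference the relation type directly, and the plan resolves the id when needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class FieldRef:
    """Ordinary column reference."""
    name: str


@dataclass
class SubqueryExpr:
    """Hypothetical subquery expression: stores an id, not the relation itself."""
    rel_id: int


Expr = Union[FieldRef, SubqueryExpr]


@dataclass
class Rel:
    """Hypothetical relation node; it may contain expressions."""
    op: str
    exprs: List[Expr] = field(default_factory=list)


@dataclass
class Plan:
    """Registry that owns every relation and hands out ids."""
    relations: Dict[int, Rel] = field(default_factory=dict)

    def add(self, rel: Rel) -> int:
        rel_id = len(self.relations)
        self.relations[rel_id] = rel
        return rel_id

    def resolve(self, expr: SubqueryExpr) -> Rel:
        # The only place where an expression is turned back into a relation.
        return self.relations[expr.rel_id]


plan = Plan()
inner_id = plan.add(Rel("read:partsupp"))
outer_id = plan.add(Rel("filter", [SubqueryExpr(inner_id)]))
subquery_rel = plan.resolve(plan.relations[outer_id].exprs[0])
```

The cost of this design is that a subquery expression is no longer self-contained: it is only meaningful together with the plan that owns the registry.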

What is the right way to track outer references at multiple levels?

For example, a subquery nested inside a subquery nested inside a subquery, where the innermost one references the top-level relation.

In Calcite, it looks like there's a unique id for a given relation, so there's no need to explicitly track something like the "level" of an outer query.
In Trino, it looks like there's a function that accumulates the outer refs.
I'm not sure how Spark manages to capture refs above the current outer query's scope.
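
As a rough illustration of the unique-id style described for Calcite above, here is a small Python sketch. It is not Calcite's API; `Scopes`, `CorrelatedRef`, and `bind` are hypothetical names, and the `$corN` naming just mirrors the `$cor0.P_PARTKEY` example from earlier in this thread. Each query scope gets its own id, and a reference names that id directly, so nesting depth never enters the lookup.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass(frozen=True)
class CorrelatedRef:
    """Hypothetical reference of the form <cor_id>.<field>."""
    cor_id: str
    field: str


class Scopes:
    """Maps each correlation id to the current row of the scope it names."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, object]] = {}
        self._next = count()

    def bind(self, row: Dict[str, object]) -> str:
        cor_id = f"$cor{next(self._next)}"
        self._rows[cor_id] = row
        return cor_id

    def lookup(self, ref: CorrelatedRef) -> object:
        # No level counting: the id alone identifies the target scope.
        return self._rows[ref.cor_id][ref.field]


scopes = Scopes()
top = scopes.bind({"P_PARTKEY": 42})      # top-level query
middle = scopes.bind({"PS_SUPPKEY": 7})   # first nested subquery
inner = scopes.bind({"S_NATIONKEY": 3})   # innermost subquery

# A reference written in the innermost subquery that targets the top level.
value = scopes.lookup(CorrelatedRef(top, "P_PARTKEY"))
```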

jacques-n · 2022-01-27T01:54:33Z

My inclination is definitely towards a new type of expression. (Let's ignore the proto impl issue for the moment.)

In terms of determining the level, do you think there is a way of defining this with relative numbering? I'd really prefer that over the naming scheme that things like Calcite seem to use.
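
A minimal Python sketch of what relative numbering could look like, under my own assumptions: `OuterRef`, `steps_out`, and `resolve` are hypothetical names, and `steps_out = 0` is taken to mean the current query, `1` the immediately enclosing query, and so on. Nothing here is an agreed design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OuterRef:
    """Hypothetical reference that counts enclosing queries instead of naming them."""
    steps_out: int
    field: str


def resolve(ref: OuterRef, scope_stack: List[Dict[str, object]]) -> object:
    """Resolve ref against a stack of rows, outermost first, innermost last."""
    if ref.steps_out < 0 or ref.steps_out >= len(scope_stack):
        raise ValueError(f"steps_out={ref.steps_out} escapes the outermost query")
    return scope_stack[-1 - ref.steps_out][ref.field]


stack = [
    {"P_PARTKEY": 42},    # top-level query
    {"PS_SUPPKEY": 7},    # first nested subquery
    {"S_NATIONKEY": 3},   # innermost subquery
]

from_inner = resolve(OuterRef(2, "P_PARTKEY"), stack)       # two levels out
from_middle = resolve(OuterRef(1, "P_PARTKEY"), stack[:2])  # one level out
```

One consequence of this scheme: no ids or names have to be generated, but the same logical reference has a different `steps_out` depending on where it sits, so any rewrite that moves an expression across a subquery boundary has to renumber it.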

jacques-n mentioned this issue Jan 6, 2022

SubQuery Support in Substrait #123

Closed

cpcloud mentioned this issue Feb 2, 2022

feat: add subquery representation #134

Merged

jacques-n closed this as completed in #134 Feb 6, 2022

ashvina mentioned this issue Oct 12, 2022

Add support for SQL APPLY operation #357

Open
