Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: support recursive CTEs #41368

Merged
merged 3 commits into from
Oct 10, 2019
Merged

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Oct 6, 2019

opt: minor refactoring in union optbuilder

This change pulls up the logic to cross-check column types, to be used
for WITH RECURSIVE. The resulting code is more straightforward to
follow.

Release note: None

opt: use Presentation for cteSource.cols in optbuilder

The columns in cteSource are only used for their IDs and names,
which is better represented by a Presentation. This will make things
easier for recursive CTEs.

We also make the "no columns" error more general since our syntax
allows more types of statements inside WITH.

Release note: None

opt: support recursive CTEs

A recursive CTE is of the form:

WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)

Recursive CTE evaluation (paraphrased from postgres docs):

  1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
  2. So long as the working table is not empty:
    • evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

  • optbuilder: We build the recursive query using a cteSource that
    corresponds to the "working table". This code turned out to be
    tricky, in part because we want to allow non-recursive CTEs even if
    RECURSIVE is used (which is necessary when defining multiple CTEs,
    some of which are not recursive).

  • execution: the execution operator is somewhat similar to
    applyJoinNode (but simpler). The execbuilder provides a
    callback that can be used to regenerate the right-hand-side plan
    each time; this callback takes a reference to the working table (as
    a bufferNode).

  • execbuilder: to implement the callback mentioned above, we create
    an "inner" builder inside the callback which uses the same factory
    but is otherwise independent from the "outer" builder.

Fixes #21085.

Release note (sql change): WITH RECURSIVE is now supported.

@RaduBerinde RaduBerinde requested a review from a team as a code owner October 6, 2019 22:55
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member Author

It's worth mentioning that postgres also supports a variant with UNION instead of UNION ALL, where all new rows are deduplicated against previously emitted rows. We don't implement that (at least not here). SQL Server doesn't either AFAICT

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! Execution code looks good to me.

Can #21085 can now be closed or not yet?

Reviewed 2 of 3 files at r1, 3 of 6 files at r2, 35 of 35 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, and @rytaft)


pkg/sql/recursive_cte.go, line 46 at r3 (raw file):

	// "working" table).
	workingRows     *rowcontainer.RowContainer
	lastWorkingRows *rowcontainer.RowContainer

This field appears to be unused.


pkg/sql/opt/exec/factory.go, line 487 at r3 (raw file):

	// ConstructRecursiveCTE constructs a node whose input can be referenced
	// from elsewhere in the query. The no
	ConstructRecursiveCTE(initial Node, fn RecursiveCTEIterationFn, label string) (Node, error)

[nit]: (above) unfinished comment or a few dangling words.


pkg/sql/opt/ops/relational.opt, line 874 at r3 (raw file):

#      "working table" for the next iteration.
[Relational]
define RecursiveCTE {

[nit]: (above) s/WIthID/WithID/.


pkg/sql/opt/optbuilder/union.go, line 130 at r3 (raw file):

// but Postgres is more lenient:
// http://www.postgresql.org/docs/9.5/static/typeconv-union-case.html.
func (b *Builder) checkTypesMatch(

[nit]: (a few lines above) in one place probably s/propagateToLeft/propagateToRight/ (also, maybe s/1/true/).


pkg/sql/opt/optbuilder/with.go, line 11 at r3 (raw file):

// licenses/APL.txt.

package optbuilder

[nit]: (above) I think s/2018/2019/.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Yeah, updated the commit message.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @rytaft, and @yuzefovich)


pkg/sql/recursive_cte.go, line 46 at r3 (raw file):

Previously, yuzefovich wrote…

This field appears to be unused.

Fixed. Wonder why the linter didn't complain..


pkg/sql/opt/exec/factory.go, line 487 at r3 (raw file):

Previously, yuzefovich wrote…

[nit]: (above) unfinished comment or a few dangling words.

Done.


pkg/sql/opt/ops/relational.opt, line 874 at r3 (raw file):

Previously, yuzefovich wrote…

[nit]: (above) s/WIthID/WithID/.

Done.


pkg/sql/opt/optbuilder/union.go, line 130 at r3 (raw file):

Previously, yuzefovich wrote…

[nit]: (a few lines above) in one place probably s/propagateToLeft/propagateToRight/ (also, maybe s/1/true/).

Done.


pkg/sql/opt/optbuilder/with.go, line 11 at r3 (raw file):

Previously, yuzefovich wrote…

[nit]: (above) I think s/2018/2019/.

Done.

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, 6 of 6 files at r2, 11 of 35 files at r3, 1 of 38 files at r4, 3 of 6 files at r5, 35 of 35 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/recursive_cte.go, line 26 at r6 (raw file):

//     a "working" table.
//  2. So long as the working table is not empty:
//     - evaluate the recursive query, substituting the current contents of

same here, the formatting here seems a little odd to my eye


pkg/sql/logictest/testdata/logic_test/with-opt, line 0 at r6 (raw file):
This is so so tight

Reviewable seems to have become convinced this is a binary file—any idea why? In the past I've seen this happen when we had obscure unicode characters in a file?


pkg/sql/opt/exec/execbuilder/relational.go, line 1455 at r6 (raw file):

		catalog:   b.catalog,
		evalCtx:   b.evalCtx,
		withExprs: b.withExprs[:len(b.withExprs):len(b.withExprs)],

this is very fancy, maybe deserving of a comment of what's going on here? took me a bit to think about it (I think what's happening is that you want the instantiations of this template to all be guaranteed to re-allocate if they add any new WITHs?)


pkg/sql/opt/optbuilder/with.go, line 45 at r6 (raw file):

	//     a "working" table.
	//  2. So long as the working table is not empty:
	//     - evaluate the recursive query, substituting the current contents of

nit: looks like this maybe wants to be either two bullet points or not a bulleted list at all?


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

	// annoying cases like `SELECT 1 UNION ALL SELECT 2`.
	referenced := false
	cteSrc.onRef = func() {

should this error if a recursive CTE is referenced >once (maybe you do this elsewhere I haven't seen yet)?

d=# with recursive foo as (values (1) union all (select * from foo union all select * from foo)) select * from foo;
2019-10-07 16:17:09.279 EDT [2703] ERROR:  recursive reference to query "foo" must not appear more than once at character 88
2019-10-07 16:17:09.279 EDT [2703] STATEMENT:  with recursive foo as (values (1) union all (select * from foo union all select * from foo)) select * from foo;
ERROR:  recursive reference to query "foo" must not appear more than once
LINE 1: ...on all (select * from foo union all select * from foo)) sele...

pkg/sql/opt/optbuilder/with.go, line 167 at r6 (raw file):

// getCTECols renames a presentation for the scope, renaming the columns to
// those provided in the AliasClause (if any). Throws an error if there is a
// mistmatch in the number of columns.

nit: s/mistmatch/mismatch/


pkg/sql/opt/optbuilder/testdata/with, line 920 at r6 (raw file):

with &2 (search_graph)
 ├── columns: id:25(int) link:26(int) data:27(string) depth:28(int) path:29(int[]) cycle:30(bool)
 ├── recursive-c-t-e

haha, the kebab casing of this is kind of goofy. nbd but maybe calling it WithRecursive would avoid this?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

should this error if a recursive CTE is referenced >once (maybe you do this elsewhere I haven't seen yet)?

d=# with recursive foo as (values (1) union all (select * from foo union all select * from foo)) select * from foo;
2019-10-07 16:17:09.279 EDT [2703] ERROR:  recursive reference to query "foo" must not appear more than once at character 88
2019-10-07 16:17:09.279 EDT [2703] STATEMENT:  with recursive foo as (values (1) union all (select * from foo union all select * from foo)) select * from foo;
ERROR:  recursive reference to query "foo" must not appear more than once
LINE 1: ...on all (select * from foo union all select * from foo)) sele...

Hm. I don't see a problem with referencing it more than one time.. Am I missing something?


pkg/sql/opt/optbuilder/testdata/with, line 920 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

haha, the kebab casing of this is kind of goofy. nbd but maybe calling it WithRecursive would avoid this?

It's not like the With operator though, in fact it is always built under a With..

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, RaduBerinde wrote…

Hm. I don't see a problem with referencing it more than one time.. Am I missing something?

Can't some sketchy stuff happen if you, say, join the recursive reference with itself, since the working table only contains a delta?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

Can't some sketchy stuff happen if you, say, join the recursive reference with itself, since the working table only contains a delta?

In the recursive query, recursive references resolve to the current working table. So you'd be joining the delta with the delta.

IMO this syntax is pretty misleading. There isn't any recursion going on here, it's actually iteration.

If we just want to detect this case so that folks don't shoot themselves in the foot, I'm for it. I guess you can always use WITH inside the recursive query if you need to reference it multiple times (hm, does the syntax allow that?)

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/with-opt, line at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

This is so so tight

Reviewable seems to have become convinced this is a binary file—any idea why? In the past I've seen this happen when we had obscure unicode characters in a file?

Probably from the center dot unicode (though we use that in logictests). Maybe there's too many of them.


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, RaduBerinde wrote…

In the recursive query, recursive references resolve to the current working table. So you'd be joining the delta with the delta.

IMO this syntax is pretty misleading. There isn't any recursion going on here, it's actually iteration.

If we just want to detect this case so that folks don't shoot themselves in the foot, I'm for it. I guess you can always use WITH inside the recursive query if you need to reference it multiple times (hm, does the syntax allow that?)

Yeah this works in pg:

with recursive cte(x) as (
    select 1
  union all
    (with cte2 as (select * from cte) select a.x from cte2 AS a,cte2 AS b)
) select * from cte;

I'll add the restriction and add some tests.

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, @rytaft, and @yuzefovich)


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, RaduBerinde wrote…

In the recursive query, recursive references resolve to the current working table. So you'd be joining the delta with the delta.

IMO this syntax is pretty misleading. There isn't any recursion going on here, it's actually iteration.

If we just want to detect this case so that folks don't shoot themselves in the foot, I'm for it. I guess you can always use WITH inside the recursive query if you need to reference it multiple times (hm, does the syntax allow that?)

No real opinion from me, besides the fact that Postgres disallows it. If you think it would be worth allowing I don't object.

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @rytaft, and @yuzefovich)


pkg/sql/recursive_cte.go, line 26 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

same here, the formatting here seems a little odd to my eye

Done.


pkg/sql/opt/exec/execbuilder/relational.go, line 1455 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

this is very fancy, maybe deserving of a comment of what's going on here? took me a bit to think about it (I think what's happening is that you want the instantiations of this template to all be guaranteed to re-allocate if they add any new WITHs?)

Done.


pkg/sql/opt/optbuilder/with.go, line 127 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

No real opinion from me, besides the fact that Postgres disallows it. If you think it would be worth allowing I don't object.

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: very nice! Just some nits.

Reviewed 2 of 3 files at r1, 38 of 38 files at r4, 6 of 6 files at r5, 26 of 35 files at r6, 3 of 6 files at r8, 35 of 35 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, @RaduBerinde, and @yuzefovich)


pkg/sql/opt_limits.go, line 252 at r6 (raw file):

	case *applyJoinNode, *lookupJoinNode, *zigzagJoinNode, *saveTableNode:
		// These nodes are only planned by the optimizer.

Not related to this PR, but isn't this true of all nodes now?


pkg/sql/recursive_cte.go, line 122 at r6 (raw file):

		return false, err
	}
	n.nextRowIdx = 1

Is there a reason to use a 1-indexed rather than 0-indexed RowIdx?


pkg/sql/logictest/testdata/logic_test/with-opt, line at r6 (raw file):

This is so so tight

💯


pkg/sql/opt/exec/factory.go, line 482 at r6 (raw file):

	// ConstructScanBuffer constructs a node which refers to a node constructed by
	// ConstructBuffer or ConstructRecursiveCTEBuffer.

Should this be ConstructRecursiveCTE?


pkg/sql/opt/exec/execbuilder/relational.go, line 1442 at r6 (raw file):

	}

	// Renumber the columns so they match the columns expected by recursive query.

[nit] recursive query -> the recursive query


pkg/sql/opt/ops/relational.opt, line 892 at r6 (raw file):

    # InitialCols are the columns produced by the initial expression.
    InitialCols ColList

[nit] extra line


pkg/sql/opt/ops/relational.opt, line 900 at r6 (raw file):

    # OutCols are the columns produced by the RecursiveCTE operator; they map
    # 1-1 to InitialCols and to RecursiveCols.
    OutCols ColList

Maybe add a comment about why you need separate ColLists for all three?


pkg/sql/opt/optbuilder/union.go, line 151 at r4 (raw file):

		if l.typ.Family() == types.UnknownFamily && tolerateUnknownLeft {
			propagateToLeft = true

This could cause an unnecessary call to propagateTypes (creating an unnecessary projection) if the right side is also unknown


pkg/sql/opt/optbuilder/with.go, line 21 at r5 (raw file):

)

// getCTECols renames a presentation for the scope, renaming the columns to

renames a presentation -> returns a presentation?


pkg/sql/opt/optbuilder/with.go, line 24 at r5 (raw file):

// those provided in the AliasClause (if any). Throws an error if there is a
// mistmatch in the number of columns.
func (b *Builder) getCTECols(cteScope *scope, name tree.AliasClause) physical.Presentation {

[nit] it's a bit weird to call the AliasClause "name", since it actually contains multiple names.


pkg/sql/opt/optbuilder/with.go, line 94 at r6 (raw file):

	b.dropOrderingAndExtraCols(initialScope)

	// The properties of the binding tricky: the recursive expression is invoked

[nit] tricky -> are tricky


pkg/sql/opt/optbuilder/with.go, line 106 at r6 (raw file):

	// the recursive query. We don't have anything better though.
	bindingProps.Stats.RowCount = initialScope.expr.Relational().Stats.RowCount
	cteSrc.bindingProps = bindingProps

feels like this stuff with bindingProps should go into a helper function


pkg/sql/opt/optbuilder/with.go, line 205 at r6 (raw file):

) (initial, recursive *tree.Select, ok bool) {
	sel, ok := stmt.(*tree.Select)
	if !ok || sel.With != nil || sel.OrderBy != nil || sel.Limit != nil {

can you add a comment about the need for checks for With, OrderBy, and Limit?


pkg/sql/opt/optbuilder/testdata/subquery, line 2452 at r9 (raw file):

error (22023): unsupported comparison operator: <decimal> IN <tuple{int}>

build

What is this testing?


pkg/sql/parser/sql.y, line 6041 at r9 (raw file):

| WITH RECURSIVE cte_list
  {
    /* SKIP DOC */

why skip doc?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @justinj, @rytaft, and @yuzefovich)


pkg/sql/opt_limits.go, line 252 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Not related to this PR, but isn't this true of all nodes now?

Yes. There is a lot to clean up after the HP removal, I will get to it.


pkg/sql/recursive_cte.go, line 122 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Is there a reason to use a 1-indexed rather than 0-indexed RowIdx?

It's easier to increment at the start of Next than at the exit (has multiple exit points). It's also consistent with bufferNode and scanBufferNode.


pkg/sql/opt/exec/factory.go, line 482 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Should this be ConstructRecursiveCTE?

Fixed.


pkg/sql/opt/exec/execbuilder/relational.go, line 1442 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] recursive query -> the recursive query

Done.


pkg/sql/opt/ops/relational.opt, line 892 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] extra line

Done.


pkg/sql/opt/ops/relational.opt, line 900 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Maybe add a comment about why you need separate ColLists for all three?

Done.


pkg/sql/opt/optbuilder/union.go, line 151 at r4 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This could cause an unnecessary call to propagateTypes (creating an unnecessary projection) if the right side is also unknown

Unknown is Equivalent to Unknown (see check above). Added a comment.


pkg/sql/opt/optbuilder/with.go, line 21 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

renames a presentation -> returns a presentation?

Done.


pkg/sql/opt/optbuilder/with.go, line 24 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it's a bit weird to call the AliasClause "name", since it actually contains multiple names.

It is, but that's the name of the field in tree.CTE we're using so I'd rather stick to that.


pkg/sql/opt/optbuilder/with.go, line 94 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] tricky -> are tricky

Done.


pkg/sql/opt/optbuilder/with.go, line 106 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

feels like this stuff with bindingProps should go into a helper function

I think it would make it even harder to describe what's going on.


pkg/sql/opt/optbuilder/with.go, line 205 at r6 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

can you add a comment about the need for checks for With, OrderBy, and Limit?

Done.


pkg/sql/opt/optbuilder/testdata/subquery, line 2452 at r9 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

What is this testing?

.. wow. I think it was something I was playing with related to #41187, removed.


pkg/sql/parser/sql.y, line 6041 at r9 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

why skip doc?

Good catch, fixed!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 39 files at r10, 3 of 6 files at r11, 41 of 41 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @justinj, and @yuzefovich)


pkg/sql/opt_limits.go, line 252 at r6 (raw file):

Previously, RaduBerinde wrote…

Yes. There is a lot to clean up after the HP removal, I will get to it.

👍


pkg/sql/opt/optbuilder/union.go, line 151 at r4 (raw file):

Previously, RaduBerinde wrote…

Unknown is Equivalent to Unknown (see check above). Added a comment.

Thanks!

@RaduBerinde RaduBerinde force-pushed the recursive-cte-2 branch 2 times, most recently from 14086f1 to c1b3b1e Compare October 9, 2019 14:13
@RaduBerinde
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 9, 2019

Build failed

This change pulls up the logic to cross-check column types, to be used
for `WITH RECURSIVE`. The resulting code is more straightforward to
follow.

Release note: None
The columns in `cteSource` are only used for their IDs and names,
which is better represented by a `Presentation`. This will make things
easier for recursive CTEs.

We also make the "no columns" error more general since our syntax
allows more types of statements inside WITH.

Release note: None
A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference;
    - emit all resulting rows, and save them as the next iteration's
      working table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes cockroachdb#21085.

Release note (sql change): WITH RECURSIVE is now supported (only the
UNION ALL variant).
@RaduBerinde
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Oct 10, 2019
41368: opt: support recursive CTEs r=RaduBerinde a=RaduBerinde

#### opt: minor refactoring in union optbuilder

This change pulls up the logic to cross-check column types, to be used
for `WITH RECURSIVE`. The resulting code is more straightforward to
follow.

Release note: None

#### opt: use Presentation for cteSource.cols in optbuilder

The columns in `cteSource` are only used for their IDs and names,
which is better represented by a `Presentation`. This will make things
easier for recursive CTEs.

We also make the "no columns" error more general since our syntax
allows more types of statements inside WITH.

Release note: None

#### opt: support recursive CTEs

A recursive CTE is of the form:
```
WITH RECURSIVE cte AS (
    <initial query>
  UNION ALL
    <recursive query>
)
```

Recursive CTE evaluation (paraphrased from postgres docs):
 1. Evaluate the initial query; emit the results and also save them in
    a "working" table.
 2. So long as the working table is not empty:
    - evaluate the recursive query, substituting the current contents of
      the working table for the recursive self-reference. Emit all
      resulting rows, and save them into the next iteration's working
      table.

This change adds optimizer and execution support for recursive CTEs.

Some notes for the various components:

 - optbuilder:  We build the recursive query using a `cteSource` that
   corresponds to the "working table". This code turned out to be
   tricky, in part because we want to allow non-recursive CTEs even if
   RECURSIVE is used (which is necessary when defining multiple CTEs,
   some of which are not recursive).

 - execution: the execution operator is somewhat similar to
   `applyJoinNode` (but simpler). The execbuilder provides a
   callback that can be used to regenerate the right-hand-side plan
   each time; this callback takes a reference to the working table (as
   a `bufferNode`).

 - execbuilder: to implement the callback mentioned above, we create
   an "inner" builder inside the callback which uses the same factory
   but is otherwise independent from the "outer" builder.

Fixes #21085.

Release note (sql change): WITH RECURSIVE is now supported.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Oct 10, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: WITH RECURSIVE (recursive common table expressions)
5 participants