sql: single-use common table expression support #20359

jordanlewis · 2017-11-30T19:54:27Z

Implement support for simple non-recursive common table expressions.

Referring to a single table expression more than once is not yet
supported.

cockroach-teamcity · 2017-11-30T19:54:33Z

This change is

knz · 2017-11-30T22:06:00Z

The name resolution technique is a bit too ad-hoc for my taste but I'm willing to go along with it for a first iteration.

Please however refrain from storing the logical plan in the naming scope. Store the AST query instead. then re-plan the AST every time the name is used. This is the way every other engine does it, and it gives way to all sorts of optimizations. (And it lifts the "restriction" (bug) in your current approach - that the name can only be used once.)

Reviewed 15 of 15 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

knz · 2017-11-30T22:07:38Z

of course there's a trick in there which is that whenever you resolve a name when replanning the CTE AST, you need to look up that name in the scope wher ethe CTE was defined, not in the scope where the CTE name is used.

Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

jordanlewis · 2017-11-30T22:09:25Z

TFTR!

It's not correct to re-plan the AST every time the name is used, unfortunately. The contract around CTEs is that every CTE clause gets evaluated exactly once. Take for example

SELECT * FROM z WHERE a IN x

Or something like that - if you had more than one reference to x you would need to ensure it's only evaluated once.

The reason I left that restriction in is that supporting this in general requires some way of storing temporary tables.

Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

jordanlewis · 2017-11-30T22:10:09Z

Oops my comment didn't render correctly. I meant:

WITH x AS (INSERT INTO t VALUES(1))
SELECT * FROM z WHERE a IN x

Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

BramGruneir · 2017-11-30T22:15:53Z

CTEs are a fairly large feature. Do they not warrant an RFC? At least to pare down the scope.

Also does this really fully fix #7029? Unless we fully support CTEs, I don't think we should close that issue.

jordanlewis · 2017-11-30T22:17:18Z

I figured I would open more issues if this gets merged - specifically to support WITH RECURSIVE and temporary tables in CTEs.

I don't think this requires an RFC, at least without WITH RECURSIVE. It's a lot more straightforward than it seems from the outside.

knz · 2017-11-30T22:17:52Z

Read-only CTEs can be replanned every time. They are also the most common case

Write queries in CTEs require temporary tables. Your approach is not sufficient.

I kinda agree a RFC would be desirable. But we can support read-only, non-recursive CTEs using the approach I suggested.

Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

jordanlewis · 2017-11-30T22:19:58Z

Why do write queries in CTEs require temporary tables?

knz · 2017-11-30T22:36:14Z

Because the amount of data can be arbitrarily large!

jordanlewis · 2017-11-30T22:40:38Z

The amount of data where? Returned by an insert in an AS clause? Returned by a select in the AS clause of an insert? I don't follow - unless I'm missing something, we're either already vulnerable to a problem caused by what you're suggesting or there's really nothing wrong here.

Perhaps you could use an example to illustrate your point?

knz · 2017-11-30T22:47:07Z

All right so the situation is as follows:

the most common case is read-only CTEs
as common is the case where a CTE is used multiple times in the remainder of the query.
in the relatively less common case where INSERT/UPDATE are used as CTE (with RETURNING), then the most common case is when they are used multiple times in the remainder of the query.
the case where both a) data-modifying CTEs are used and b) they are only used once is rare and, I posit, not worth supporting at this time.

Meanwhile:

currently our [...] syntax does not allow the result of an INSERT/DELETE/UPDATE to be used multiple times, and therefore it can fit the "streaming model" and does not require accumulating the rows in memory, whereas
trying to support data-modifying CTEs that can be used multiple times will require accumulating the updated/inserted rows in memory, which is not tractable in the general case (There can be an arbitrary amount of modified rows)

So my position is that I do not want to think about situation 6, and I think you're mis-prioritizing by looking at point 4, and to some lesser extent point 3.

If we want to make a difference we need a solution for the most common case - read-only CTEs used in multiple positions. IF that makes your (our) life more difficult for data-modifying CTEs, I don't care (yet).

jordanlewis · 2017-11-30T23:02:18Z

Thanks for laying that out! I agree that we shouldn't think about situation 6 - temporary tables are way out of scope for this effort.

I disagree with your prioritization - do you have evidence for the frequency of use of these different query types? I think it's likely that queries that that use INSERT ... RETURNING clauses just once are more common than those that use them more than once.

In any case, for now I will modify the PR to plan each query where it's used. Banning write clauses is just as difficult as permitting their use only once, though, so I'll opt for that instead.

knz · 2017-12-01T00:05:58Z

The argument about one time use vs multiple times.

Just recall what the acronym CTE stands for: common table expressions.

This feature was developed starting from the observation that some table expressions are used multiple times and therefore that there was a need for DRY. So yes it's easy to argue that WITH is not used unless there are multiple points of use.

knz · 2017-12-01T00:22:13Z

All right so to recap our other discussion offline:

your initial argument is that the case of one-use CTE is common enough (e.g. the issue linked) to warrant that partial support
as we both found out, if the SQL standard indeed mandates referential transparency between all uses of a CTEs, this mandates temporary tables (materialization) even for read-only queries because in the general case a SELECT can use non-deterministic functions.

In other words, if there is a possibility to re-plan separate at every point of reuse, this can only be an optimization that kicks in after we have a proof that all possible executions will produce the same rows and that either the ordering is fixed by ORDER BY or all the points of use are order-agnostic (e.g. EXISTS, IN etc). Without this proof then using separate plans for separate uses might not only be inefficient, it is likely simply incorrect.

knz · 2017-12-01T00:23:26Z

So you can dismiss my initial reservations, but please capture the different aspects of this discussion in the commit message. The rationale for the limited implementation must be clear in the proposed release note, and also indicated as a recommended doc scope for the person who will document/explain the partial feature. Think about e.g. Robert and how he will have to explain our partial support to a potential customer.

BramGruneir · 2017-12-01T00:30:21Z

I'd like reiterate my point that this requires an RFC. A lot of this discussion should be included in it. It would also help others understand what our current CTE support is, by pointing them it.

…

On Nov 30, 2017 19:23, "kena" ***@***.***> wrote: So you can dismiss my initial reservations, but please capture the different aspects of this discussion in the commit message. The rationale for the limited implementation must be clear in the proposed release note, and also indicated as a recommended doc scope for the person who will document/explain the partial feature. Think about e.g. Robert and how he will have to explain our partial support to a potential customer. — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#20359 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABihuZasiB10WMMX7SFZoZhIm9NH0_USks5s70cDgaJpZM4QxHAw> .

jordanlewis · 2017-12-01T00:30:58Z

Fair enough!

jordanlewis · 2017-12-01T05:46:35Z

RFC available in #20374.

jordanlewis · 2017-12-21T21:09:06Z

PTAL.

knz · 2017-12-27T14:03:52Z

I had a more thorough look at the code in light of the RFC. The code does what it says on the label but I have a few minor suggestions for improvement.

A larger concern now is whether the name resolution code is good as-is. The "planDataSource" logic was introduced while I was handling joins but it really belongs to this new scoping data structure you are introducing here. I wonder how much of this code can be merged together / simplified accordingly.

Reviewed 19 of 19 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

pkg/sql/with.go, line 18 at r2 (raw file):

import (
	"golang.org/x/net/context"

I think you need to revert this.

(Also, rebase and check with the linter.)

pkg/sql/with.go, line 24 at r2 (raw file):

)

type cteNameEnvironment []cteNameEnvironmentFrame

Add comments throughout to explain the various data structures and functions.

Also add a top-level comment to explain how you manage name scoping, and how the magic works with the resetter function.

pkg/sql/with.go, line 59 at r2 (raw file):

			frame[cte.Name.Alias] = cteSource{plan: ctePlan, alias: cte.Name}
		}
		return func() { p.cteNameEnvironment = p.cteNameEnvironment.pop() }, nil

If you make this func take the planner as parameter, you can define it globally, and thus make it static and avoid a heap allocation.

pkg/sql/sem/tree/with.go, line 43 at r2 (raw file):

		}
		cte.Name.Format(buf, f)
		buf.WriteString(" AS (\n")

This would be the first time we use newline characters during pretty-printing. 1) Are you comfortable doing so? Are there no places where we assume pretty-printed queries fit on a single line? 2) is this necessary?

Comments from Reviewable

Implement support for single-use non-recursive common table expressions. Referring to a single table expression more than once is not yet supported. Release note (sql): add support for single-use common table expressions

jordanlewis · 2017-12-28T15:36:30Z

Rebased and responded to comments. I'm not sure about changing the name resolution completely. plan data sources are also used in expressions with statement sources as i'm sure you're aware (from [...] syntax).

It's possible that the next time we'll need this kind of naming environment is in correlated subqueries, which I think is the main other example of naming scopes in SQL statements (right)? Maybe aliases too, actually.

Review status: 10 of 23 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

pkg/sql/with.go, line 18 at r2 (raw file):

Previously, knz (kena) wrote…

I think you need to revert this.

(Also, rebase and check with the linter.)

Done.

pkg/sql/with.go, line 24 at r2 (raw file):

Previously, knz (kena) wrote…

Add comments throughout to explain the various data structures and functions.

Also add a top-level comment to explain how you manage name scoping, and how the magic works with the resetter function.

Done.

pkg/sql/with.go, line 59 at r2 (raw file):

Previously, knz (kena) wrote…

If you make this func take the planner as parameter, you can define it globally, and thus make it static and avoid a heap allocation.

Done.

pkg/sql/sem/tree/with.go, line 43 at r2 (raw file):

Previously, knz (kena) wrote…

This would be the first time we use newline characters during pretty-printing. 1) Are you comfortable doing so? Are there no places where we assume pretty-printed queries fit on a single line? 2) is this necessary?

Didn't realize. Removed the newlines.

Comments from Reviewable

petermattis · 2017-12-28T15:42:53Z

It's possible that the next time we'll need this kind of naming environment is in correlated subqueries, which I think is the main other example of naming scopes in SQL statements (right)? Maybe aliases too, actually.

I'd be hesitant to change the name resolution code significantly at this time due to the upcoming planner work. See scope in opttoy for how it handles name resolution (properly?) for correlated subqueries. In particular, each scope has a parent scope and we have to walk through the list looking for the name being resolved.

jordanlewis · 2017-12-28T15:44:32Z

I'd be hesitant to change the name resolution code significantly at this time due to the upcoming planner work.

SGTM, let's merge as is then 😁

knz · 2017-12-28T15:48:07Z

Yes I agree with peter that it is too early to have that conversation - I wasn't suggesting to do anything about it in this PR.

The architecture Peter and Radu are pushing forward will provide a new context to think better about name resolution, even without considering "opttoy". Let's familiarize ourselves with this context before we resume the conversation. However I do want to point out that we need a single solution to satisfy our various naming scope requirements. If the code Peter and Radu cannot be used about this in the coming 6 months, we'll still want to do something about it here.

Reviewed 12 of 13 files at r3.
Review status: 22 of 23 files reviewed at latest revision, all discussions resolved, all commit checks successful.

Comments from Reviewable

knz · 2017-12-28T15:48:39Z

Reviewed 1 of 13 files at r3.
Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.

Comments from Reviewable

jordanlewis · 2017-12-28T15:49:27Z

Although I'll note that the naming environments for CTEs are slightly different from those for correlated subqueries - correlated subqueries need to keep track of columns and their sources, whereas CTEs need to keep track of sources only. This might be a small enough difference that the same environment could be used, but it's worth pointing out.

jordanlewis · 2017-12-28T16:11:54Z

TFTRs!

jordanlewis requested review from justinj, knz and a team November 30, 2017 19:54

jordanlewis mentioned this pull request Dec 1, 2017

rfcs: simple common table expressions #20374

Merged

jordanlewis mentioned this pull request Dec 18, 2017

sql: mutations in statement sources don't work properly with limits #20732

Closed

jordanlewis changed the title ~~[WIP] sql: simple common table expression support~~ sql: single-use common table expression support Dec 21, 2017

jordanlewis force-pushed the cte branch from 1df0588 to a02919d Compare December 21, 2017 20:44

jordanlewis requested a review from a team as a code owner December 21, 2017 20:44

jordanlewis force-pushed the cte branch from a02919d to 8e217cc Compare December 21, 2017 20:58

jordanlewis force-pushed the cte branch 2 times, most recently from 06190c1 to ac7f747 Compare December 21, 2017 21:41

sql: single-use common table expression support

7dc98cf

Implement support for single-use non-recursive common table expressions. Referring to a single table expression more than once is not yet supported. Release note (sql): add support for single-use common table expressions

jordanlewis force-pushed the cte branch from ac7f747 to 7dc98cf Compare December 27, 2017 22:32

jordanlewis merged commit ac52672 into cockroachdb:master Dec 28, 2017

jordanlewis deleted the cte branch December 28, 2017 16:12

This was referenced Dec 28, 2017

sql: multi-use common table expressions #21084

Closed

sql: WITH RECURSIVE (recursive common table expressions) #21085

Closed

jseldess mentioned this pull request Jan 10, 2018

Single-use, non-recursive common table expressions (CTEs) - WITH clause cockroachdb/docs#2350

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: single-use common table expression support #20359

sql: single-use common table expression support #20359

jordanlewis commented Nov 30, 2017 •

edited

Loading

cockroach-teamcity commented Nov 30, 2017

knz commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

BramGruneir commented Nov 30, 2017 •

edited

Loading

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Dec 1, 2017

knz commented Dec 1, 2017

knz commented Dec 1, 2017

BramGruneir commented Dec 1, 2017 via email

jordanlewis commented Dec 1, 2017

jordanlewis commented Dec 1, 2017

jordanlewis commented Dec 21, 2017

knz commented Dec 27, 2017

jordanlewis commented Dec 28, 2017

petermattis commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

knz commented Dec 28, 2017

knz commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

sql: single-use common table expression support #20359

sql: single-use common table expression support #20359

Conversation

jordanlewis commented Nov 30, 2017 • edited Loading

cockroach-teamcity commented Nov 30, 2017

knz commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

BramGruneir commented Nov 30, 2017 • edited Loading

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Nov 30, 2017

jordanlewis commented Nov 30, 2017

knz commented Dec 1, 2017

knz commented Dec 1, 2017

knz commented Dec 1, 2017

BramGruneir commented Dec 1, 2017 via email

jordanlewis commented Dec 1, 2017

jordanlewis commented Dec 1, 2017

jordanlewis commented Dec 21, 2017

knz commented Dec 27, 2017

jordanlewis commented Dec 28, 2017

petermattis commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

knz commented Dec 28, 2017

knz commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

jordanlewis commented Dec 28, 2017

jordanlewis commented Nov 30, 2017 •

edited

Loading

BramGruneir commented Nov 30, 2017 •

edited

Loading