sql: single-use common table expression support #20359
The name resolution technique is a bit too ad-hoc for my taste but I'm willing to go along with it for a first iteration. Please however refrain from storing the logical plan in the naming scope. Store the AST query instead, then re-plan the AST every time the name is used. This is the way every other engine does it, and it opens the way to all sorts of optimizations. (And it lifts the "restriction" (bug) in your current approach - that the name can only be used once.) Reviewed 15 of 15 files at r1. Comments from Reviewable |
Of course there's a trick in there, which is that whenever you resolve a name while re-planning the CTE AST, you need to look up that name in the scope where the CTE was defined, not in the scope where the CTE name is used. Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
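The scoping rule described above can be sketched in Go. This is a minimal illustration, not the PR's code: the `scope`, `cteDef`, `define`, `lookup`, and `resolve` names are hypothetical, and a string stands in for the stored AST. Each definition remembers the scope that was visible where it was written, and references inside its body are resolved there.

```go
package main

import "fmt"

// cteDef is a hypothetical stand-in for a stored CTE: either a literal
// query or a reference to another CTE ("SELECT * FROM <ref>").
type cteDef struct {
	literal  string
	ref      string
	defScope *scope // scope visible at the point of definition
}

// scope is a persistent linked list of name bindings (hypothetical).
type scope struct {
	parent *scope
	name   string
	def    *cteDef
}

// define returns a new scope that adds name on top of s. The definition
// records s, so its body cannot see itself (non-recursive CTEs only).
func (s *scope) define(name string, def cteDef) *scope {
	def.defScope = s
	return &scope{parent: s, name: name, def: &def}
}

// lookup walks from the innermost binding outwards.
func (s *scope) lookup(name string) *cteDef {
	for cur := s; cur != nil; cur = cur.parent {
		if cur.name == name {
			return cur.def
		}
	}
	return nil
}

// resolve "re-plans" a CTE reference. Names inside the CTE body are looked
// up in the defining scope, not in the scope where the CTE is used.
func resolve(name string, useScope *scope) (string, error) {
	d := useScope.lookup(name)
	if d == nil {
		return "", fmt.Errorf("relation %q does not exist", name)
	}
	if d.ref == "" {
		return d.literal, nil
	}
	return resolve(d.ref, d.defScope)
}

func main() {
	var root *scope
	outer := root.define("t", cteDef{literal: "SELECT 1"})
	outer = outer.define("u", cteDef{ref: "t"})
	// An inner WITH clause shadows t; u must still see the outer t.
	inner := outer.define("t", cteDef{literal: "SELECT 2"})
	q, err := resolve("u", inner)
	fmt.Println(q, err)
}
```

Resolving `t` directly from `inner` would yield the shadowing definition, which is why the defining scope has to be stored alongside the AST.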
TFTR! It's not correct to re-plan the AST every time the name is used, unfortunately. The contract around CTEs is that every CTE clause gets evaluated exactly once. Take for example
Or something like that - if you had more than one reference to x you would need to ensure it's only evaluated once. The reason I left that restriction in is that supporting this in general requires some way of storing temporary tables. Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
Oops my comment didn't render correctly. I meant:
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
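The example in the comment above did not survive rendering, so the following Go sketch is only a generic illustration of the "evaluated exactly once" contract, not a reconstruction of it. The `table` and `materialized` types are hypothetical. It contrasts running a side-effecting CTE body at every reference with buffering its rows after a single evaluation.

```go
package main

import "fmt"

// table is a hypothetical stand-in for stored data.
type table struct{ rows []int }

// insertReturning mimics a data-modifying CTE body: it has a side effect
// (appending a row) and returns the inserted row.
func (t *table) insertReturning(v int) []int {
	t.rows = append(t.rows, v)
	return []int{v}
}

// materialized evaluates body at most once and replays the buffered rows
// to every later reference.
type materialized struct {
	body func() []int
	done bool
	rows []int
}

func (m *materialized) scan() []int {
	if !m.done {
		m.rows = m.body()
		m.done = true
	}
	return m.rows
}

func main() {
	// Re-planning at each reference runs the side effect once per reference.
	replanned := &table{}
	body := func() []int { return replanned.insertReturning(1) }
	body()
	body()
	fmt.Println("rows after two re-planned references:", len(replanned.rows))

	// Buffering runs the side effect once, however many references exist.
	buffered := &table{}
	m := &materialized{body: func() []int { return buffered.insertReturning(1) }}
	m.scan()
	m.scan()
	fmt.Println("rows after two buffered references:", len(buffered.rows))
}
```

The buffer in `materialized` plays the role of the temporary storage mentioned in the discussion: it is what a general multi-use implementation would need.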
CTEs are a fairly large feature. Do they not warrant an RFC? At least to pare down the scope. Also does this really fully fix #7029? Unless we fully support CTEs, I don't think we should close that issue. |
I figured I would open more issues if this gets merged - specifically to support I don't think this requires an RFC, at least without |
Read-only CTEs can be replanned every time. They are also the most common case. Write queries in CTEs require temporary tables. Your approach is not sufficient. I kinda agree an RFC would be desirable. But we can support read-only, non-recursive CTEs using the approach I suggested. Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable |
Why do write queries in CTEs require temporary tables? |
Because the amount of data can be arbitrarily large! |
The amount of data where? Returned by an insert in an Perhaps you could use an example to illustrate your point? |
All right so the situation is as follows:
Meanwhile:
So my position is that I do not want to think about situation 6, and I think you're mis-prioritizing by looking at point 4, and to some lesser extent point 3. If we want to make a difference we need a solution for the most common case - read-only CTEs used in multiple positions. If that makes your (our) life more difficult for data-modifying CTEs, I don't care (yet). |
Thanks for laying that out! I agree that we shouldn't think about situation 6 - temporary tables are way out of scope for this effort. I disagree with your prioritization - do you have evidence for the frequency of use of these different query types? I think it's likely that queries that use In any case, for now I will modify the PR to plan each query where it's used. Banning write clauses is just as difficult as permitting their use only once, though, so I'll opt for that instead. |
The argument about one time use vs multiple times. Just recall what the acronym CTE stands for: common table expressions. This feature was developed starting from the observation that some table expressions are used multiple times and therefore that there was a need for DRY. So yes it's easy to argue that WITH is not used unless there are multiple points of use. |
All right so to recap our other discussion offline:
In other words, if there is a possibility to re-plan separately at every point of reuse, this can only be an optimization that kicks in after we have a proof that all possible executions will produce the same rows and that either the ordering is fixed by ORDER BY or all the points of use are order-agnostic (e.g. EXISTS, IN etc). Without this proof, using separate plans for separate uses might not only be inefficient, it is likely simply incorrect. |
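The condition stated above can be written down as a small predicate. This is a sketch under stated assumptions: the `cteInfo` and `use` types and their fields are hypothetical, and the hard part (actually proving each property) is assumed to have happened elsewhere.

```go
package main

import "fmt"

// cteInfo holds facts assumed to have been proven about a CTE body.
type cteInfo struct {
	sameRows   bool // every possible execution yields the same set of rows
	orderFixed bool // the row order is fully determined by ORDER BY
}

// use describes one point where the CTE name is referenced.
type use struct {
	orderAgnostic bool // e.g. the reference sits under EXISTS or IN
}

// canReplanPerUse reports whether planning the CTE separately at each
// reference is safe: the rows must be provably identical, and either the
// order is fixed or no point of use can observe it.
func canReplanPerUse(c cteInfo, uses []use) bool {
	if !c.sameRows {
		return false
	}
	if c.orderFixed {
		return true
	}
	for _, u := range uses {
		if !u.orderAgnostic {
			return false
		}
	}
	return true
}

func main() {
	uses := []use{{orderAgnostic: true}, {orderAgnostic: false}}
	fmt.Println(canReplanPerUse(cteInfo{sameRows: true}, uses))
	fmt.Println(canReplanPerUse(cteInfo{sameRows: true, orderFixed: true}, uses))
}
```

When the predicate is false, the fallback is a single evaluation whose result is shared by all references.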
So you can dismiss my initial reservations, but please capture the different aspects of this discussion in the commit message. The rationale for the limited implementation must be clear in the proposed release note, and also indicated as a recommended doc scope for the person who will document/explain the partial feature. Think about e.g. Robert and how he will have to explain our partial support to a potential customer. |
I'd like to reiterate my point that this requires an RFC. A lot of this
discussion should be included in it.
It would also help others understand what our current CTE support is, by
pointing them to it.
|
Fair enough! |
RFC available in #20374. |
PTAL. |
I had a more thorough look at the code in light of the RFC. The code does what it says on the label but I have a few minor suggestions for improvement. A larger concern now is whether the name resolution code is good as-is. The "planDataSource" logic was introduced while I was handling joins but it really belongs to this new scoping data structure you are introducing here. I wonder how much of this code can be merged together / simplified accordingly. Reviewed 19 of 19 files at r2. pkg/sql/with.go, line 18 at r2 (raw file):
I think you need to revert this. (Also, rebase and check with the linter.) pkg/sql/with.go, line 24 at r2 (raw file):
Add comments throughout to explain the various data structures and functions. Also add a top-level comment to explain how you manage name scoping, and how the magic works with the resetter function. pkg/sql/with.go, line 59 at r2 (raw file):
If you make this func take the planner as parameter, you can define it globally, and thus make it static and avoid a heap allocation. pkg/sql/sem/tree/with.go, line 43 at r2 (raw file):
This would be the first time we use newline characters during pretty-printing. 1) Are you comfortable doing so? Are there no places where we assume pretty-printed queries fit on a single line? 2) is this necessary? Comments from Reviewable |
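The suggestion in this review about making the resetter a global function that takes the planner as a parameter can be sketched as follows. The `planner` type, its `envStack` field, and the function names here are hypothetical stand-ins, not the code under review. A closure that captures the planner and is returned to the caller generally has to be allocated, while a top-level function can be returned as a plain function value.

```go
package main

import "fmt"

// planner is a hypothetical stand-in holding a stack of CTE naming frames.
type planner struct {
	envStack []map[string]string
}

// pushEnvClosure returns a closure that captures p. Because the closure
// escapes to the caller, the compiler will typically heap-allocate it.
func (p *planner) pushEnvClosure(frame map[string]string) func() {
	p.envStack = append(p.envStack, frame)
	return func() { p.envStack = p.envStack[:len(p.envStack)-1] }
}

// popEnv is defined at package level and receives the planner explicitly,
// so it captures nothing.
func popEnv(p *planner) {
	p.envStack = p.envStack[:len(p.envStack)-1]
}

// pushEnvStatic returns the package-level function instead of a closure.
func (p *planner) pushEnvStatic(frame map[string]string) func(*planner) {
	p.envStack = append(p.envStack, frame)
	return popEnv
}

func main() {
	p := &planner{}

	undo := p.pushEnvClosure(map[string]string{"x": "SELECT 1"})
	undo()

	reset := p.pushEnvStatic(map[string]string{"x": "SELECT 1"})
	reset(p)

	fmt.Println("frames left:", len(p.envStack))
}
```

Both variants restore the previous naming environment; the static one just makes the planner dependency explicit at the call site.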
Implement support for single-use non-recursive common table expressions. Referring to a single table expression more than once is not yet supported. Release note (sql): add support for single-use common table expressions
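The single-use restriction in this commit message can be illustrated with a small Go sketch. The `cteEnv` and `cteSource` types and the error text are hypothetical, not taken from the PR. The environment hands out the stored plan on the first reference and rejects any later one, so a clause is never evaluated twice.

```go
package main

import (
	"errors"
	"fmt"
)

// errMultipleUse is a hypothetical error; the real message may differ.
var errMultipleUse = errors.New("multiple references to a CTE are not supported")

// cteSource pairs a planned query (a string stand-in) with a used flag.
type cteSource struct {
	plan string
	used bool
}

// cteEnv maps CTE names to their sources within one WITH clause.
type cteEnv map[string]*cteSource

// resolve returns the plan for name on first use and an error afterwards.
func (e cteEnv) resolve(name string) (string, error) {
	src, ok := e[name]
	if !ok {
		return "", fmt.Errorf("relation %q does not exist", name)
	}
	if src.used {
		return "", fmt.Errorf("%w: %q", errMultipleUse, name)
	}
	src.used = true
	return src.plan, nil
}

func main() {
	env := cteEnv{"x": {plan: "SELECT 1"}}
	fmt.Println(env.resolve("x"))
	fmt.Println(env.resolve("x"))
}
```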
Rebased and responded to comments. I'm not sure about changing the name resolution completely. Plan data sources are also used in expressions with statement sources as I'm sure you're aware ( It's possible that the next time we'll need this kind of naming environment is in correlated subqueries, which I think is the main other example of naming scopes in SQL statements (right)? Maybe aliases too, actually. Review status: 10 of 23 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. pkg/sql/with.go, line 18 at r2 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/with.go, line 24 at r2 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/with.go, line 59 at r2 (raw file): Previously, knz (kena) wrote…
Done. pkg/sql/sem/tree/with.go, line 43 at r2 (raw file): Previously, knz (kena) wrote…
Didn't realize. Removed the newlines. Comments from Reviewable |
I'd be hesitant to change the name resolution code significantly at this time due to the upcoming planner work. See |
SGTM, let's merge as is then 😁 |
Yes, I agree with Peter that it is too early to have that conversation - I wasn't suggesting to do anything about it in this PR. The architecture Peter and Radu are pushing forward will provide a new context to think better about name resolution, even without considering "opttoy". Let's familiarize ourselves with this context before we resume the conversation. However I do want to point out that we need a single solution to satisfy our various naming scope requirements. If the code Peter and Radu are working on cannot be used for this in the coming 6 months, we'll still want to do something about it here. Reviewed 12 of 13 files at r3. Comments from Reviewable |
Reviewed 1 of 13 files at r3. Comments from Reviewable |
Although I'll note that the naming environments for CTEs are slightly different from those for correlated subqueries - correlated subqueries need to keep track of columns and their sources, whereas CTEs need to keep track of sources only. This might be a small enough difference that the same environment could be used, but it's worth pointing out. |
TFTRs! |
Implement support for simple non-recursive common table expressions.
Referring to a single table expression more than once is not yet
supported.
Fixes #7029.