-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: Add optimizer support for new Insert operator #32423
Conversation
niiice 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked at a bunch of this, need to let a bunch of it sit and will come back for another look later.
Reviewed 22 of 22 files at r1, 3 of 3 files at r2, 7 of 7 files at r3, 3 of 3 files at r4, 31 of 31 files at r5, 3 of 3 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/logictest/testdata/logic_test/builtin_function, line 1990 at r1 (raw file):
UTF8 NULL # TODO(SQL execution team): Restore this to original form by removing FROM
nit: I believe the intention of the names is generally like, "who is the person you can ask about this", which I guess in this case would either be you or Jordan.
pkg/sql/opt/memo/testdata/logprops/join, line 91 at r1 (raw file):
│ │ ├── fd: ()-->(5,6) │ │ ├── prune: (6) │ │ ├── max1-row
nit/unrelated: I wonder if Max1Row
should error if its input is provably >1 row? Perhaps that case is limited enough that it's not worth it.
pkg/sql/opt/norm/custom_funcs.go, line 556 at r1 (raw file):
// Project operator can be merged with the columns of a Values operator to form // a single combined Values operator. This is only possible when there is a // single Values row and when the projections are not dependent on columns from
Why must there be a single Values
row? I think having an explanation of why these restrictions exist would be helpful.
pkg/sql/opt/norm/rules/project.opt, line 38 at r1 (raw file):
[MergeProjectWithValues, Normalize] (Project $input:(Values)
Could this be something like (Values [ * ])
and then CanMergeProjectWithValues
doesn't have to check there's a single row?
pkg/sql/opt/optbuilder/insert.go, line 495 at r5 (raw file):
// explicitly specified: // // INSERT INTO <table> (...) VALUES (...))
nit: extra )
pkg/sql/opt/optbuilder/insert.go, line 552 at r5 (raw file):
ib.addSynthesizedCols(func(tabCol opt.Column) bool { return !tabCol.IsComputed() }) // Add any missing computed columns.
I think a comment might be helpful to clarify that this ordering is enforced because computed columns can depend on default columns (but not the other way around).
pkg/sql/opt/optbuilder/insert.go, line 593 at r5 (raw file):
expr := ib.parseDefaultOrComputedExpr(tabColID) texpr := ib.outScope.resolveType(expr, tabCol.DatumType()) scopeCol := ib.b.addColumn(projectionsScope, "" /* label */, texpr)
is there any reason not to set the label to tabCol.ColName()
?
pkg/sql/opt/optbuilder/insert.go, line 672 at r5 (raw file):
more, less := "expressions", "target columns" if actual < expected { more, less = less, more
I like this, haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt/memo/testdata/logprops/join, line 91 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
nit/unrelated: I wonder if
Max1Row
should error if its input is provably >1 row? Perhaps that case is limited enough that it's not worth it.
Yeah, maybe something we add in future, but not terribly important.
pkg/sql/opt/norm/custom_funcs.go, line 556 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Why must there be a single
Values
row? I think having an explanation of why these restrictions exist would be helpful.
Done.
pkg/sql/opt/norm/rules/project.opt, line 38 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
Could this be something like
(Values [ * ])
and thenCanMergeProjectWithValues
doesn't have to check there's a single row?
Yes, it's a good idea. Though it does require a couple changes to Optgen, due to an unreferenced variable in this case.
pkg/sql/opt/optbuilder/insert.go, line 552 at r5 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I think a comment might be helpful to clarify that this ordering is enforced because computed columns can depend on default columns (but not the other way around).
Done.
pkg/sql/opt/optbuilder/insert.go, line 593 at r5 (raw file):
Previously, justinj (Justin Jaffray) wrote…
is there any reason not to set the label to
tabCol.ColName()
?
Well, I went back and forth a bit. The trouble is that if I set the columns in the input expression to the names of the target table columns, then it creates duplicate labels in the metadata. This in turn causes the formatter to fully qualify the names, which looks messy. I suppose I could come up with some other naming like a_default
for the default value of an a
column, or something.
pkg/sql/opt/optbuilder/insert.go, line 672 at r5 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I like this, haha
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've reviewed about half of this.... great stuff! Taking a break and will come back later and review the rest.
Reviewed 17 of 22 files at r1, 49 of 49 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 3 of 3 files at r10, 30 of 33 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt/exec/execbuilder/testdata/explain, line 584 at r7 (raw file):
└── values · · (x int[]) · · size 1 column, 1 row · · · row 0, expr 0 (ARRAY[(1)[int],(2)[int],(3)[int]])[int[]] · ·
This seems like a worse plan -- any idea why?
pkg/sql/opt/memo/statistics_builder.go, line 1754 at r11 (raw file):
colSet opt.ColSet, ins *InsertExpr, ) *props.ColumnStatistic { return sb.colStat(colSet, ins.Input)
This isn't updating the cache for Insert -- maybe use copyColStatFromChild
?
pkg/sql/opt/norm/custom_funcs.go, line 555 at r7 (raw file):
// MergeProjectWithValues merges a Project operator with its input Values // operator. This is only possible in certain circumstances, which must first be // validated by calling CanMergeProjectWithValues (see its header comment).
CanMergeProjectWithValues -> AreProjectionsCorrelated
pkg/sql/opt/norm/testdata/rules/decorrelate, line 3487 at r7 (raw file):
project ├── columns: i:2(int) y:7(int) └── inner-join-apply
how come we could decorrelate this before but not anymore?
pkg/sql/opt/optbuilder/insert.go, line 323 at r11 (raw file):
if indexCol.Column.IsDefault() || indexCol.Column.IsComputed() { // The column has a default value. allNull = false
What if the default value is NULL?
pkg/sql/opt/optbuilder/insert.go, line 464 at r11 (raw file):
val = ib.parseDefaultOrComputedExpr(ib.targetColList[itup]) } newTuple = append(newTuple, val)
do you want to check if newTuple != nil
?
pkg/sql/opt/optbuilder/insert.go, line 560 at r11 (raw file):
// scans the list of table columns, looking for any that do not yet have values // provided by the input expression. New columns are synthesized for any missing // columns.
Can you add a sentence explaining what the addCol
function does?
pkg/sql/opt/optbuilder/insert.go, line 572 at r11 (raw file):
// Get column metadata, including any mutation columns. var tabCol opt.Column
Don't you already have a helper function for this? tableColumnByOrdinal
?
pkg/sql/opt/optbuilder/insert.go, line 706 at r11 (raw file):
exprStr = tabCol.DefaultExprStr() case tabCol.IsNullable(): return tree.DNull
do you want to also cache this in parsedExprs
?
pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):
n = ate.Expr // It's okay to ignore the As columns here, as they're not permitted in // DML aliases where this function is used.
should there be an error if they are used?
pkg/sql/opt/optbuilder/select.go, line 21 at r10 (raw file):
"github.com/cockroachdb/cockroach/pkg/sql/privilege"
[nit] extra space
pkg/sql/opt/optbuilder/util.go, line 423 at r10 (raw file):
// metadata, so that the privileges can be re-checked on reuse of the memo. func (b *Builder) checkPrivilege(ds opt.DataSource, priv privilege.Kind) { if !b.skipSelectPrivilegeChecks {
What if the privilege is UPDATE? Don't we still want to check?
pkg/sql/opt/optbuilder/testdata/insert, line 267 at r11 (raw file):
INSERT INTO abcde WITH a AS (SELECT 1) VALUES (1, DEFAULT, 2) ---- error (42601): DEFAULT can only appear in a VALUES list within INSERT or on the right side of a SET
This error message is a bit confusing... but I assume this is the same message we were using before?
pkg/sql/opt/testutils/testcat/test_catalog.go, line 508 at r11 (raw file):
// IsDefault is part of the opt.Column interface. func (tc *Column) IsDefault() bool {
[nit] maybe HasDefault would be a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with the rest of the review -- only one additional (minor) comment. Impressive work!
Reviewed 3 of 33 files at r11, 3 of 3 files at r12, 11 of 11 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt/optbuilder/project.go, line 134 at r11 (raw file):
outScope.cols = make([]scopeColumn, 0, len(selects)+len(exprs)-1) } for i, e := range exprs {
now this is shadowing the outer i
. Can you give it a new name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt/exec/execbuilder/testdata/explain, line 584 at r7 (raw file):
Previously, rytaft wrote…
This seems like a worse plan -- any idea why?
The difference is that the InlineProjectInProject
rule fired before, and now it doesn't, because we don't have an InlineProjectInValues
rule. This seems like an edge case, so I don't think it's a priority to add such a rule.
pkg/sql/opt/memo/statistics_builder.go, line 1754 at r11 (raw file):
Previously, rytaft wrote…
This isn't updating the cache for Insert -- maybe use
copyColStatFromChild
?
But why cache it twice, both in the Insert cache and in the input node cache? Seems better to pass through calls, which will end up using the input node cache. I'd thought about just inlining this in colStat
, but decided to put it here for consistency with other methods. Perhaps Go inlines it.
pkg/sql/opt/norm/custom_funcs.go, line 555 at r7 (raw file):
Previously, rytaft wrote…
CanMergeProjectWithValues -> AreProjectionsCorrelated
Fixed the comment.
pkg/sql/opt/norm/testdata/rules/decorrelate, line 3487 at r7 (raw file):
Previously, rytaft wrote…
how come we could decorrelate this before but not anymore?
Because of the new MergeProjectWithValues
rule. To decorrelate this, we'd need a new rule that mapped (InnerJoin <input> (Values)) => (Project <input> (Values))
when Values
has cardinality one. Since this is quite an edge case, I don't think it's too important. But I'll keep in mind.
pkg/sql/opt/optbuilder/insert.go, line 323 at r11 (raw file):
Previously, rytaft wrote…
What if the default value is NULL?
I'm matching the semantics of the existing code (in makeBaseFKHelper
), and that code doesn't care whether a default value is NULL, or if a computed column evaluates to NULL. I changed the name from allNull
to allMissing
to describe the current behavior better. This code really exists just to give a better error at compile-time; if the default or computed value turns out to be NULL at runtime, it should be correctly rejected (unless all values are NULL, of course).
pkg/sql/opt/optbuilder/insert.go, line 464 at r11 (raw file):
Previously, rytaft wrote…
do you want to check if
newTuple != nil
?
Yes, good catch.
pkg/sql/opt/optbuilder/insert.go, line 560 at r11 (raw file):
Previously, rytaft wrote…
Can you add a sentence explaining what the
addCol
function does?
Done.
pkg/sql/opt/optbuilder/insert.go, line 572 at r11 (raw file):
Previously, rytaft wrote…
Don't you already have a helper function for this?
tableColumnByOrdinal
?
Yes, written at different times. Fixed to use the helper function.
pkg/sql/opt/optbuilder/insert.go, line 706 at r11 (raw file):
Previously, rytaft wrote…
do you want to also cache this in
parsedExprs
?
I had considered it, but decided not to, b/c it adds a bit more complexity for very little benefit.
pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):
Previously, rytaft wrote…
should there be an error if they are used?
I just cut and pasted this method from data_source.go
. I don't know what this comment means, so I removed it.
pkg/sql/opt/optbuilder/project.go, line 134 at r11 (raw file):
Previously, rytaft wrote…
now this is shadowing the outer
i
. Can you give it a new name?
Done.
pkg/sql/opt/optbuilder/util.go, line 423 at r10 (raw file):
Previously, rytaft wrote…
What if the privilege is UPDATE? Don't we still want to check?
Hmm, I guess the existing planner code does check in case where an INSERT is embedded in a view. I fixed this and added a regression case. Good catch.
pkg/sql/opt/optbuilder/testdata/insert, line 267 at r11 (raw file):
Previously, rytaft wrote…
This error message is a bit confusing... but I assume this is the same message we were using before?
Yes, it's a pre-existing error message, and what the HP shows today.
pkg/sql/opt/testutils/testcat/test_catalog.go, line 508 at r11 (raw file):
Previously, rytaft wrote…
[nit] maybe HasDefault would be a better name?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the InsertPrivate
structure by getting rid of the need for TableCols
.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 60 files at r14, 3 of 3 files at r15, 3 of 7 files at r16, 1 of 3 files at r17, 36 of 36 files at r18, 3 of 3 files at r19, 11 of 11 files at r20.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I just cut and pasted this method from
data_source.go
. I don't know what this comment means, so I removed it.
Looks like you removed it in data_source.go
but not here. I think it's referring to the fact that the tree.AliasClause
may also have columns (e.g., SELECT * FROM t AS a(b, c)
), but those column aliases aren't allowed in DML statements. I think it's actually a helpful comment, but maybe just needs a bit more info.
pkg/sql/opt/optbuilder/scope.go, line 580 at r20 (raw file):
col := &s.cols[i] // TODO(rytaft): Do not return a match if this column is being // backfilled, or the column expression being resolved is not from
Do you need to make any modifications here to ensure that mutation columns are not returned? If not, maybe you can update the TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):
// The child's provided ordering satisfies both <required> and the Insert // internal ordering; it may need to be trimmed. childProvided := expr.(*memo.InsertExpr).Input.ProvidedPhysical().Ordering
IIUC Insert doesn't output all input columns - only those in InputCols
(and which are not mutation columns). In principle, we could need to remap columns here; this should look like projectBuildProvided
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 191 at r20 (raw file):
A little fragile to define "can write" as "anything not handled yet" (e.g. if some of functioncalled above has a bug and returns a We could add a |
In some circumstances, it's possible to merge a Project operator with its input Values operator. For example: SELECT column1, 3 FROM (VALUES (1, 2)) => (VALUES (1, 3)) This simplifies the resulting query, requiring one less execution operator. This case commonly occurs with default values in INSERT statements. Release note: None
Add a method that returns a SQL syntax tag that identifies an operator, to be used in error messages. For example: SelectOp => SELECT ShowTraceForSessionOp => SHOW TRACE FOR SESSION Release note: None
Every caller passes expr.ResolvedType() as the type, so change method to do that itself rather than require every caller to do it. Release note: None
In preparation for supporting INSERT, support checking privileges other than privilege.SELECT when resolving data sources. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/exec/execbuilder/relational_builder.go, line 191 at r20 (raw file):
Previously, RaduBerinde wrote…
A little fragile to define "can write" as "anything not handled yet" (e.g. if some of functioncalled above has a bug and returns a
nil
root, we would get a very confusing error).We could add a
Mutation
tag and use that. We can then move the check up (if opt.IsMutation(e) && b.evalCtx.TxnReadOnly { return err }
) and movecase InsertExpr
in the switch above.
Good idea. Done.
pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):
Previously, rytaft wrote…
Looks like you removed it in
data_source.go
but not here. I think it's referring to the fact that thetree.AliasClause
may also have columns (e.g.,SELECT * FROM t AS a(b, c)
), but those column aliases aren't allowed in DML statements. I think it's actually a helpful comment, but maybe just needs a bit more info.
I see. No, there doesn't need to be an error here because the SQL grammar does not allow the syntax. So the parser would error if you try to use them, and it never gets here. I added a bit to the comment to clarify that.
pkg/sql/opt/optbuilder/scope.go, line 580 at r20 (raw file):
Previously, rytaft wrote…
Do you need to make any modifications here to ensure that mutation columns are not returned? If not, maybe you can update the TODO?
I removed the TODO. At least so far, the scopes never make mutation columns accessible to the query to begin with. Mutation columns are treated as if they were computed columns, and added only after expressions have been analyzed. I removed the TODO.
pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):
Previously, RaduBerinde wrote…
IIUC Insert doesn't output all input columns - only those in
InputCols
(and which are not mutation columns). In principle, we could need to remap columns here; this should look likeprojectBuildProvided
It's not possible to use mutation columns in ordering expressions (or any other expression for that matter, besides possibly another computed mutation expression). Doesn't that mean we can just ignore remapping? I added a check in check_expr.go
to ensure we never use mutation columns as output or ordering columns.
Add Insert operator to the optimizer. This includes Optgen definition of the operator, semantic analysis in optbuilder, logical properties, and required ordering. The implementation is side-by-side with the existing heuristic planner version. The main differences are that it supports correlated subqueries, integrates with the CBO, and compiles default and computed properties differently. Rather than keeping these properties separate, the new Insert operator wraps the input expression with Project operators that synthesize the default and computed expressions. Release note: None
Add rule to simplify an Insert operator's internal ordering. Release note: None
Add handling for the Insert operator in execbuilder and in the exec factory. This is mostly code copied over from sql/insert.go. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
It's not possible to use mutation columns in ordering expressions (or any other expression for that matter, besides possibly another computed mutation expression). Doesn't that mean we can just ignore remapping? I added a check in
check_expr.go
to ensure we never use mutation columns as output or ordering columns.
After our discussion, I added a call to remapProvided
. I also added a test in xform/testcases/physprops/ordering
.
bors r+ |
32423: opt: Add optimizer support for new Insert operator r=andy-kimball a=andy-kimball Add Insert operator to the optimizer. The implementation is side-by-side with the existing heuristic planner version. The main differences are that it supports correlated subqueries, integrates with the CBO, and compiles default and computed properties differently. Rather than keeping these properties separate, the new Insert operator wraps the input expression with Project operators that synthesize the default and computed expressions. Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Build succeeded |
Due to cockroachdb#32423, cockroachdb#31721 has been fixed. Also cleaned up the test output a little bit. Closes cockroachdb#31721. Fixes cockroachdb#32714. Release note: None
Add Insert operator to the optimizer. The implementation is side-by-side
with the existing heuristic planner version. The main differences are that it
supports correlated subqueries, integrates with the CBO, and compiles
default and computed properties differently. Rather than keeping these
properties separate, the new Insert operator wraps the input expression
with Project operators that synthesize the default and computed
expressions.