Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Add optimizer support for new Insert operator #32423

Merged
merged 7 commits into from
Nov 29, 2018

Conversation

andy-kimball
Copy link
Contributor

Add Insert operator to the optimizer. The implementation is side-by-side
with the existing heuristic planner version. The main differences are that it
supports correlated subqueries, integrates with the CBO, and compiles
default and computed properties differently. Rather than keeping these
properties separate, the new Insert operator wraps the input expression
with Project operators that synthesize the default and computed
expressions.

@andy-kimball andy-kimball requested a review from a team as a code owner November 16, 2018 18:42
@andy-kimball andy-kimball requested review from a team November 16, 2018 18:42
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor

knz commented Nov 16, 2018

niiice 🎉

Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at a bunch of this, need to let a bunch of it sit and will come back for another look later.

Reviewed 22 of 22 files at r1, 3 of 3 files at r2, 7 of 7 files at r3, 3 of 3 files at r4, 31 of 31 files at r5, 3 of 3 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/logictest/testdata/logic_test/builtin_function, line 1990 at r1 (raw file):

UTF8  NULL

# TODO(SQL execution team): Restore this to original form by removing FROM

nit: I believe the intention of the names is generally like, "who is the person you can ask about this", which I guess in this case would either be you or Jordan.


pkg/sql/opt/memo/testdata/logprops/join, line 91 at r1 (raw file):

 │    │    ├── fd: ()-->(5,6)
 │    │    ├── prune: (6)
 │    │    ├── max1-row

nit/unrelated: I wonder if Max1Row should error if its input is provably >1 row? Perhaps that case is limited enough that it's not worth it.


pkg/sql/opt/norm/custom_funcs.go, line 556 at r1 (raw file):

// Project operator can be merged with the columns of a Values operator to form
// a single combined Values operator. This is only possible when there is a
// single Values row and when the projections are not dependent on columns from

Why must there be a single Values row? I think having an explanation of why these restrictions exist would be helpful.


pkg/sql/opt/norm/rules/project.opt, line 38 at r1 (raw file):

[MergeProjectWithValues, Normalize]
(Project
    $input:(Values)

Could this be something like (Values [ * ]) and then CanMergeProjectWithValues doesn't have to check there's a single row?


pkg/sql/opt/optbuilder/insert.go, line 495 at r5 (raw file):

	// explicitly specified:
	//
	//   INSERT INTO <table> (...) VALUES (...))

nit: extra )


pkg/sql/opt/optbuilder/insert.go, line 552 at r5 (raw file):

	ib.addSynthesizedCols(func(tabCol opt.Column) bool { return !tabCol.IsComputed() })

	// Add any missing computed columns.

I think a comment might be helpful to clarify that this ordering is enforced because computed columns can depend on default columns (but not the other way around).


pkg/sql/opt/optbuilder/insert.go, line 593 at r5 (raw file):

		expr := ib.parseDefaultOrComputedExpr(tabColID)
		texpr := ib.outScope.resolveType(expr, tabCol.DatumType())
		scopeCol := ib.b.addColumn(projectionsScope, "" /* label */, texpr)

is there any reason not to set the label to tabCol.ColName()?


pkg/sql/opt/optbuilder/insert.go, line 672 at r5 (raw file):

		more, less := "expressions", "target columns"
		if actual < expected {
			more, less = less, more

I like this, haha

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/memo/testdata/logprops/join, line 91 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

nit/unrelated: I wonder if Max1Row should error if its input is provably >1 row? Perhaps that case is limited enough that it's not worth it.

Yeah, maybe something we add in future, but not terribly important.


pkg/sql/opt/norm/custom_funcs.go, line 556 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

Why must there be a single Values row? I think having an explanation of why these restrictions exist would be helpful.

Done.


pkg/sql/opt/norm/rules/project.opt, line 38 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

Could this be something like (Values [ * ]) and then CanMergeProjectWithValues doesn't have to check there's a single row?

Yes, it's a good idea. Though it does require a couple changes to Optgen, due to an unreferenced variable in this case.


pkg/sql/opt/optbuilder/insert.go, line 552 at r5 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I think a comment might be helpful to clarify that this ordering is enforced because computed columns can depend on default columns (but not the other way around).

Done.


pkg/sql/opt/optbuilder/insert.go, line 593 at r5 (raw file):

Previously, justinj (Justin Jaffray) wrote…

is there any reason not to set the label to tabCol.ColName()?

Well, I went back and forth a bit. The trouble is that if I set the columns in the input expression to the names of the target table columns, then it creates duplicate labels in the metadata. This in turn causes the formatter to fully qualify the names, which looks messy. I suppose I could come up with some other naming like a_default for the default value of an a column, or something.


pkg/sql/opt/optbuilder/insert.go, line 672 at r5 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I like this, haha

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've reviewed about half of this.... great stuff! Taking a break and will come back later and review the rest.

Reviewed 17 of 22 files at r1, 49 of 49 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 3 of 3 files at r10, 30 of 33 files at r11.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/exec/execbuilder/testdata/explain, line 584 at r7 (raw file):

 └── values  ·              ·                                           (x int[])  ·
·            size           1 column, 1 row                             ·          ·
·            row 0, expr 0  (ARRAY[(1)[int],(2)[int],(3)[int]])[int[]]  ·          ·

This seems like a worse plan -- any idea why?


pkg/sql/opt/memo/statistics_builder.go, line 1754 at r11 (raw file):

	colSet opt.ColSet, ins *InsertExpr,
) *props.ColumnStatistic {
	return sb.colStat(colSet, ins.Input)

This isn't updating the cache for Insert -- maybe use copyColStatFromChild?


pkg/sql/opt/norm/custom_funcs.go, line 555 at r7 (raw file):

// MergeProjectWithValues merges a Project operator with its input Values
// operator. This is only possible in certain circumstances, which must first be
// validated by calling CanMergeProjectWithValues (see its header comment).

CanMergeProjectWithValues -> AreProjectionsCorrelated


pkg/sql/opt/norm/testdata/rules/decorrelate, line 3487 at r7 (raw file):

project
 ├── columns: i:2(int) y:7(int)
 └── inner-join-apply

how come we could decorrelate this before but not anymore?


pkg/sql/opt/optbuilder/insert.go, line 323 at r11 (raw file):

			if indexCol.Column.IsDefault() || indexCol.Column.IsComputed() {
				// The column has a default value.
				allNull = false

What if the default value is NULL?


pkg/sql/opt/optbuilder/insert.go, line 464 at r11 (raw file):

				val = ib.parseDefaultOrComputedExpr(ib.targetColList[itup])
			}
			newTuple = append(newTuple, val)

do you want to check if newTuple != nil?


pkg/sql/opt/optbuilder/insert.go, line 560 at r11 (raw file):

// scans the list of table columns, looking for any that do not yet have values
// provided by the input expression. New columns are synthesized for any missing
// columns.

Can you add a sentence explaining what the addCol function does?


pkg/sql/opt/optbuilder/insert.go, line 572 at r11 (raw file):

		// Get column metadata, including any mutation columns.
		var tabCol opt.Column

Don't you already have a helper function for this? tableColumnByOrdinal?


pkg/sql/opt/optbuilder/insert.go, line 706 at r11 (raw file):

		exprStr = tabCol.DefaultExprStr()
	case tabCol.IsNullable():
		return tree.DNull

do you want to also cache this in parsedExprs?


pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):

		n = ate.Expr
		// It's okay to ignore the As columns here, as they're not permitted in
		// DML aliases where this function is used.

should there be an error if they are used?


pkg/sql/opt/optbuilder/select.go, line 21 at r10 (raw file):

	"github.com/cockroachdb/cockroach/pkg/sql/privilege"

[nit] extra space


pkg/sql/opt/optbuilder/util.go, line 423 at r10 (raw file):

// metadata, so that the privileges can be re-checked on reuse of the memo.
func (b *Builder) checkPrivilege(ds opt.DataSource, priv privilege.Kind) {
	if !b.skipSelectPrivilegeChecks {

What if the privilege is UPDATE? Don't we still want to check?


pkg/sql/opt/optbuilder/testdata/insert, line 267 at r11 (raw file):

INSERT INTO abcde WITH a AS (SELECT 1) VALUES (1, DEFAULT, 2)
----
error (42601): DEFAULT can only appear in a VALUES list within INSERT or on the right side of a SET

This error message is a bit confusing... but I assume this is the same message we were using before?


pkg/sql/opt/testutils/testcat/test_catalog.go, line 508 at r11 (raw file):

// IsDefault is part of the opt.Column interface.
func (tc *Column) IsDefault() bool {

[nit] maybe HasDefault would be a better name?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the rest of the review -- only one additional (minor) comment. Impressive work!

Reviewed 3 of 33 files at r11, 3 of 3 files at r12, 11 of 11 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/optbuilder/project.go, line 134 at r11 (raw file):

						outScope.cols = make([]scopeColumn, 0, len(selects)+len(exprs)-1)
					}
					for i, e := range exprs {

now this is shadowing the outer i. Can you give it a new name?

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/exec/execbuilder/testdata/explain, line 584 at r7 (raw file):

Previously, rytaft wrote…

This seems like a worse plan -- any idea why?

The difference is that the InlineProjectInProject rule fired before, and now it doesn't, because we don't have an InlineProjectInValues rule. This seems like an edge case, so I don't think it's a priority to add such a rule.


pkg/sql/opt/memo/statistics_builder.go, line 1754 at r11 (raw file):

Previously, rytaft wrote…

This isn't updating the cache for Insert -- maybe use copyColStatFromChild?

But why cache it twice, both in the Insert cache and in the input node cache? Seems better to pass through calls, which will end up using the input node cache. I'd thought about just inlining this in colStat, but decided to put it here for consistency with other methods. Perhaps Go inlines it.


pkg/sql/opt/norm/custom_funcs.go, line 555 at r7 (raw file):

Previously, rytaft wrote…

CanMergeProjectWithValues -> AreProjectionsCorrelated

Fixed the comment.


pkg/sql/opt/norm/testdata/rules/decorrelate, line 3487 at r7 (raw file):

Previously, rytaft wrote…

how come we could decorrelate this before but not anymore?

Because of the new MergeProjectWithValues rule. To decorrelate this, we'd need a new rule that mapped (InnerJoin <input> (Values)) => (Project <input> (Values)) when Values has cardinality one. Since this is quite an edge case, I don't think it's too important. But I'll keep in mind.


pkg/sql/opt/optbuilder/insert.go, line 323 at r11 (raw file):

Previously, rytaft wrote…

What if the default value is NULL?

I'm matching the semantics of the existing code (in makeBaseFKHelper), and that code doesn't care whether a default value is NULL, or if a computed column evaluates to NULL. I changed the name from allNull to allMissing to describe the current behavior better. This code really exists just to give a better error at compile-time; if the default or computed value turns out to be NULL at runtime, it should be correctly rejected (unless all values are NULL, of course).


pkg/sql/opt/optbuilder/insert.go, line 464 at r11 (raw file):

Previously, rytaft wrote…

do you want to check if newTuple != nil?

Yes, good catch.


pkg/sql/opt/optbuilder/insert.go, line 560 at r11 (raw file):

Previously, rytaft wrote…

Can you add a sentence explaining what the addCol function does?

Done.


pkg/sql/opt/optbuilder/insert.go, line 572 at r11 (raw file):

Previously, rytaft wrote…

Don't you already have a helper function for this? tableColumnByOrdinal?

Yes, written at different times. Fixed to use the helper function.


pkg/sql/opt/optbuilder/insert.go, line 706 at r11 (raw file):

Previously, rytaft wrote…

do you want to also cache this in parsedExprs?

I had considered it, but decided not to, b/c it adds a bit more complexity for very little benefit.


pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):

Previously, rytaft wrote…

should there be an error if they are used?

I just cut and pasted this method from data_source.go. I don't know what this comment means, so I removed it.


pkg/sql/opt/optbuilder/project.go, line 134 at r11 (raw file):

Previously, rytaft wrote…

now this is shadowing the outer i. Can you give it a new name?

Done.


pkg/sql/opt/optbuilder/util.go, line 423 at r10 (raw file):

Previously, rytaft wrote…

What if the privilege is UPDATE? Don't we still want to check?

Hmm, I guess the existing planner code does check in case where an INSERT is embedded in a view. I fixed this and added a regression case. Good catch.


pkg/sql/opt/optbuilder/testdata/insert, line 267 at r11 (raw file):

Previously, rytaft wrote…

This error message is a bit confusing... but I assume this is the same message we were using before?

Yes, it's a pre-existing error message, and what the HP shows today.


pkg/sql/opt/testutils/testcat/test_catalog.go, line 508 at r11 (raw file):

Previously, rytaft wrote…

[nit] maybe HasDefault would be a better name?

Done.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the InsertPrivate structure by getting rid of the need for TableCols.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 60 files at r14, 3 of 3 files at r15, 3 of 7 files at r16, 1 of 3 files at r17, 36 of 36 files at r18, 3 of 3 files at r19, 11 of 11 files at r20.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I just cut and pasted this method from data_source.go. I don't know what this comment means, so I removed it.

Looks like you removed it in data_source.go but not here. I think it's referring to the fact that the tree.AliasClause may also have columns (e.g., SELECT * FROM t AS a(b, c)), but those column aliases aren't allowed in DML statements. I think it's actually a helpful comment, but maybe just needs a bit more info.


pkg/sql/opt/optbuilder/scope.go, line 580 at r20 (raw file):

			col := &s.cols[i]
			// TODO(rytaft): Do not return a match if this column is being
			// backfilled, or the column expression being resolved is not from

Do you need to make any modifications here to ensure that mutation columns are not returned? If not, maybe you can update the TODO?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):

	// The child's provided ordering satisfies both <required> and the Insert
	// internal ordering; it may need to be trimmed.
	childProvided := expr.(*memo.InsertExpr).Input.ProvidedPhysical().Ordering

IIUC Insert doesn't output all input columns - only those in InputCols (and which are not mutation columns). In principle, we could need to remap columns here; this should look like projectBuildProvided

@RaduBerinde
Copy link
Member


pkg/sql/opt/exec/execbuilder/relational_builder.go, line 191 at r20 (raw file):

	// Handle operators which can write data.
	if ep.root == nil {

A little fragile to define "can write" as "anything not handled yet" (e.g. if some of functioncalled above has a bug and returns a nil root, we would get a very confusing error).

We could add a Mutation tag and use that. We can then move the check up (if opt.IsMutation(e) && b.evalCtx.TxnReadOnly { return err }) and move case InsertExpr in the switch above.

In some circumstances, it's possible to merge a Project operator with
its input Values operator. For example:

  SELECT column1, 3 FROM (VALUES (1, 2))
  =>
  (VALUES (1, 3))

This simplifies the resulting query, requiring one less execution operator.
This case commonly occurs with default values in INSERT statements.

Release note: None
Add a method that returns a SQL syntax tag that identifies an
operator, to be used in error messages. For example:

  SelectOp => SELECT
  ShowTraceForSessionOp => SHOW TRACE FOR SESSION

Release note: None
Every caller passes expr.ResolvedType() as the type, so change method
to do that itself rather than require every caller to do it.

Release note: None
In preparation for supporting INSERT, support checking privileges other
than privilege.SELECT when resolving data sources.

Release note: None
Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/exec/execbuilder/relational_builder.go, line 191 at r20 (raw file):

Previously, RaduBerinde wrote…

A little fragile to define "can write" as "anything not handled yet" (e.g. if some of functioncalled above has a bug and returns a nil root, we would get a very confusing error).

We could add a Mutation tag and use that. We can then move the check up (if opt.IsMutation(e) && b.evalCtx.TxnReadOnly { return err }) and move case InsertExpr in the switch above.

Good idea. Done.


pkg/sql/opt/optbuilder/insert.go, line 745 at r11 (raw file):

Previously, rytaft wrote…

Looks like you removed it in data_source.go but not here. I think it's referring to the fact that the tree.AliasClause may also have columns (e.g., SELECT * FROM t AS a(b, c)), but those column aliases aren't allowed in DML statements. I think it's actually a helpful comment, but maybe just needs a bit more info.

I see. No, there doesn't need to be an error here because the SQL grammar does not allow the syntax. So the parser would error if you try to use them, and it never gets here. I added a bit to the comment to clarify that.


pkg/sql/opt/optbuilder/scope.go, line 580 at r20 (raw file):

Previously, rytaft wrote…

Do you need to make any modifications here to ensure that mutation columns are not returned? If not, maybe you can update the TODO?

I removed the TODO. At least so far, the scopes never make mutation columns accessible to the query to begin with. Mutation columns are treated as if they were computed columns, and added only after expressions have been analyzed. I removed the TODO.


pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):

Previously, RaduBerinde wrote…

IIUC Insert doesn't output all input columns - only those in InputCols (and which are not mutation columns). In principle, we could need to remap columns here; this should look like projectBuildProvided

It's not possible to use mutation columns in ordering expressions (or any other expression for that matter, besides possibly another computed mutation expression). Doesn't that mean we can just ignore remapping? I added a check in check_expr.go to ensure we never use mutation columns as output or ordering columns.

Add Insert operator to the optimizer. This includes Optgen definition of
the operator, semantic analysis in optbuilder, logical properties, and
required ordering. The implementation is side-by-side with the existing
heuristic planner version. The main differences are that it supports
correlated subqueries, integrates with the CBO, and compiles default
and computed properties differently. Rather than keeping these properties
separate, the new Insert operator wraps the input expression with Project
operators that synthesize the default and computed expressions.

Release note: None
Add rule to simplify an Insert operator's internal ordering.

Release note: None
Add handling for the Insert operator in execbuilder and in the exec
factory. This is mostly code copied over from sql/insert.go.

Release note: None
Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/ordering/insert.go, line 43 at r20 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

It's not possible to use mutation columns in ordering expressions (or any other expression for that matter, besides possibly another computed mutation expression). Doesn't that mean we can just ignore remapping? I added a check in check_expr.go to ensure we never use mutation columns as output or ordering columns.

After our discussion, I added a call to remapProvided. I also added a test in xform/testcases/physprops/ordering.

@andy-kimball
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Nov 29, 2018
32423: opt: Add optimizer support for new Insert operator r=andy-kimball a=andy-kimball

Add Insert operator to the optimizer.  The implementation is side-by-side
with the existing heuristic planner version. The main differences are that it
supports correlated subqueries, integrates with the CBO, and compiles
default and computed properties differently. Rather than keeping these
properties separate, the new Insert operator wraps the input expression
with Project operators that synthesize the default and computed
expressions.


Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Nov 29, 2018

Build succeeded

@craig craig bot merged commit b3eaf62 into cockroachdb:master Nov 29, 2018
@andy-kimball andy-kimball deleted the stmt branch November 29, 2018 04:44
BramGruneir added a commit to BramGruneir/cockroach that referenced this pull request Dec 3, 2018
Due to cockroachdb#32423, cockroachdb#31721 has been fixed.

Also cleaned up the test output a little bit.

Closes cockroachdb#31721.
Fixes cockroachdb#32714.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 3, 2018
32729: roachtest: Update hibernate blacklist 2.2 r=BramGruneir a=BramGruneir

Due to #32423, #31721 has been fixed.

Also cleaned up the test output a little bit.

Closes #31721.
Fixes #32714.

Release note: None

Co-authored-by: Bram Gruneir <bram@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants