Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: synthesize check constraints on enum columns #49284

Merged
merged 1 commit into from
Jun 15, 2020

Conversation

rohany
Copy link
Contributor

@rohany rohany commented May 19, 2020

Fixes #49263.

This PR teaches the optimizer how to synthesize check constraints on
columns of an ENUM type, allowing queries like:

CREATE TYPE t AS ENUM ('howdy', 'hello');
CREATE TABLE tt (x t, y INT, PRIMARY KEY (x, y));
SELECT x, y FROM tt WHERE y = 2

to be planned using constrained spans on the enum values, rather than a
full table scan.

Release note (performance improvement): Allow the optimizer to use enum
information to generate better query plans.

@rohany rohany requested review from jordanlewis and RaduBerinde May 19, 2020 19:33
@rohany rohany requested a review from a team as a code owner May 19, 2020 19:33
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rohany
Copy link
Contributor Author

rohany commented May 19, 2020

Left some todos in here that need to get resolved as well.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/opt_catalog.go, line 540 at r1 (raw file):

	inboundFKs  []optForeignKeyConstraint

	// checkConstraints is the set of synthesized check constraints for this

not just synthesized -- you've included all constraints below


pkg/sql/opt_catalog.go, line 663 at r1 (raw file):

	}
	// Move all existing and synthesized checks into the opt table.
	ot.checkConstraints = make([]cat.CheckConstraint, 0, len(desc.Checks)+len(synthesizedChecks))

why were we looking at desc.ActiveChecks() before, but now this is only looking at desc.Checks?


pkg/sql/resolver.go, line 178 at r1 (raw file):

		return nil, errors.AssertionFailedf("%s was not a type descriptor", rawDesc)
	}
	// TODO (rohany): Should we perform lookups on the parent database andschema

andschema -> and schema

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/opt_catalog.go, line 540 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

not just synthesized -- you've included all constraints below

Done.


pkg/sql/opt_catalog.go, line 663 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

why were we looking at desc.ActiveChecks() before, but now this is only looking at desc.Checks?

Not sure -- looking at the code it seems to be the same thing, but I'll stay consistent.


pkg/sql/resolver.go, line 178 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

andschema -> and schema

Done.

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

      └── [/'hi' - /'hi']

# TODO (rohany): Why is the NOT NULL important here? I couldn't get this

Do you know the answer to this?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Do you know the answer to this?

I think we just haven't added support for it yet, but I think we probably could... @RaduBerinde probably knows better since he was working with check constraints and their interactions with nulls recently....

@rytaft
Copy link
Collaborator

rytaft commented May 21, 2020


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think we just haven't added support for it yet, but I think we probably could... @RaduBerinde probably knows better since he was working with check constraints and their interactions with nulls recently....

This comment is probably relevant:

		// Check constraints that are guaranteed to not evaluate to NULL
		// are the only ones converted into filters. This is because a NULL
		// constraint is interpreted as passing, whereas a NULL filter is not.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This comment is probably relevant:

		// Check constraints that are guaranteed to not evaluate to NULL
		// are the only ones converted into filters. This is because a NULL
		// constraint is interpreted as passing, whereas a NULL filter is not.

The problem is that CHECK (expr) "passes" if the expression evaluates to NULL so we can't blindly convert it to a filter. We could do a better job by adding an OR x IS NULL in this case (in general, in cases where the result is NULL iff a given variable is NULL).

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/opt/exec/execbuilder/testdata/enums, line 54 at r2 (raw file):

Previously, RaduBerinde wrote…

The problem is that CHECK (expr) "passes" if the expression evaluates to NULL so we can't blindly convert it to a filter. We could do a better job by adding an OR x IS NULL in this case (in general, in cases where the result is NULL iff a given variable is NULL).

If you want to play around with that, this is where that logic would live: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optbuilder/select.go#L599

@rohany
Copy link
Contributor Author

rohany commented May 21, 2020

TFTRs! I'm going to hold off on merging this until some other things I'm working on answer the TODO's in here.

If you want to play around with that, this is where that logic would live: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optbuilder/select.go#L599

Sure, it would be fun to take a look at that after.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - can't wait to see this working in concert with partitioning!

@rohany
Copy link
Contributor Author

rohany commented May 27, 2020

I've been thinking about how these checks can interact with the alter type command. For context, when we add a value to an enum type, it doesn't directly become "writeable". We first need to make sure that all nodes know how to interpret an enum value before we write it, so the enum members themselves go through a mini schema change, where they progress from "readable" to "writeable".

It seems like the check constraints in the optimizer can catch most cases of attempting to insert a value that is only in the readable state. However, we cannot restrict this check constraint to be only the readable values of the enum -- another node with the value in the writeable state could have inserted that value before the current node has the value in the writeable state. So it seems like we want to instead have a notion of "checks to enforce on reads" and "checks to enforce on writes" in the optimizer. Does this make sense?

@rohany
Copy link
Contributor Author

rohany commented Jun 1, 2020

hey @RaduBerinde, @rytaft did you get a chance to read my comment here?

@RaduBerinde
Copy link
Member

That sounds right to me, we would need the check constraints to enforce on writes to be different than the check constraints that we can use to make assumptions about existing data. They are kind of different already, in that the latter are a subset (because of non-validated constraints, or because of NULL handling in constraint conditions).

@RaduBerinde
Copy link
Member

As a potential (maybe temporary) solution to avoid correctness issues, we can use a big hammer and treat any constraint involving an enum that is being changed as non-validated.

@mgartner mgartner self-requested a review June 12, 2020 16:57
@rohany
Copy link
Contributor Author

rohany commented Jun 15, 2020

As a potential (maybe temporary) solution to avoid correctness issues, we can use a big hammer and treat any constraint involving an enum that is being changed as non-validated.

I think that this is the correct solution. It looks like the optimizer already has logic of using only validated constraints on reads, so I can just piggyback onto that here. I'm planning on just synthesizing an extra unvalidated constraint containing the "read only" enum members to be verified on mutation operations, and then keeping the validated constraint with all members of the enum to be used on reads.

Given that I don't have to do a bigger change here when ALTER TYPE is done, I'm comfortable with merging this.

Can y'all take a quick second look at this?

This PR teaches the optimizer how to synthesize check constraints on
columns of an ENUM type, allowing queries like:

```
CREATE TYPE t AS ENUM ('howdy', 'hello');
CREATE TABLE tt (x t, y INT, PRIMARY KEY (x, y));
SELECT x, y FROM tt WHERE y = 2
```

to be planned using constrained spans on the enum values, rather than a
full table scan.

Release note (performance improvement): Allow the optimizer to use enum
information to generate better query plans.
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rohany)

@rohany
Copy link
Contributor Author

rohany commented Jun 15, 2020

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 15, 2020

Build succeeded

@craig craig bot merged commit 5cff000 into cockroachdb:master Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: teach the optimizer to generate check constraints on enum columns
5 participants