-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: prune unnecessary synthesized columns for CHECK constraints #51526
Comments
This has implications for splitting mutations across column families to minimize contention. Take YCSB's schema as an example: CREATE TABLE usertable (
ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL,
FIELD0 TEXT NOT NULL,
FIELD1 TEXT NOT NULL,
FIELD2 TEXT NOT NULL,
FIELD3 TEXT NOT NULL,
FIELD4 TEXT NOT NULL,
FIELD5 TEXT NOT NULL,
FIELD6 TEXT NOT NULL,
FIELD7 TEXT NOT NULL,
FIELD8 TEXT NOT NULL,
FIELD9 TEXT NOT NULL,
FAMILY (ycsb_key),
FAMILY (FIELD0),
FAMILY (FIELD1),
FAMILY (FIELD2),
FAMILY (FIELD3),
FAMILY (FIELD4),
FAMILY (FIELD5),
FAMILY (FIELD6),
FAMILY (FIELD7),
FAMILY (FIELD8),
FAMILY (FIELD9)
) This allows for fully disjoint updates to different columns in the same row: root@localhost:26257/tpce_test> EXPLAIN (verbose) UPDATE usertable SET field4 = 'newval' WHERE ycsb_key = 'key';
tree | field | description | columns | ordering
-----------------+---------------------+-----------------------+--------------------------------+-----------
| distribution | local | |
| vectorized | false | |
update | | | () |
│ | estimated row count | 0 (missing stats) | |
│ | table | usertable | |
│ | set | field4 | |
│ | auto commit | | |
└── render | | | (ycsb_key, field4, field4_new) |
│ | estimated row count | 1 (missing stats) | |
│ | render 0 | 'newval' | |
│ | render 1 | ycsb_key | |
│ | render 2 | field4 | |
└── scan | | | (ycsb_key, field4) |
| estimated row count | 1 (missing stats) | |
| table | usertable@primary | |
| spans | /"key"/5/1-/"key"/5/2 | |
| locking strength | for update | | But if each column had a check constraint on it, we'd be forced to scan all column families during the update, dramatically increasing contention. CREATE TABLE usertable (
ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL,
FIELD0 TEXT NOT NULL CHECK (FIELD0 != 'wrongval'),
FIELD1 TEXT NOT NULL CHECK (FIELD1 != 'wrongval'),
FIELD2 TEXT NOT NULL CHECK (FIELD2 != 'wrongval'),
FIELD3 TEXT NOT NULL CHECK (FIELD3 != 'wrongval'),
FIELD4 TEXT NOT NULL CHECK (FIELD4 != 'wrongval'),
FIELD5 TEXT NOT NULL CHECK (FIELD5 != 'wrongval'),
FIELD6 TEXT NOT NULL CHECK (FIELD6 != 'wrongval'),
FIELD7 TEXT NOT NULL CHECK (FIELD7 != 'wrongval'),
FIELD8 TEXT NOT NULL CHECK (FIELD8 != 'wrongval'),
FIELD9 TEXT NOT NULL CHECK (FIELD9 != 'wrongval'),
FAMILY (ycsb_key),
FAMILY (FIELD0),
FAMILY (FIELD1),
FAMILY (FIELD2),
FAMILY (FIELD3),
FAMILY (FIELD4),
FAMILY (FIELD5),
FAMILY (FIELD6),
FAMILY (FIELD7),
FAMILY (FIELD8),
FAMILY (FIELD9)
)
root@localhost:26257/tpce_test> EXPLAIN (verbose) UPDATE usertable SET field4 = 'newval' WHERE ycsb_key = 'key';
tree | field | description | columns | ordering
-----------------+---------------------+------------------------+-----------------------------------------------------------------------------------------------------------------+-----------
| distribution | local | |
| vectorized | false | |
update | | | () |
│ | estimated row count | 0 (missing stats) | |
│ | table | usertable | |
│ | set | field4 | |
│ | auto commit | | |
└── render | | | (ycsb_key, field4, field4_new, check1, check2, check3, check4, check5, check6, check7, check8, check9, check10) |
│ | estimated row count | 1 (missing stats) | |
│ | render 0 | field0 != 'wrongval' | |
│ | render 1 | field1 != 'wrongval' | |
│ | render 2 | field2 != 'wrongval' | |
│ | render 3 | field3 != 'wrongval' | |
│ | render 4 | true | |
│ | render 5 | field5 != 'wrongval' | |
│ | render 6 | field6 != 'wrongval' | |
│ | render 7 | field7 != 'wrongval' | |
│ | render 8 | field8 != 'wrongval' | |
│ | render 9 | field9 != 'wrongval' | |
│ | render 10 | 'newval' | |
│ | render 11 | ycsb_key | |
│ | render 12 | field4 | |
└── scan | | | (ycsb_key, field0, field1, field2, field3, field4, field5, field6, field7, field8, field9) |
| estimated row count | 1 (missing stats) | |
| table | usertable@primary | |
| spans | /"key"/1/1-/"key"/10/2 | |
| locking strength | for update | | A more complex instance of this comes up for real in TPC-E. |
Thanks for the example. I'm planning on looking into this soon. Hopefully we can get this fixed in 21.1. |
This commit updates the optimizer to prune synthesized `CHECK` constraint columns for `UPDATES` when columns referenced in the constraints are not updated. This may also allow the optimizer to no longer fetch those referenced columns. This should provide a performance benefit for `UDPATE`s to tables with check constraints. Notably, tables that have been given many column families (in order to reduce contention) should see a significant reduction in contention for `UPDATE`s that mutate a subset of column families. Informs cockroachdb#51526 Release note (performance improvement): Previously, all `CHECK` constraints defined on a table would be tested for every `UPDATE` to the table. Now, a check constraint will not be tested for validity when the values of columns it references are not being updated. The referenced columns are no longer fetchecd in cases where they were only fetched to test `CHECK` constraints.
56007: opt: prune unnecessary check columns r=RaduBerinde a=mgartner This commit updates the optimizer to prune synthesized `CHECK` constraint columns for `UPDATES` when columns referenced in the constraints are not updated. This may also allow the optimizer to no longer fetch those referenced columns. This should provide a performance benefit for `UDPATE`s to tables with check constraints. Notably, tables that have been given many column families (in order to reduce contention) should see a significant reduction in contention for `UPDATE`s that mutate a subset of column families. Informs #51526 Release note (performance improvement): Previously, all `CHECK` constraints defined on a table would be tested for every `UPDATE` to the table. Now, a check constraint will not be tested for validity when the values of columns it references are not being updated. The referenced columns are no longer fetchecd in cases where they were only fetched to test `CHECK` constraints. Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
#56007 handles only CHECK constraint cases, not partial index cases. I've been tinkering with it and should have something in the next couple weeks. |
I realized I have a separate issue for partial indexes, #51623. Closing this. |
During mutations, the optimizer synthesizes columns for
CHECK
constraints and partial index predicates. These columns are included in the query plan, regardless if it can be proven that they do not need to be calculated. We should prune these columns (or not even created them in optbuilder) if they do not need to be created.Example of current behavior:
The
partial_index_del1
andcheck1
columns are unnecessary because theUPDATE
does not refer to any of the columns in theCHECK
constraint or partial index.The text was updated successfully, but these errors were encountered: