Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: prune unnecessary synthesized columns for CHECK constraints #51526

Closed
mgartner opened this issue Jul 16, 2020 · 5 comments
Closed

opt: prune unnecessary synthesized columns for CHECK constraints #51526

mgartner opened this issue Jul 16, 2020 · 5 comments
Assignees
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@mgartner
Copy link
Collaborator

During mutations, the optimizer synthesizes columns for CHECK constraints and partial index predicates. These columns are included in the query plan, regardless if it can be proven that they do not need to be calculated. We should prune these columns (or not even created them in optbuilder) if they do not need to be created.

Example of current behavior:

root@127.0.0.1:61646/defaultdb> create table t (a int, b int, check (b > 0), index (b) where b > 10);
CREATE TABLE

Time: 3.123ms

root@127.0.0.1:61646/defaultdb> explain (opt, verbose) update t set a = 1 where a = 0;
                                                text
----------------------------------------------------------------------------------------------------
  update t
   ├── columns: <none>
   ├── fetch columns: a:4 b:5 rowid:6
   ├── update-mapping:
   │    └── a_new:8 => a:1
   ├── check columns: check1:9
   ├── partial index put columns: partial_index_del1:7
   ├── partial index del columns: partial_index_del1:7
   ├── cardinality: [0 - 0]
   ├── volatile, mutations
   ├── stats: [rows=0]
   ├── cost: 1130.46
   └── project
        ├── columns: check1:9 a_new:8 partial_index_del1:7 a:4 b:5 rowid:6
        ├── stats: [rows=10]
        ├── cost: 1130.45
        ├── key: (6)
        ├── fd: ()-->(4,8), (6)-->(5), (5)-->(7,9)
        ├── prune: (4-9)
        ├── select
        │    ├── columns: a:4 b:5 rowid:6
        │    ├── stats: [rows=10, distinct(4)=1, null(4)=0]
        │    ├── cost: 1130.04
        │    ├── key: (6)
        │    ├── fd: ()-->(4), (6)-->(5)
        │    ├── scan t
        │    │    ├── columns: a:4 b:5 rowid:6
        │    │    ├── partial index predicates
        │    │    │    └── t_b_idx: filters
        │    │    │         └── b:5 > 10 [outer=(5), constraints=(/5: [/11 - ]; tight)]
        │    │    ├── stats: [rows=1000, distinct(4)=100, null(4)=10, distinct(6)=1000, null(6)=0]
        │    │    ├── cost: 1120.02
        │    │    ├── key: (6)
        │    │    ├── fd: (6)-->(4,5)
        │    │    └── prune: (4-6)
        │    └── filters
        │         └── a:4 = 0 [outer=(4), constraints=(/4: [/0 - /0]; tight), fd=()-->(4)]
        └── projections
             ├── b:5 > 0 [as=check1:9, outer=(5)]
             ├── 1 [as=a_new:8]
             └── b:5 > 10 [as=partial_index_del1:7, outer=(5)]
(41 rows)

The partial_index_del1 and check1 columns are unnecessary because the UPDATE does not refer to any of the columns in the CHECK constraint or partial index.

@mgartner mgartner added A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Jul 16, 2020
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jul 16, 2020
@nvanbenschoten
Copy link
Member

This has implications for splitting mutations across column families to minimize contention. Take YCSB's schema as an example:

CREATE TABLE usertable (
    ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL,
    FIELD0 TEXT NOT NULL,
    FIELD1 TEXT NOT NULL,
    FIELD2 TEXT NOT NULL,
    FIELD3 TEXT NOT NULL,
    FIELD4 TEXT NOT NULL,
    FIELD5 TEXT NOT NULL,
    FIELD6 TEXT NOT NULL,
    FIELD7 TEXT NOT NULL,
    FIELD8 TEXT NOT NULL,
    FIELD9 TEXT NOT NULL,
    FAMILY (ycsb_key),
    FAMILY (FIELD0),
    FAMILY (FIELD1),
    FAMILY (FIELD2),
    FAMILY (FIELD3),
    FAMILY (FIELD4),
    FAMILY (FIELD5),
    FAMILY (FIELD6),
    FAMILY (FIELD7),
    FAMILY (FIELD8),
    FAMILY (FIELD9)
)

This allows for fully disjoint updates to different columns in the same row:

root@localhost:26257/tpce_test> EXPLAIN (verbose) UPDATE usertable SET field4 = 'newval' WHERE ycsb_key = 'key';
       tree      |        field        |      description      |            columns             | ordering
-----------------+---------------------+-----------------------+--------------------------------+-----------
                 | distribution        | local                 |                                |
                 | vectorized          | false                 |                                |
  update         |                     |                       | ()                             |
   │             | estimated row count | 0 (missing stats)     |                                |
   │             | table               | usertable             |                                |
   │             | set                 | field4                |                                |
   │             | auto commit         |                       |                                |
   └── render    |                     |                       | (ycsb_key, field4, field4_new) |
        │        | estimated row count | 1 (missing stats)     |                                |
        │        | render 0            | 'newval'              |                                |
        │        | render 1            | ycsb_key              |                                |
        │        | render 2            | field4                |                                |
        └── scan |                     |                       | (ycsb_key, field4)             |
                 | estimated row count | 1 (missing stats)     |                                |
                 | table               | usertable@primary     |                                |
                 | spans               | /"key"/5/1-/"key"/5/2 |                                |
                 | locking strength    | for update            |                                |

But if each column had a check constraint on it, we'd be forced to scan all column families during the update, dramatically increasing contention.

CREATE TABLE usertable (
    ycsb_key VARCHAR(255) PRIMARY KEY NOT NULL,
    FIELD0 TEXT NOT NULL CHECK (FIELD0 != 'wrongval'),
    FIELD1 TEXT NOT NULL CHECK (FIELD1 != 'wrongval'),
    FIELD2 TEXT NOT NULL CHECK (FIELD2 != 'wrongval'),
    FIELD3 TEXT NOT NULL CHECK (FIELD3 != 'wrongval'),
    FIELD4 TEXT NOT NULL CHECK (FIELD4 != 'wrongval'),
    FIELD5 TEXT NOT NULL CHECK (FIELD5 != 'wrongval'),
    FIELD6 TEXT NOT NULL CHECK (FIELD6 != 'wrongval'),
    FIELD7 TEXT NOT NULL CHECK (FIELD7 != 'wrongval'),
    FIELD8 TEXT NOT NULL CHECK (FIELD8 != 'wrongval'),
    FIELD9 TEXT NOT NULL CHECK (FIELD9 != 'wrongval'),
    FAMILY (ycsb_key),
    FAMILY (FIELD0),
    FAMILY (FIELD1),
    FAMILY (FIELD2),
    FAMILY (FIELD3),
    FAMILY (FIELD4),
    FAMILY (FIELD5),
    FAMILY (FIELD6),
    FAMILY (FIELD7),
    FAMILY (FIELD8),
    FAMILY (FIELD9)
)

root@localhost:26257/tpce_test> EXPLAIN (verbose) UPDATE usertable SET field4 = 'newval' WHERE ycsb_key = 'key';
       tree      |        field        |      description       |                                                     columns                                                     | ordering
-----------------+---------------------+------------------------+-----------------------------------------------------------------------------------------------------------------+-----------
                 | distribution        | local                  |                                                                                                                 |
                 | vectorized          | false                  |                                                                                                                 |
  update         |                     |                        | ()                                                                                                              |
   │             | estimated row count | 0 (missing stats)      |                                                                                                                 |
   │             | table               | usertable              |                                                                                                                 |
   │             | set                 | field4                 |                                                                                                                 |
   │             | auto commit         |                        |                                                                                                                 |
   └── render    |                     |                        | (ycsb_key, field4, field4_new, check1, check2, check3, check4, check5, check6, check7, check8, check9, check10) |
        │        | estimated row count | 1 (missing stats)      |                                                                                                                 |
        │        | render 0            | field0 != 'wrongval'   |                                                                                                                 |
        │        | render 1            | field1 != 'wrongval'   |                                                                                                                 |
        │        | render 2            | field2 != 'wrongval'   |                                                                                                                 |
        │        | render 3            | field3 != 'wrongval'   |                                                                                                                 |
        │        | render 4            | true                   |                                                                                                                 |
        │        | render 5            | field5 != 'wrongval'   |                                                                                                                 |
        │        | render 6            | field6 != 'wrongval'   |                                                                                                                 |
        │        | render 7            | field7 != 'wrongval'   |                                                                                                                 |
        │        | render 8            | field8 != 'wrongval'   |                                                                                                                 |
        │        | render 9            | field9 != 'wrongval'   |                                                                                                                 |
        │        | render 10           | 'newval'               |                                                                                                                 |
        │        | render 11           | ycsb_key               |                                                                                                                 |
        │        | render 12           | field4                 |                                                                                                                 |
        └── scan |                     |                        | (ycsb_key, field0, field1, field2, field3, field4, field5, field6, field7, field8, field9)                      |
                 | estimated row count | 1 (missing stats)      |                                                                                                                 |
                 | table               | usertable@primary      |                                                                                                                 |
                 | spans               | /"key"/1/1-/"key"/10/2 |                                                                                                                 |
                 | locking strength    | for update             |                                                                                                                 |

A more complex instance of this comes up for real in TPC-E.

@mgartner
Copy link
Collaborator Author

Thanks for the example. I'm planning on looking into this soon. Hopefully we can get this fixed in 21.1.

@mgartner mgartner self-assigned this Sep 10, 2020
mgartner added a commit to mgartner/cockroach that referenced this issue Oct 27, 2020
This commit updates the optimizer to prune synthesized `CHECK`
constraint columns for `UPDATES` when columns referenced in the
constraints are not updated. This may also allow the optimizer to no
longer fetch those referenced columns.

This should provide a performance benefit for `UDPATE`s to tables with
check constraints. Notably, tables that have been given many column
families (in order to reduce contention) should see a significant
reduction in contention for `UPDATE`s that mutate a subset of column
families.

Informs cockroachdb#51526

Release note (performance improvement): Previously, all `CHECK`
constraints defined on a table would be tested for every `UPDATE` to the
table. Now, a check constraint will not be tested for validity when the
values of columns it references are not being updated. The referenced
columns are no longer fetchecd in cases where they were only fetched to
test `CHECK` constraints.
craig bot pushed a commit that referenced this issue Oct 28, 2020
56007: opt: prune unnecessary check columns r=RaduBerinde a=mgartner

This commit updates the optimizer to prune synthesized `CHECK`
constraint columns for `UPDATES` when columns referenced in the
constraints are not updated. This may also allow the optimizer to no
longer fetch those referenced columns.

This should provide a performance benefit for `UDPATE`s to tables with
check constraints. Notably, tables that have been given many column
families (in order to reduce contention) should see a significant
reduction in contention for `UPDATE`s that mutate a subset of column
families.

Informs #51526

Release note (performance improvement): Previously, all `CHECK`
constraints defined on a table would be tested for every `UPDATE` to the
table. Now, a check constraint will not be tested for validity when the
values of columns it references are not being updated. The referenced
columns are no longer fetchecd in cases where they were only fetched to
test `CHECK` constraints.

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@nvanbenschoten
Copy link
Member

@mgartner #56007 looks great! Is that all that's needed to close this issue, or is there more to this?

@mgartner
Copy link
Collaborator Author

#56007 handles only CHECK constraint cases, not partial index cases. I've been tinkering with it and should have something in the next couple weeks.

@mgartner mgartner changed the title opt: prune unnecessary synthesized columns for CHECK constraints and partial indexes opt: prune unnecessary synthesized columns for CHECK constraints Dec 23, 2020
@mgartner
Copy link
Collaborator Author

I realized I have a separate issue for partial indexes, #51623. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants