-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add docs on indexing of REGIONAL BY ROW tables #10280
Add docs on indexing of REGIONAL BY ROW tables #10280
Conversation
Hi Becca! Please ignore the busted build, it's because this PR depends on a link introduced in #10235 - it will turn green once that one is in I have a few questions I would appreciate your thoughts on in addition to anything else you want to provide feedback on:
Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @rmloveland!
Is there anything that is obviously incorrect or misleading?
I added some comments inline in a few places
Does the general emphasis I have added of "hey most people probably want to use this, NOT explicit partitioning" make sense from your POV?
Yes, but as I mentioned in some inline comments below, I think the message should be "use REGIONAL BY ROW
tables, not explicit partitioning".
Anything else users should know about this feature that I've just completely missed?
I think it might be worth changing the name of this feature to "UNIQUE
indexes in REGIONAL BY ROW
tables". Although implicitly partitioned unique indexes technically exist, I believe that similar to UNIQUE WITHOUT INDEX
, we've decided not to expose the feature to users in 21.1 (@otan, can you confirm?). You can enable it with SET experimental_enable_implicit_column_partitioning = true
. Implicitly partitioned unique indexes are used to support UNIQUE
indexes in REGIONAL BY ROW
tables, but that's an implementation detail that users don't need to know about.
Instead, you should probably say that all indexes, including UNIQUE
indexes, are partitioned on crdb_region
in REGIONAL BY ROW
tables. These indexes can either include the partitioning key (crdb_region
) as the first column in the index definition, or exclude it. If it is included, a UNIQUE
index will enforce uniqueness on the set of columns, just like it would in a non-partitioned table. But if crdb_region
is excluded from the index definition, that serves as a signal that the database should enforce uniqueness on just the columns in the index definition. The index alone cannot enforce uniqueness on columns that are not a prefix of the index columns, so any time rows are inserted or updated in a REGIONAL BY ROW
table that has an implicitly partitioned UNIQUE
index, the optimizer must add uniqueness checks. Maybe the docs could show the EXPLAIN
output of an INSERT
or UPDATE
statement to show what these checks look like.
I'm not sure if you want to go into detail about this, but the optimizer will also try to remove those checks whenever it is safe to do so. The checks do add latency overhead since they need to visit every region to be sure that the new value doesn't already exist in any other region. There are some specific scenarios where checks can be removed that I can describe if you'd like.
Despite the overhead, the advantage of these checks is:
- They enforce uniqueness constraints that the user has specified, maintaining consistency in the database
- They allow the optimizer to take advantage of the guaranteed uniqueness of these columns to perform "locality optimized search" when searching for a row with a fixed value for the unique columns. By using this optimization, the execution engine can avoid visiting remote nodes if the row is found locally. (See inline comments for an example.)
~
Ok sorry that was a lot! Also happy to schedule some time to chat about this if it would be helpful.
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
v21.1/add-constraint.md, line 235 at r1 (raw file):
to ensure that you get the best performance possible on uniqueness checks in multi-region deployments.
This seems a bit misleading since it's not really about performance; it's about correctness. If you want a partitioned UNIQUE index where the partitioning column is not part of the UNIQUE constraint, this is currently the only way to do it. The optimizer adds uniqueness checks to ensure correctness, and will try to choose the most performant plan possible for those checks. But the fact that there are checks at all means that by definition it will be less performant than an explicitly partitioned or non-partitioned UNIQUE index.
v21.1/add-constraint.md, line 278 at r1 (raw file):
uniqueness checks happen in the region where the user is located, which will lead to reduced query latency.
This isn't correct. The uniqueness checks, which are added when a row is inserted or the email column of an existing row is updated, will need to query every region. The performance benefit happens for queries that read users
and select a single email address (e.g., SELECT * FROM users WHERE email = 'anemailaddress@gmail.com'
). If 'anemailaddress@gmail.com'
is found in the local region, there is no need to search remote regions.
v21.1/add-constraint.md, line 296 at r1 (raw file):
| | | | | | | | | | | | | | | | | |
why all the empty spaces?
v21.1/create-index.md, line 50 at r1 (raw file):
`ASC` or `DESC`| Sort the column in ascending (`ASC`) or descending (`DESC`) order in the index. How columns are sorted affects query results, particularly when using `LIMIT`.<br><br>__Default:__ `ASC` `STORING ...`| Store (but do not sort) each column whose name you include.<br><br>For information on when to use `STORING`, see [Store Columns](#store-columns). Note that columns that are part of a table's [`PRIMARY KEY`](primary-key.html) cannot be specified as `STORING` columns in secondary indexes on the table.<br><br>`COVERING` and `INCLUDE` are aliases for `STORING` and work identically. `opt_partition_by` | An [enterprise-only](enterprise-licensing.html) option that lets you [define index partitions at the row level](partitioning.html). As of CockroachDB v21.1 and later, most users should use [implicitly partitioned unique indexes](multiregion-overview.html#implicitly-partitioned-unique-indexes) instead.
This partition by clause isn't for only UNIQUE
indexes. I would either take this comment out or just say users should use REGIONAL BY ROW
tables.
v21.1/indexes.md, line 22 at r1 (raw file):
To index [spatial data](spatial-data.html), CockroachDB uses *spatial indexes*. For more information about spatial indexes, see [Spatial Indexes](spatial-indexes.html). To support [multi-region deployments](multiregion-overview.html), indexes can be [partitioned](partitioning.html) based on region. However, as of CockroachDB v21.1 and later, most users should use [implicitly partitioned unique indexes](multiregion-overview.html#implicitly-partitioned-unique-indexes) instead.
Ditto -- say they should use REGIONAL BY ROW
tables
v21.1/multiregion-overview.md, line 173 at r1 (raw file):
When you add a `UNIQUE` constraint to a column in a multi-region database,
This only applies for REGIONAL BY ROW
tables
to ensure that you get the best performance possible on uniqueness checks in multi-region deployments.
See my comments above about performance v. correctness
v21.1/unique.md, line 24 at r1 (raw file):
- You can define the `UNIQUE` constraint when [creating a table](#syntax), or you can add it to existing tables through [`ADD CONSTRAINT`](add-constraint.html#add-the-unique-constraint). - <span class="version-tag">New in v21.1:</span> CockroachDB has support for _implicitly partitioned unique indexes_ in [multi-region databases](multiregion-overview.html). When you add a `UNIQUE` constraint to a column in a multi-region database, CockroachDB automatically takes care of [partitioning](partitioning.html) the index for you to ensure that you get the best performance possible on uniqueness checks in multi-region deployments.
ditto comments above
Yeah I think it's better not to publicise it. |
ecc826e
to
e3515b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all of this information Becca! Sorry for the slow response, I think I opened too many PRs at once last week.
I have changed things around a bit to focus more on the "don't do explicit partitioning, instead use indexes on REGIONAL BY ROW tables" angle, as you said to
Also followed your guidance to move the focus away from UNIQUE indexes to all indexes on REGIONAL BY ROW tables, and noted they are automagically partitioned on crdb_region
(with a link to some more details on that column)
w.r.t. info re: including/excluding crdb_region
info, I am thinking of adding a note about the exclude case to the "Add a unique index to a REGIONAL BY ROW
table" example in add-constraint.md, since we already discuss when the region column is implicitly included, like "normal" tables - but the use case for excluding it is not as clear to me. Do you mind explaining a use case for doing that? (Sorry if it's implied by what you wrote, I am not sure I understood that section)
Re: your other comments below, I hope they are mostly addressed by the latest edits, but PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
v21.1/add-constraint.md, line 235 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
to ensure that you get the best performance possible on uniqueness checks in multi-region deployments.
This seems a bit misleading since it's not really about performance; it's about correctness. If you want a partitioned UNIQUE index where the partitioning column is not part of the UNIQUE constraint, this is currently the only way to do it. The optimizer adds uniqueness checks to ensure correctness, and will try to choose the most performant plan possible for those checks. But the fact that there are checks at all means that by definition it will be less performant than an explicitly partitioned or non-partitioned UNIQUE index.
Thanks for calling this out. I updated this language here and elsewhere where it was used. Not sure if it's 100% correct though, please let me know
v21.1/add-constraint.md, line 278 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
uniqueness checks happen in the region where the user is located, which will lead to reduced query latency.
This isn't correct. The uniqueness checks, which are added when a row is inserted or the email column of an existing row is updated, will need to query every region. The performance benefit happens for queries that read
users
and select a single email address (e.g.,SELECT * FROM users WHERE email = 'anemailaddress@gmail.com'
). If'anemailaddress@gmail.com'
is found in the local region, there is no need to search remote regions.
Thanks - I have updated this section with this info.
Is this ability to keep the query local for things that are known to be unique also known as "locality optimized search" ? I took a shot at adding that to the doc. PTAL.
v21.1/add-constraint.md, line 296 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why all the empty spaces?
Brain malfunction - removed :-)
v21.1/create-index.md, line 50 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This partition by clause isn't for only
UNIQUE
indexes. I would either take this comment out or just say users should useREGIONAL BY ROW
tables.
I see - per your comment above I guess it's any and all indexes, right?
Here is what I updated it with, it's a bit more than your suggestion, please let me know what you think. Also happy to keep it to "use REGIONAL BY ROW" if that's better than this.
"As of CockroachDB v21.1 and later, most users should use REGIONAL BY ROW
tables. Indexes against regional by row tables are automatically partitioned, so explicit index partitioning is not required."
v21.1/indexes.md, line 22 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Ditto -- say they should use
REGIONAL BY ROW
tables
Updated! I ended up using an "include file" since this info is repeated in like 4 places. The text should now be updated to be more general, recommending REGIONAL BY ROW and just creating "normal" indexes against such tables, and recommending against explicit partitioning for most users
v21.1/multiregion-overview.md, line 173 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
When you add a `UNIQUE` constraint to a column in a multi-region database,
This only applies for
REGIONAL BY ROW
tablesto ensure that you get the best performance possible on uniqueness checks in multi-region deployments.
See my comments above about performance v. correctness
Updated to rename this section to 'Indexes on REGIONAL BY ROW
tables'
Revised this whole opening paragraph as discussed above, saying most users should use REGIONAL BY ROW tables and add indexes normally, no explicit partitioning required
Re: performance, we link to the UNIQUE constraint example which is the only place now we discuss performance, and it's updated based on your earlier comment
v21.1/unique.md, line 24 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto comments above
Updated as mentioned in previous comment (these are all included from one source file now, since it's used in 4 places - if you have additional comments you won't have to leave them 4 times now, sorry about that!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better!
Here's an example for what I was talking about with the implicit v. explicit partitioning. To start, run ./cockroach demo --geo-partitioned-replicas
. Next, create a multi-region DB and an employees table as follows:
CREATE DATABASE multi_region_test_db PRIMARY REGION "europe-west1" REGIONS "us-west1", "us-east1";
USE multi_region_test_db;
CREATE TABLE employee (
id INT PRIMARY KEY,
email STRING UNIQUE,
desk_id INT,
UNIQUE (crdb_region, desk_id)
) LOCALITY REGIONAL BY ROW;
In this example, the schema guarantees that both id
and email
are globally unique, while desk_id
is only unique per region. There are three indexes in the table, all UNIQUE and all partitioned by crdb_region
. The indexes on id
and email
are implicitly partitioned, while the index on crdb_region, desk_id
is explicitly partitioned. UNIQUE indexes can only directly enforce uniqueness on all columns in the index, including partitioning columns, so each of these indexes only enforce uniqueness for id
, email
, and desk_id
per region. Therefore, the database needs to do additional work to enforce global uniqueness for id
and email
. This "additional work" is in the form of "uniqueness checks" that the optimizer adds as part of mutation queries. You can see the checks if you run EXPLAIN
on any mutation query affecting the table:
demo@127.0.0.1:26257/multi_region_test_db> EXPLAIN INSERT INTO employee VALUES (1, 'joe@example.com', 1);
info
--------------------------------------------------------------------------------------
distribution: local
vectorized: true
• root
│
├── • insert
│ │ into: employee(id, email, desk_id, crdb_region)
│ │
│ └── • buffer
│ │ label: buffer 1
│ │
│ └── • values
│ size: 5 columns, 1 row
│
├── • constraint-check
│ │
│ └── • error if rows
│ │
│ └── • lookup join (semi)
│ │ table: employee@primary
│ │ equality: (lookup_join_const_col_@15, column1) = (crdb_region,id)
│ │ equality cols are key
│ │ pred: column10 != crdb_region
│ │
│ └── • cross join
│ │ estimated row count: 3
│ │
│ ├── • values
│ │ size: 1 column, 3 rows
│ │
│ └── • scan buffer
│ label: buffer 1
│
└── • constraint-check
│
└── • error if rows
│
└── • lookup join (semi)
│ table: employee@employee_email_key
│ equality: (lookup_join_const_col_@25, column2) = (crdb_region,email)
│ equality cols are key
│ pred: (column1 != id) OR (column10 != crdb_region)
│
└── • cross join
│ estimated row count: 3
│
├── • values
│ size: 1 column, 3 rows
│
└── • scan buffer
label: buffer 1
(51 rows)
Time: 44ms total (execution 43ms / network 1ms)
For this insert query, the optimizer has added two "constraint-check" post queries to check the uniqueness of id
and email
. There is no check needed for crdb_region, desk_id
, since that constraint is automatically enforced by the index.
Here is an update example where we update the email column. We only need a uniqueness check for email
since we're not updating any other columns:
demo@127.0.0.1:26257/multi_region_test_db> EXPLAIN UPDATE employee SET email = 'joe1@exaple.com' WHERE id = 1;
info
--------------------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• root
│
├── • update
│ │ table: employee
│ │ set: email
│ │
│ └── • buffer
│ │ label: buffer 1
│ │
│ └── • render
│ │ estimated row count: 1
│ │
│ └── • union all
│ │ estimated row count: 1
│ │ limit: 1
│ │
│ ├── • scan
│ │ estimated row count: 1 (100% of the table; stats collected 1 minute ago)
│ │ table: employee@primary
│ │ spans: [/'us-east1'/1 - /'us-east1'/1]
│ │
│ └── • scan
│ estimated row count: 1 (100% of the table; stats collected 1 minute ago)
│ table: employee@primary
│ spans: [/'europe-west1'/1 - /'europe-west1'/1] [/'us-west1'/1 - /'us-west1'/1]
│
└── • constraint-check
│
└── • error if rows
│
└── • lookup join (semi)
│ table: employee@employee_email_key
│ equality: (lookup_join_const_col_@18, email_new) = (crdb_region,email)
│ equality cols are key
│ pred: (id != id) OR (crdb_region != crdb_region)
│
└── • cross join
│ estimated row count: 3
│
├── • values
│ size: 1 column, 3 rows
│
└── • scan buffer
label: buffer 1
(47 rows)
Time: 4ms total (execution 4ms / network 0ms)
If we only update desk_id
, no uniqueness checks are needed at all:
demo@127.0.0.1:26257/multi_region_test_db> EXPLAIN UPDATE employee SET desk_id = 2 WHERE id = 1;
info
------------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• update
│ table: employee
│ set: desk_id
│ auto commit
│
└── • render
│ estimated row count: 1
│
└── • union all
│ estimated row count: 1
│ limit: 1
│
├── • scan
│ estimated row count: 1 (100% of the table; stats collected 2 minutes ago)
│ table: employee@primary
│ spans: [/'us-east1'/1 - /'us-east1'/1]
│
└── • scan
estimated row count: 1 (100% of the table; stats collected 2 minutes ago)
table: employee@primary
spans: [/'europe-west1'/1 - /'europe-west1'/1] [/'us-west1'/1 - /'us-west1'/1]
(24 rows)
Time: 1ms total (execution 1ms / network 0ms)
Hopefully this example makes clear that whether or not to explicitly include crdb_region
in the index definition depends on the context: if you only need to enforce uniqueness at the region-level, then including crdb_region
in the UNIQUE index definition will enforce these semantics and allow you to get better performance on INSERTs, UPDATEs, and UPSERTs, since there won't be any added latency from uniqueness checks. However, if you need to enforce global uniqueness, you should not include crdb_region
in the UNIQUE (or PRIMARY KEY) index definition, and the database will automatically ensure that the constraint is enforced.
Let me know if this helps or if you'd like me to clarify anything. Thanks!
Reviewed 7 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
_includes/v21.1/sql/unique-indexes-regional-by-row.md, line 1 at r2 (raw file):
<span class="version-tag">New in v21.1:</span> In [multi-region deployments](multiregion-overview.html), most users should use [`REGIONAL BY ROW` tables](multiregion-overview.html#regional-by-row-tables) instead of explicit index [partitioning](partitioning.html). When you add an index to a `REGIONAL BY ROW` table, it is automatically partitioned on the [`crdb_region` column](set-locality.html#crdb_region). Explicit index partitioning is not required.
This sounds good to me, but it's no longer specific to unique indexes, so maybe consider changing the name of this page to just indexes-regional-by-row.md
?
v21.1/add-constraint.md, line 278 at r1 (raw file):
Is this ability to keep the query local for things that are known to be unique also known as "locality optimized search" ?
Yep, exactly! The new text looks good.
v21.1/add-constraint.md, line 239 at r2 (raw file):
This example assumes you have a simulated multi-region database running on your local machine following the steps described in [Low Latency Reads and Writes in a Multi-Region Cluster](demo-low-latency-multi-region-deployment.html). To show how the automatic partitioning of indexes on `REGIONAL BY ROW` tables works, we will:
nit: you've included "unique" in the title of this section, so you might want to make this a bit more verbose and say that this shows how a unique index is partitioned, but it's also similar to how all indexes are partitioned.
v21.1/add-constraint.md, line 278 at r2 (raw file):
~~~ Next, issue the [`SHOW PARTITIONS`](show-partitions.html) statement. The output below (which is edited for length) will verify that the unique index was automatically [partitioned](partitioning.html) for you. It shows that the `user_email_unique` index is now partitioned by the database regions `europe-west1`, `us-east1`, and `us-west1`. This ensures that the uniqueness constraint is enforced properly across regions when rows are inserted, or the `email` column of an existing row is updated.
nit: it's not the partitioning that ensures the constraint is enforced properly. It's actually sort of the opposite: Without the partitioning, the unique index would be able to enforce the constraint without any additional help from the optimizer. But with an implicitly partitioned unique index, the database needs to do additional work to enforce the constraint. As a result, when a new unique index is added to a REGIONAL BY ROW table, two things happen automatically:
- We run a one-time-only validation query to ensure that the existing data in the table satisfies the unique constraint.
- Thereafter, the optimizer will automatically add a "uniqueness check" when necessary to any INSERT, UPDATE, or UPSERT statement affecting the columns in the unique constraint.
v21.1/add-constraint.md, line 280 at r2 (raw file):
Next, issue the [`SHOW PARTITIONS`](show-partitions.html) statement. The output below (which is edited for length) will verify that the unique index was automatically [partitioned](partitioning.html) for you. It shows that the `user_email_unique` index is now partitioned by the database regions `europe-west1`, `us-east1`, and `us-west1`. This ensures that the uniqueness constraint is enforced properly across regions when rows are inserted, or the `email` column of an existing row is updated. There is also a performance benefit for queries that select a single email address (e.g., `SELECT * FROM users WHERE email = 'anemailaddress@gmail.com'`). If `'anemailaddress@gmail.com'` is found in the local region, there is no need to search remote regions. This feature, whereby the SQL engine will avoid sending requests to nodes in other regions when it can read a unique column that is stored locally, is known as _locality optimized search_.
super nit: "when it can read a unique column that is stored locally" isn't quite right, since the column spans all regions. Perhaps "when it can read a value from a unique column that is stored locally" would be more accurate.
v21.1/create-index.md, line 50 at r1 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
I see - per your comment above I guess it's any and all indexes, right?
Here is what I updated it with, it's a bit more than your suggestion, please let me know what you think. Also happy to keep it to "use REGIONAL BY ROW" if that's better than this.
"As of CockroachDB v21.1 and later, most users should use
REGIONAL BY ROW
tables. Indexes against regional by row tables are automatically partitioned, so explicit index partitioning is not required."
LGTM
e3515b4
to
3519c59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Becca! Also for the chat re: this stuff, and the long example in your comment. Super helpful!
In addition to addressing your specific comments below, I also added a new example on explicit vs. implicitly partitioned indexes which is an edited version of what you provided here. PTAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
_includes/v21.1/sql/unique-indexes-regional-by-row.md, line 1 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This sounds good to me, but it's no longer specific to unique indexes, so maybe consider changing the name of this page to just
indexes-regional-by-row.md
?
Updated!
v21.1/add-constraint.md, line 278 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is this ability to keep the query local for things that are known to be unique also known as "locality optimized search" ?
Yep, exactly! The new text looks good.
Yay thanks!
v21.1/add-constraint.md, line 239 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: you've included "unique" in the title of this section, so you might want to make this a bit more verbose and say that this shows how a unique index is partitioned, but it's also similar to how all indexes are partitioned.
Makes sense. I've added some words to that effect
v21.1/add-constraint.md, line 278 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: it's not the partitioning that ensures the constraint is enforced properly. It's actually sort of the opposite: Without the partitioning, the unique index would be able to enforce the constraint without any additional help from the optimizer. But with an implicitly partitioned unique index, the database needs to do additional work to enforce the constraint. As a result, when a new unique index is added to a REGIONAL BY ROW table, two things happen automatically:
- We run a one-time-only validation query to ensure that the existing data in the table satisfies the unique constraint.
- Thereafter, the optimizer will automatically add a "uniqueness check" when necessary to any INSERT, UPDATE, or UPSERT statement affecting the columns in the unique constraint.
Thanks - I reworked this section to hopefully capture this info. PTAL
v21.1/add-constraint.md, line 280 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
super nit: "when it can read a unique column that is stored locally" isn't quite right, since the column spans all regions. Perhaps "when it can read a value from a unique column that is stored locally" would be more accurate.
Super nit appreciated! Always good to use more correct language. Updated!
v21.1/create-index.md, line 50 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
LGTM
Sweet thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! I think the example looks great. Just one remaining nit below.
Reviewed 5 of 5 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
v21.1/add-constraint.md, line 302 at r3 (raw file):
### Using implicit vs. explicit index partitioning in `REGIONAL BY ROW` tables In `REGIONAL BY ROW` tables, all indexes are partitioned on the region column (usually called [`crdb_region`](set-locality.html#crdb_region), but it can also be [computed](computed-columns.html)).
nit: the name of the column and whether or not it is computed are orthogonal. The standard vanilla approach results in a non-computed hidden column named crdb_region
, but we allow for lots of flexibility. Here are some examples for different ways to create the partitioning column (sorry if you already knew all of this):
-- This creates a hidden `crdb_region` column that is not computed, but has a default
-- value of the gateway region. If you run SHOW CREATE TABLE, you'll see that the column
-- looks like this:
-- crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT
-- default_to_database_primary_region(gateway_region())::public.crdb_internal_region,
CREATE TABLE t (
id INT PRIMARY KEY
) LOCALITY REGIONAL BY ROW;
-- This creates a non-computed column with the same default value as the automatically-
-- created `crdb_region` column, but with a different name (note the "AS" after LOCALITY
-- REGIONAL BY ROW).
CREATE TABLE t (
id INT PRIMARY KEY,
cr crdb_internal_region NOT NULL DEFAULT
default_to_database_primary_region(gateway_region())::public.crdb_internal_region
) LOCALITY REGIONAL BY ROW AS "cr";
-- This creates a computed column with the name cr.
CREATE TABLE t (
id INT PRIMARY KEY,
cr crdb_internal_region AS (
CASE
WHEN id <= 10 THEN 'us-east-1'
ELSE 'ap-southeast-2'
END
) STORED
) LOCALITY REGIONAL BY ROW AS "cr";
-- This creates a computed column with the the default name of `crdb_region`,
-- so there's no need to identify it with "AS".
CREATE TABLE t (
id INT PRIMARY KEY,
crdb_region crdb_internal_region AS (
CASE
WHEN id <= 10 THEN 'us-east-1'
ELSE 'ap-southeast-2'
END
) STORED
) LOCALITY REGIONAL BY ROW;
It's also possible to specify in the schema that the column is NOT VISIBLE
, which means it won't show up when you run SELECT * FROM t
. This is an option whether or not the column is computed or has a different name. (This new NOT VISIBLE
syntax can also be used in regular, non-multi-region tables.)
Anyway, you definitely don't need to give all this information here, but I think it would be good to make sure it's documented somewhere. Also, as I mentioned when we chatted yesterday, we should explain that using a computed region column is one way to eliminate uniqueness checks and therefore reduce the latency of mutation operations. Another way to eliminate these checks is to use type UUID with default gen_random_uuid()
for one of the columns in the unique constraint (but if the user still wants checks in that case, there is a cluster setting to enable them called sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled
). Another way to eliminate uniqueness checks is to just use explicit partitioning (as your example now describes) if you don't actually need to guarantee global uniqueness.
None of this stuff needs to be included in the current PR (other than fixing your implication that computed columns have a different name), but I just wanted to write it down so you can use it for a future PR.
3519c59
to
be83d5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great review Becca! I think I have fixed the last error - PTAL
(Also filed some follow-up issues based on your comments below)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
v21.1/add-constraint.md, line 302 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: the name of the column and whether or not it is computed are orthogonal. The standard vanilla approach results in a non-computed hidden column named
crdb_region
, but we allow for lots of flexibility. Here are some examples for different ways to create the partitioning column (sorry if you already knew all of this):-- This creates a hidden `crdb_region` column that is not computed, but has a default -- value of the gateway region. If you run SHOW CREATE TABLE, you'll see that the column -- looks like this: -- crdb_region public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT -- default_to_database_primary_region(gateway_region())::public.crdb_internal_region, CREATE TABLE t ( id INT PRIMARY KEY ) LOCALITY REGIONAL BY ROW; -- This creates a non-computed column with the same default value as the automatically- -- created `crdb_region` column, but with a different name (note the "AS" after LOCALITY -- REGIONAL BY ROW). CREATE TABLE t ( id INT PRIMARY KEY, cr crdb_internal_region NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::public.crdb_internal_region ) LOCALITY REGIONAL BY ROW AS "cr"; -- This creates a computed column with the name cr. CREATE TABLE t ( id INT PRIMARY KEY, cr crdb_internal_region AS ( CASE WHEN id <= 10 THEN 'us-east-1' ELSE 'ap-southeast-2' END ) STORED ) LOCALITY REGIONAL BY ROW AS "cr"; -- This creates a computed column with the the default name of `crdb_region`, -- so there's no need to identify it with "AS". CREATE TABLE t ( id INT PRIMARY KEY, crdb_region crdb_internal_region AS ( CASE WHEN id <= 10 THEN 'us-east-1' ELSE 'ap-southeast-2' END ) STORED ) LOCALITY REGIONAL BY ROW;
It's also possible to specify in the schema that the column is
NOT VISIBLE
, which means it won't show up when you runSELECT * FROM t
. This is an option whether or not the column is computed or has a different name. (This newNOT VISIBLE
syntax can also be used in regular, non-multi-region tables.)Anyway, you definitely don't need to give all this information here, but I think it would be good to make sure it's documented somewhere. Also, as I mentioned when we chatted yesterday, we should explain that using a computed region column is one way to eliminate uniqueness checks and therefore reduce the latency of mutation operations. Another way to eliminate these checks is to use type UUID with default
gen_random_uuid()
for one of the columns in the unique constraint (but if the user still wants checks in that case, there is a cluster setting to enable them calledsql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled
). Another way to eliminate uniqueness checks is to just use explicit partitioning (as your example now describes) if you don't actually need to guarantee global uniqueness.None of this stuff needs to be included in the current PR (other than fixing your implication that computed columns have a different name), but I just wanted to write it down so you can use it for a future PR.
Thanks for this info Becca. I have removed the incorrect reference to computed columns vs. naming.
I have also filed the following issues (I definitely did not know all this lol):
- Update "Computed Columns" page with examples of new multi-region columns and syntax #10401 (update computed columns doc)
- Document how to avoid the cost of optimizer uniqueness checks in multi-region settings #10402 (how to avoid uniqueness checks)
- Document the non-computed ways to give the partitioned region column a different name #10403 (how to have a different name etc. for region column when it's not computed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for all your work on this!!
Reviewed 1 of 1 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rmloveland)
v21.1/add-constraint.md, line 302 at r3 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
Thanks for this info Becca. I have removed the incorrect reference to computed columns vs. naming.
I have also filed the following issues (I definitely did not know all this lol):
- Update "Computed Columns" page with examples of new multi-region columns and syntax #10401 (update computed columns doc)
- Document how to avoid the cost of optimizer uniqueness checks in multi-region settings #10402 (how to avoid uniqueness checks)
- Document the non-computed ways to give the partitioned region column a different name #10403 (how to have a different name etc. for region column when it's not computed)
Awesome -- thanks for filing these issues! And the new text here looks good.
I assume NOT VISIBLE
is already documented (or has an issue open to be documented) elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)
v21.1/add-constraint.md, line 302 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Awesome -- thanks for filing these issues! And the new text here looks good.
I assume
NOT VISIBLE
is already documented (or has an issue open to be documented) elsewhere?
Great, thanks!
Re: NOT VISIBLE
, I did not find any open docs issues, nor did grepping "not visible" suffice. So I opened #10407 as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rmloveland and @rytaft)
v21.1/add-constraint.md, line 241 at r4 (raw file):
To show how the automatic partitioning of indexes on `REGIONAL BY ROW` tables works, we will: 1. [Add a column](add-column.html) to the `users` table in the [MovR dataset](movr.html).
nit: make this ordered list all start with 1.
v21.1/add-constraint.md, line 286 at r4 (raw file):
~~~ database_name | table_name | partition_name | column_names | index_name | partition_value |
Is the output edited for width length too (i.e, are these all the columns)? It's a little weird that the last column has the |
closing it. Normally when I edit a response for length, I include ...
to clearly show where it's been truncated, but that is just a suggestion!
v21.1/add-constraint.md, line 295 at r4 (raw file):
To ensure that the uniqueness constraint is enforced properly across regions when rows are inserted, or the `email` column of an existing row is updated, the database needs to do the following additional work when indexes are partitioned as shown above: 1. Run a one-time-only validation query to ensure that the existing data in the table satisfies the unique constraint.
nit: same comment re: ordered lists
v21.1/add-constraint.md, line 309 at r4 (raw file):
- If `crdb_region` is excluded from the index definition, that serves as a signal that CockroachDB should enforce uniqueness on only the columns in the index definition. In the latter case, the index alone cannot enforce uniqueness on columns that are not a prefix of the index columns, so any time rows are inserted or updated in a `REGIONAL BY ROW` table that has an implicitly partitioned `UNIQUE` index, the [optimizer](cost-based-optimizer.html) must add uniqueness checks.
Not sure if it's necessary, but might be nice to cross-link to INSERT / UPDATE docs
v21.1/add-constraint.md, line 313 at r4 (raw file):
Whether or not to explicitly include `crdb_region` in the index definition depends on the context: - If you only need to enforce uniqueness at the region level, then including `crdb_region` in the `UNIQUE` index definition will enforce these semantics and allow you to get better performance on `INSERT`s, `UPDATE`s, and `UPSERT`s, since there won't be any added latency from uniqueness checks.
Same comment here about cross-linking
_includes/v21.1/sql/indexes-regional-by-row.md, line 1 at r4 (raw file):
<span class="version-tag">New in v21.1:</span> In [multi-region deployments](multiregion-overview.html), most users should use [`REGIONAL BY ROW` tables](multiregion-overview.html#regional-by-row-tables) instead of explicit index [partitioning](partitioning.html). When you add an index to a `REGIONAL BY ROW` table, it is automatically partitioned on the [`crdb_region` column](set-locality.html#crdb_region). Explicit index partitioning is not required.
nit: double spaces (applies to other docs too) 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rmloveland)
v21.1/add-constraint.md, line 302 at r3 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
Great, thanks!
Re:
NOT VISIBLE
, I did not find any open docs issues, nor did grepping "not visible" suffice. So I opened #10407 as well
Thanks!
be83d5f
to
6f0e6c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Becca!
Thanks for the review Lauren! I think I updated all the things
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @lnhsingh and @rytaft)
v21.1/add-constraint.md, line 302 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Thanks!
👍
v21.1/add-constraint.md, line 241 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
nit: make this ordered list all start with
1.
Done!
v21.1/add-constraint.md, line 286 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
Is the output edited for width length too (i.e, are these all the columns)? It's a little weird that the last column has the
|
closing it. Normally when I edit a response for length, I include...
to clearly show where it's been truncated, but that is just a suggestion!
Good call - added some dots n stuff
v21.1/add-constraint.md, line 295 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
nit: same comment re: ordered lists
Done. I will eventually get with the program on this. (It is aesthetically displeasing tho)
v21.1/add-constraint.md, line 309 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
Not sure if it's necessary, but might be nice to cross-link to INSERT / UPDATE docs
Added links!
v21.1/add-constraint.md, line 313 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
Same comment here about cross-linking
Moar links
_includes/v21.1/sql/indexes-regional-by-row.md, line 1 at r4 (raw file):
Previously, lnhsingh (Lauren Hirata Singh) wrote…
nit: double spaces (applies to other docs too) 😬
Fixed, here and in the 'Add Constraint' page.
Fixes #9273. Fixes #9274. Fixes #9740. Fixes #9743. Fixes #9885. Fixes #10136. Relates to #9739. Summary of changes: - Update 'ADD CONSTRAINT' docs to show how to add unique indexes to REGIONAL BY ROW tables. - Also update 'ADD CONSTRAINT' with an example showing the differences in behavior between explicitly partitioned vs. implicitly partitioned indexes. - Update 'CREATE INDEX' docs to note the existence of automatically partitioned indexes, and mention that most users should probably use REGIONAL BY ROW tables and add indexes to them normally, instead of explicit partitioning. - Update 'Indexes' doc to mention: - Index partitioning in general, which was not mentioned on this page. - Automatic index partitioning that comes for free on REGIONAL BY ROW tables, which most users should use instead. - Update the 'Multi-Region Overview' page with a new section called 'Additional Features', where we list how indexes work on RBR tables as one of the new features (this will grow to add mention of other CockroachDB features that are designed to work with multi-region as we go) - Update the 'Unique Constraint' page to talk about indexes on RBR tables, as well as linking to the usage example on the 'ADD CONSTRAINT' page (described above).
6f0e6c9
to
8b0b4c7
Compare
Netlify Preview Built with commit 8b0b4c7 |
Fixes #9273.
Fixes #9274.
Fixes #9740.
Fixes #9743.
Fixes #9885.
Fixes #10136.
Relates to #9739.
Summary of changes:
Update 'ADD CONSTRAINT' docs to show how to add unique indexes to
REGIONAL BY ROW tables.
in behavior between explicitly partitioned vs. implicitly
partitioned indexes.
Update 'CREATE INDEX' docs to note the existence of automatically
partitioned indexes, and mention that most users should probably use
REGIONAL BY ROW tables and add indexes to them normally, instead of
explicit partitioning.
Update 'Indexes' doc to mention:
Index partitioning in general, which was not mentioned on this page.
Automatic index partitioning that comes for free on REGIONAL BY ROW
tables, which most users should use instead.
Update the 'Multi-Region Overview' page with a new section called
'Additional Features', where we list how indexes work on RBR tables as
one of the new features (this will grow to add mention of other
CockroachDB features that are designed to work with multi-region as we
go)
Update the 'Unique Constraint' page to talk about indexes on RBR
tables, as well as linking to the usage example on the 'ADD
CONSTRAINT' page (described above).