Multi-column statistics #8091

Merged
merged 1 commit into from
Aug 27, 2020
4 changes: 2 additions & 2 deletions _includes/v19.2/misc/delete-statistics.md
@@ -5,11 +5,11 @@ To delete statistics for all tables in all databases:
> DELETE FROM system.table_statistics WHERE true;
~~~

-To delete a named set of statistics (e.g, one named "my_stats"), run a query like the following:
+To delete a named set of statistics (e.g, one named "users_stats"), run a query like the following:

{% include copy-clipboard.html %}
~~~ sql
-> DELETE FROM system.table_statistics WHERE name = 'my_stats';
+> DELETE FROM system.table_statistics WHERE name = 'users_stats';
~~~

After deleting statistics, restart the nodes in your cluster to clear the statistics caches.
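
A cautious way to apply the named-set deletion in this hunk is to check what the predicate matches before removing anything. The sketch below is an illustration, not part of this change. It reuses the `users_stats` name from the example and relies only on the `name` column that the `DELETE` statement already filters on. It assumes you are connected as a user who can read system tables.

~~~ sql
-- Assumption: the session user may read system.table_statistics (for example, root).
-- Counts the rows the DELETE with the same WHERE clause would remove.
> SELECT count(*) FROM system.table_statistics WHERE name = 'users_stats';
~~~

If the count is not what you expect, adjust the name before running the `DELETE`. Running the same `SELECT` after the deletion should return 0.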
4 changes: 2 additions & 2 deletions _includes/v20.1/misc/delete-statistics.md
@@ -5,11 +5,11 @@ To delete statistics for all tables in all databases:
> DELETE FROM system.table_statistics WHERE true;
~~~

-To delete a named set of statistics (e.g, one named "my_stats"), run a query like the following:
+To delete a named set of statistics (e.g, one named "users_stats"), run a query like the following:

{% include copy-clipboard.html %}
~~~ sql
-> DELETE FROM system.table_statistics WHERE name = 'my_stats';
+> DELETE FROM system.table_statistics WHERE name = 'users_stats';
~~~

After deleting statistics, restart the nodes in your cluster to clear the statistics caches.
4 changes: 2 additions & 2 deletions _includes/v20.2/misc/delete-statistics.md
@@ -5,11 +5,11 @@ To delete statistics for all tables in all databases:
> DELETE FROM system.table_statistics WHERE true;
~~~

-To delete a named set of statistics (e.g, one named "my_stats"), run a query like the following:
+To delete a named set of statistics (e.g, one named "users_stats"), run a query like the following:

{% include copy-clipboard.html %}
~~~ sql
-> DELETE FROM system.table_statistics WHERE name = 'my_stats';
+> DELETE FROM system.table_statistics WHERE name = 'users_stats';
~~~

After deleting statistics, restart the nodes in your cluster to clear the statistics caches.
99 changes: 68 additions & 31 deletions v19.2/create-statistics.md

Large diffs are not rendered by default.

24 changes: 7 additions & 17 deletions v19.2/show-statistics.md
@@ -5,6 +5,10 @@ toc: true
---
The `SHOW STATISTICS` [statement](sql-statements.html) lists [table statistics](create-statistics.html) used by the [cost-based optimizer](cost-based-optimizer.html).

+{{site.data.alerts.callout_info}}
+[By default, CockroachDB automatically generates statistics](cost-based-optimizer.html#table-statistics) on all indexed columns, and up to 100 non-indexed columns.
+{{site.data.alerts.end}}
+
## Synopsis

<div>
@@ -23,27 +27,13 @@ Parameter | Description

## Examples

-### List table statistics
-
-{% include copy-clipboard.html %}
-~~~ sql
-> CREATE STATISTICS students ON id FROM students_by_list;
-~~~
+{% include {{page.version.version}}/sql/movr-statements.md %}

-~~~
-CREATE STATISTICS
-~~~
+### List table statistics

{% include copy-clipboard.html %}
~~~ sql
-> SHOW STATISTICS FOR TABLE students_by_list;
-~~~
-
-~~~
-  statistics_name | column_names |             created              | row_count | distinct_count | null_count | histogram_id
-+-----------------+--------------+----------------------------------+-----------+----------------+------------+--------------+
-  students        | {"id"}       | 2018-10-26 15:06:34.320165+00:00 |         0 |              0 |          0 | NULL
-(1 row)
+> SHOW STATISTICS FOR TABLE rides;
~~~

### Delete statistics
99 changes: 68 additions & 31 deletions v20.1/create-statistics.md

Large diffs are not rendered by default.

34 changes: 20 additions & 14 deletions v20.1/show-statistics.md
@@ -5,6 +5,10 @@ toc: true
---
The `SHOW STATISTICS` [statement](sql-statements.html) lists [table statistics](create-statistics.html) used by the [cost-based optimizer](cost-based-optimizer.html).

+{{site.data.alerts.callout_info}}
+[By default, CockroachDB automatically generates statistics](cost-based-optimizer.html#table-statistics) on all indexed columns, and up to 100 non-indexed columns.
+{{site.data.alerts.end}}
+
## Synopsis

<div>
@@ -23,27 +27,29 @@ Parameter | Description

## Examples

-### List table statistics
+{% include {{page.version.version}}/sql/movr-statements.md %}

-{% include copy-clipboard.html %}
-~~~ sql
-> CREATE STATISTICS students ON id FROM students_by_list;
-~~~
-
-~~~
-CREATE STATISTICS
-~~~
+### List table statistics

{% include copy-clipboard.html %}
~~~ sql
-> SHOW STATISTICS FOR TABLE students_by_list;
+> SHOW STATISTICS FOR TABLE rides;
~~~

~~~
-  statistics_name | column_names |             created              | row_count | distinct_count | null_count | histogram_id
-+-----------------+--------------+----------------------------------+-----------+----------------+------------+--------------+
-  students        | {"id"}       | 2018-10-26 15:06:34.320165+00:00 |         0 |              0 |          0 | NULL
-(1 row)
+  statistics_name |  column_names   |             created              | row_count | distinct_count | null_count |    histogram_id
+------------------+-----------------+----------------------------------+-----------+----------------+------------+---------------------
+  __auto__        | {city}          | 2020-08-26 17:17:13.852138+00:00 |       500 |              9 |          0 | 584554361172525057
+  __auto__        | {vehicle_city}  | 2020-08-26 17:17:13.852138+00:00 |       500 |              9 |          0 | 584554361179242497
+  __auto__        | {id}            | 2020-08-26 17:17:13.852138+00:00 |       500 |            500 |          0 | NULL
+  __auto__        | {rider_id}      | 2020-08-26 17:17:13.852138+00:00 |       500 |             50 |          0 | NULL
+  __auto__        | {vehicle_id}    | 2020-08-26 17:17:13.852138+00:00 |       500 |             15 |          0 | NULL
+  __auto__        | {start_address} | 2020-08-26 17:17:13.852138+00:00 |       500 |            500 |          0 | NULL
+  __auto__        | {end_address}   | 2020-08-26 17:17:13.852138+00:00 |       500 |            500 |          0 | NULL
+  __auto__        | {start_time}    | 2020-08-26 17:17:13.852138+00:00 |       500 |             30 |          0 | NULL
+  __auto__        | {end_time}      | 2020-08-26 17:17:13.852138+00:00 |       500 |            367 |          0 | NULL
+  __auto__        | {revenue}       | 2020-08-26 17:17:13.852138+00:00 |       500 |            100 |          0 | NULL
+(10 rows)
~~~

### Delete statistics
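
In the `rides` output in the hunk above, only the `city` and `vehicle_city` rows carry a `histogram_id`. To narrow a long statistics listing to such rows, one option is to wrap the `SHOW` statement in square brackets and use it as a data source. This is a sketch, assuming the bracketed-statement syntax is available for `SHOW STATISTICS` in the version you are running. The column names are taken from the output header above.

~~~ sql
-- Assumption: [SHOW ...] can be used as a row source for SHOW STATISTICS.
-- Lists only the statistics that have an associated histogram.
> SELECT statistics_name, column_names, distinct_count
  FROM [SHOW STATISTICS FOR TABLE rides]
  WHERE histogram_id IS NOT NULL;
~~~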
8 changes: 7 additions & 1 deletion v20.2/cost-based-optimizer.md
@@ -22,11 +22,13 @@ The most important factor in determining the quality of a plan is cardinality (i

The cost-based optimizer can often find more performant query plans if it has access to statistical data on the contents of your tables. This data needs to be generated from scratch for new tables, and regenerated periodically for existing tables.

-By default, CockroachDB generates table statistics automatically when tables are [created](create-table.html), and as they are [updated](update.html). It does this [using a background job](create-statistics.html#view-statistics-jobs) that automatically determines which columns to get statistics on &mdash; specifically, it chooses:
+By default, CockroachDB automatically generates table statistics when tables are [created](create-table.html), and as they are [updated](update.html). It does this [using a background job](create-statistics.html#view-statistics-jobs) that automatically determines which columns to get statistics on &mdash; specifically, it chooses:

- Columns that are part of the primary key or an index (in other words, all indexed columns).
- Up to 100 non-indexed columns.

+<span class="version-tag">New in v20.2:</span> By default, CockroachDB also automatically collects [multi-column statistics](create-statistics.html#create-statistics-on-multiple-columns) on columns that prefix an index.
+
{{site.data.alerts.callout_info}}
[Schema changes](online-schema-changes.html) trigger automatic statistics collection for the affected table(s).
{{site.data.alerts.end}}
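
To see what a multi-column statistic looks like, it can also be requested by hand. This is a minimal sketch, assuming the MovR `rides` table used in the `SHOW STATISTICS` examples of this PR and a hypothetical statistics name, `city_pair_stats`. The columns `city` and `vehicle_city` are taken from the `rides` output shown for v20.1.

~~~ sql
-- Hypothetical statistics name; collects one statistic over the column pair.
> CREATE STATISTICS city_pair_stats ON city, vehicle_city FROM rides;

-- List the statistics again to find the new entry.
> SHOW STATISTICS FOR TABLE rides;
~~~

The new row should list both columns in `column_names`, with a `distinct_count` for the combination of values rather than for each column separately.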
@@ -80,6 +82,10 @@ For instructions showing how to manually generate statistics, see the examples i

By default, the optimizer collects histograms for all index columns (specifically the first column in each index) during automatic statistics collection. If a single column statistic is explicitly requested using manual invocation of [`CREATE STATISTICS`](create-statistics.html), a histogram will be collected, regardless of whether or not the column is part of an index.

+{{site.data.alerts.callout_info}}
+CockroachDB does not support multi-column histograms yet. See [tracking issue](https://github.com/cockroachdb/cockroach/issues/49698).
+{{site.data.alerts.end}}
+
If you are an advanced user and need to disable histogram collection for troubleshooting or performance tuning reasons, change the [`sql.stats.histogram_collection.enabled` cluster setting](cluster-settings.html) by running [`SET CLUSTER SETTING`](set-cluster-setting.html) as follows:

{% include copy-clipboard.html %}