title | summary | toc | redirect_from | key |
---|---|---|---|---|
Selection Queries |
Selection Queries can read and process data. |
true |
selection-clauses.html |
selection-clauses.html |
Selection queries read and process data in CockroachDB. They are more
general than simple SELECT
clauses: they can
group one or more selection clauses with set
operations and can request a specific
ordering or row limit.
Selection queries can occur:
- At the top level of a query like other SQL statements.
- Between parentheses as a subquery.
- As operand to other statements that take tabular data as input, for example
INSERT
,UPSERT
,CREATE TABLE AS
orALTER ... SPLIT AT
.
Parameter | Description |
---|---|
common_table_expr |
See Common Table Expressions. |
select_clause |
A valid selection clause, either simple or using set operations. |
sort_clause |
An optional ORDER BY clause. See Ordering Query Results for details. |
limit_clause |
An optional LIMIT clause. See Limiting Query Results for details. |
offset_clause |
An optional OFFSET clause. See Limiting Query Results for details. |
The optional LIMIT
and OFFSET
clauses can appear in any order, but must appear after ORDER BY
, if also present.
{{site.data.alerts.callout_info}}Because the WITH
, ORDER BY
, LIMIT
and OFFSET
sub-clauses are all optional, any simple selection clause is also a valid selection query.{{site.data.alerts.end}}
Selection clauses are the main component of a selection query. They define tabular data. There are four specific syntax forms collectively named selection clauses:
Form | Usage |
---|---|
SELECT |
Load or compute tabular data from various sources. This is the most common selection clause. |
VALUES |
List tabular data by the client. |
TABLE |
Load tabular data from the database. |
Set Operations | Combine tabular data from two or more selection clauses. |
{{site.data.alerts.callout_info}}To perform joins or other relational operations over selection clauses, use a table expression and convert it back into a selection clause with TABLE
or SELECT
.{{site.data.alerts.end}}
A VALUES
clause defines tabular data defined by the expressions
listed within parentheses. Each parenthesis group defines a single row
in the resulting table.
The columns of the resulting table data have automatically generated
names. These names can be modified with
AS
when the
VALUES
clause is used as a sub-query.
{% include copy-clipboard.html %}
> VALUES (1, 2, 3), (4, 5, 6);
+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+---------+---------+---------+
A TABLE
clause reads tabular data from a specified table. The
columns of the resulting table data are named after the schema of the
table.
In general, TABLE x
is equivalent to SELECT * FROM x
, but it is
shorter to type.
{{site.data.alerts.callout_info}}Any table expression between parentheses is a valid operand for TABLE
, not just simple table or view names.{{site.data.alerts.end}}
{% include copy-clipboard.html %}
> CREATE TABLE employee_copy AS TABLE employee;
This statement copies the content from table employee
into a new
table. However, note that the TABLE
clause does not preserve the indexing,
foreign key, or constraint and default information from the schema of the
table it reads from, so in this example, the new table employee_copy
will likely have a simpler schema than employee
.
Other examples:
{% include copy-clipboard.html %}
> TABLE employee;
{% include copy-clipboard.html %}
> INSERT INTO employee_copy TABLE employee;
See Simple SELECT
Clause for more
details.
Set operations combine data from two selection clauses. They are valid as operand to other set operations or as main component in a selection query.
SQL lets you compare the results of multiple selection clauses. You can think of each of the set operators as representing a Boolean operator:
UNION
= ORINTERSECT
= ANDEXCEPT
= NOT
By default, each of these comparisons displays only one copy of each value (similar to SELECT DISTINCT
). However, each function also lets you add an ALL
to the clause to display duplicate values.
UNION
combines the results of two queries into one result.
{% include copy-clipboard.html %}
> SELECT name
FROM accounts
WHERE state_opened IN ('AZ', 'NY')
UNION
SELECT name
FROM mortgages
WHERE state_opened IN ('AZ', 'NY');
+-----------------+
| name |
+-----------------+
| Naseem Joossens |
| Ricarda Caron |
| Carola Dahl |
| Aygün Sanna |
+-----------------+
To show duplicate rows, you can use ALL
.
{% include copy-clipboard.html %}
> SELECT name
FROM accounts
WHERE state_opened IN ('AZ', 'NY')
UNION ALL
SELECT name
FROM mortgages
WHERE state_opened IN ('AZ', 'NY');
+-----------------+
| name |
+-----------------+
| Naseem Joossens |
| Ricarda Caron |
| Carola Dahl |
| Naseem Joossens |
| Aygün Sanna |
| Carola Dahl |
+-----------------+
INTERSECT
finds only values that are present in both query operands.
{% include copy-clipboard.html %}
> SELECT name
FROM accounts
WHERE state_opened IN ('NJ', 'VA')
INTERSECT
SELECT name
FROM mortgages;
+-----------------+
| name |
+-----------------+
| Danijel Whinery |
| Agar Archer |
+-----------------+
EXCEPT
finds values that are present in the first query operand but not the second.
{% include copy-clipboard.html %}
> SELECT name
FROM mortgages
EXCEPT
SELECT name
FROM accounts;
+------------------+
| name |
+------------------+
| Günay García |
| Karla Goddard |
| Cybele Seaver |
+------------------+
The following sections provide examples. For more details, see Ordering Query Results.
{% include copy-clipboard.html %}
> SELECT *
FROM accounts
WHERE balance BETWEEN 350 AND 500
ORDER BY balance DESC;
+----+--------------------+---------+----------+--------------+
| id | name | balance | type | state_opened |
+----+--------------------+---------+----------+--------------+
| 12 | Raniya Žitnik | 500 | savings | CT |
| 59 | Annibale Karga | 500 | savings | ND |
| 27 | Adelbert Ventura | 500 | checking | IA |
| 86 | Theresa Slaski | 500 | checking | WY |
| 73 | Ruadh Draganov | 500 | checking | TN |
| 16 | Virginia Ruan | 400 | checking | HI |
| 43 | Tahirih Malinowski | 400 | checking | MS |
| 50 | Dusan Mallory | 350 | savings | NV |
+----+--------------------+---------+----------+--------------+
Columns are sorted in the order you list them in sortby_list
. For example, ORDER BY a, b
sorts the rows by column a
and then sorts rows with the same a
value by their column b
values.
{% include copy-clipboard.html %}
> SELECT *
FROM accounts
WHERE balance BETWEEN 350 AND 500
ORDER BY balance DESC, name ASC;
+----+--------------------+---------+----------+--------------+
| id | name | balance | type | state_opened |
+----+--------------------+---------+----------+--------------+
| 27 | Adelbert Ventura | 500 | checking | IA |
| 59 | Annibale Karga | 500 | savings | ND |
| 12 | Raniya Žitnik | 500 | savings | CT |
| 73 | Ruadh Draganov | 500 | checking | TN |
| 86 | Theresa Slaski | 500 | checking | WY |
| 43 | Tahirih Malinowski | 400 | checking | MS |
| 16 | Virginia Ruan | 400 | checking | HI |
| 50 | Dusan Mallory | 350 | savings | NV |
+----+--------------------+---------+----------+--------------+
The following sections provide examples. For more details, see Limiting Query Results.
You can reduce the number of results with LIMIT
.
{% include copy-clipboard.html %}
> SELECT id, name
FROM accounts
LIMIT 5;
+----+------------------+
| id | name |
+----+------------------+
| 1 | Bjorn Fairclough |
| 2 | Bjorn Fairclough |
| 3 | Arturo Nevin |
| 4 | Arturo Nevin |
| 5 | Naseem Joossens |
+----+------------------+
To iterate through a table one "page" of results at a time (also known as pagination) there are two options, only one of which is recommended:
- Keyset pagination (fast, recommended)
LIMIT
/OFFSET
pagination (slow, not recommended)
Keyset pagination (also known as the "seek method") is used to fetch a subset of records from a table quickly. It does this by restricting the set of records returned with a combination of WHERE
and LIMIT
clauses. To get the next page, you check the value of the column in the WHERE
clause against the last row returned in the previous page of results.
The general pattern for keyset pagination queries is:
{% include copy-clipboard.html %}
SELECT * FROM t
WHERE key > ${value}
ORDER BY key
LIMIT ${amount}
This is faster than using LIMIT
/OFFSET
because, instead of doing a full table scan up to the value of the OFFSET
, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the WHERE
clause to implement the pagination is indexed and unique. A primary key meets both of these criteria.
The examples in this section use the employees data set, which you can load into CockroachDB as follows:
{% include copy-clipboard.html %}
CREATE DATABASE IF NOT EXISTS employees;
USE employees;
IMPORT PGDUMP 'https://s3-us-west-1.amazonaws.com/cockroachdb-movr/datasets/employees-db/pg_dump/employees-full.sql.gz';
To get the first page of results using keyset pagination, run:
{% include copy-clipboard.html %}
SELECT * FROM employees WHERE emp_no > 10000 LIMIT 25;
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+-------------+--------+---------------------------+
10001 | 1953-09-02 00:00:00+00:00 | Georgi | Facello | M | 1986-06-26 00:00:00+00:00
10002 | 1964-06-02 00:00:00+00:00 | Bezalel | Simmel | F | 1985-11-21 00:00:00+00:00
10003 | 1959-12-03 00:00:00+00:00 | Parto | Bamford | M | 1986-08-28 00:00:00+00:00
... snip
(25 rows)
Time: 1.31ms
To get the second page of results, run:
{% include copy-clipboard.html %}
SELECT * FROM employees WHERE emp_no > 10025 LIMIT 25;
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+------------+--------+---------------------------+
10026 | 1953-04-03 00:00:00+00:00 | Yongqiao | Berztiss | M | 1995-03-20 00:00:00+00:00
10027 | 1962-07-10 00:00:00+00:00 | Divier | Reistad | F | 1989-07-07 00:00:00+00:00
10028 | 1963-11-26 00:00:00+00:00 | Domenick | Tempesti | M | 1991-10-22 00:00:00+00:00
... snip!
(25 rows)
Time: 1.473ms
To get an arbitrary page of results showing employees whose IDs (emp_no
) are in a much higher range, try the following query. Note that it takes about the same amount of time to run as the previous queries.
{% include copy-clipboard.html %}
SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!
(25 rows)
Time: 1.319ms
Compare the execution speed of the previous keyset pagination queries with the query below that uses LIMIT
/ OFFSET
to get the same page of results:
{% include copy-clipboard.html %}
SELECT * FROM employees LIMIT 25 OFFSET 200024;
emp_no | birth_date | first_name | last_name | gender | hire_date
+--------+---------------------------+------------+--------------+--------+---------------------------+
400000 | 1963-11-29 00:00:00+00:00 | Mitsuyuki | Reinhart | M | 1985-08-27 00:00:00+00:00
400001 | 1962-06-02 00:00:00+00:00 | Rosalie | Chinin | M | 1986-11-28 00:00:00+00:00
400002 | 1964-08-16 00:00:00+00:00 | Quingbo | Birnbaum | F | 1986-04-23 00:00:00+00:00
... snip!
(25 rows)
Time: 118.114ms
The query using LIMIT
/OFFSET
for pagination is almost 100 times slower. To see why, let's use EXPLAIN
.
{% include copy-clipboard.html %}
EXPLAIN SELECT * FROM employees LIMIT 25 OFFSET 200024;
tree | field | description
+-----------+-------------+-------------------+
| distributed | true
| vectorized | true
limit | |
│ | offset | 200024
└── scan | |
| table | employees@primary
| spans | ALL
| limit | 200049
The culprit is this: because we used LIMIT
/OFFSET
, we are performing a full table scan (see spans
= ALL
above) from the first record all the way up to the value of the offset. In other words, we are iterating over a big array of rows from 1 to n, where n is 200049 (complexity O(n)
).
Meanwhile, the keyset pagination queries are looking at a much smaller range of table spans, which is much faster (see spans
= 300026-
+ 25 below). Because there is an index on every column in the WHERE
clause, these queries are doing an O(1)
array lookup on the index to jump to the start of the page of results, and then getting an additional 25 rows from there. This is much faster.
{% include copy-clipboard.html %}
EXPLAIN SELECT * FROM employees WHERE emp_no > 300025 LIMIT 25;
tree | field | description
+------+-------------+-------------------+
| distributed | false
| vectorized | false
scan | |
| table | employees@primary
| spans | /300026-
| limit | 25
{{site.data.alerts.callout_danger}}
Using an ordered (i.e., non-UUID) primary key creates hot spots in the database for write-heavy workloads, since concurrent INSERT
s to the table will attempt to write to the same (or nearby) underlying ranges. This can be mitigated by designing your schema with multi-column primary keys which include a monotonically increasing column.
{{site.data.alerts.end}}
{{site.data.alerts.callout_info}}
CockroachDB does not implement cursors. For a scale-out system like CockroachDB, using a cursor would not be recommended for the same reason that paginating with LIMIT
/OFFSET
is not recommended: it forces the server to keep track of state, which means the pagination queries don't scale well.
{{site.data.alerts.end}}
Selection clauses are defined in the context of selection queries. Table expressions are defined in the context of the FROM
sub-clause of SELECT
. Nevertheless, they can be integrated with one another to form more complex queries or statements.
Any selection clause can be used as a selection query with no change.
For example, the construct SELECT * FROM accounts
is a selection clause. It is also a valid selection query, and thus can be used as a stand-alone statement by appending a semicolon:
{% include copy-clipboard.html %}
> SELECT * FROM accounts;
+----+-----------------------+---------+----------+--------------+
| id | name | balance | type | state_opened |
+----+-----------------------+---------+----------+--------------+
| 1 | Bjorn Fairclough | 1200 | checking | AL |
| 2 | Bjorn Fairclough | 2500 | savings | AL |
| 3 | Arturo Nevin | 250 | checking | AK |
[ truncated ]
+----+-----------------------+---------+----------+--------------+
Likewise, the construct VALUES (1), (2), (3)
is also a selection
clause and thus can also be used as a selection query on its own:
{% include copy-clipboard.html %}
> VALUES (1), (2), (3);
+---------+
| column1 |
+---------+
| 1 |
| 2 |
| 3 |
+---------+
(3 rows)
Any table expression can be used as a selection clause (and thus also a selection query) by prefixing it with TABLE
or by using it as an operand to SELECT * FROM
.
For example, the simple table name customers
is a table expression, which designates all rows in that table. The expressions TABLE accounts
and SELECT * FROM accounts
are valid selection clauses.
Likewise, the SQL join expression customers c JOIN orders o ON c.id = o.customer_id
is a table expression. You can turn it into a valid selection clause, and thus a valid selection query as follows:
{% include copy-clipboard.html %}
> TABLE (customers c JOIN orders o ON c.id = o.customer_id);
{% include copy-clipboard.html %}
> SELECT * FROM customers c JOIN orders o ON c.id = o.customer_id;
Any selection query (or selection clause) can be used as a table expression by enclosing it between parentheses, which forms a subquery.
For example, the following construct is a selection query, but is not a valid table expression:
{% include copy-clipboard.html %}
> SELECT * FROM customers ORDER BY name LIMIT 5
To make it valid as operand to FROM
or another table expression, you can enclose it between parentheses as follows:
{% include copy-clipboard.html %}
> SELECT id FROM (SELECT * FROM customers ORDER BY name LIMIT 5);
{% include copy-clipboard.html %}
> SELECT o.id
FROM orders o
JOIN (SELECT * FROM customers ORDER BY name LIMIT 5) AS c
ON o.customer_id = c.id;
Selection queries are also valid as operand in contexts that require tabular data.
For example:
Statement | Example using SELECT |
Example using VALUES |
Example using TABLE |
---|---|---|---|
INSERT |
INSERT INTO foo SELECT * FROM bar |
INSERT INTO foo VALUES (1), (2), (3) |
INSERT INTO foo TABLE bar |
UPSERT |
UPSERT INTO foo SELECT * FROM bar |
UPSERT INTO foo VALUES (1), (2), (3) |
UPSERT INTO foo TABLE bar |
CREATE TABLE AS |
CREATE TABLE foo AS SELECT * FROM bar CREATE TABLE foo AS VALUES (1),(2),(3) |
CREATE TABLE foo AS TABLE bar |
|
ALTER ... SPLIT AT |
ALTER TABLE foo SPLIT AT SELECT * FROM bar ALTER TABLE foo SPLIT AT VALUES (1),(2),(3) |
ALTER TABLE foo SPLIT AT TABLE bar |
|
Subquery in a table expression | SELECT * FROM (SELECT * FROM bar) |
SELECT * FROM (VALUES (1),(2),(3)) |
SELECT * FROM (TABLE bar) |
Subquery in a scalar expression | SELECT * FROM foo WHERE x IN (SELECT * FROM bar) |
SELECT * FROM foo WHERE x IN (VALUES (1),(2),(3)) |
SELECT * FROM foo WHERE x IN (TABLE bar) |