opt/rowexec: support range lookup joins on input columns
Previously, it was possible to perform lookup joins using inequality
conditions between index columns and constant values. This commit allows
lookup joins to also use inequalities between index columns and input columns.

There are restrictions on when an inequality can be used in a lookup join:
  1. The left and right sides of the inequality must have identical types.
  2. The inequality is between an index column and input column (or constant).
  3. If the index column is `DESC` and the inequality is of the form
     `idxCol < inputCol`, the column type must support `Datum.Prev` without
     any chance of failing.

Condition (3) is satisfied when the type of the column is one of `IntFamily`,
`OidFamily`, `UuidFamily` or `BoolFamily`. It is necessary because when the
index column is `DESC`, the `idxCol < inputCol` filter will be used in
forming the start key of each span. The spans are expected to be inclusive,
so the value of inputCol will have to be decremented to the value that orders
immediately before it.
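
The restrictions above can be sketched as a small predicate. This is only an illustration, not the optimizer's code: `Family`, `prevSafe`, and `usableInequality` are hypothetical names, and column types are modeled by their family name alone, which is a simplification of restriction (1). Restriction (2) is assumed to have been checked by the caller.

```go
package main

import "fmt"

// Family is a hypothetical stand-in for a column's type family.
type Family string

// prevSafe lists the families the commit message names as satisfying
// condition (3).
var prevSafe = map[Family]bool{
	"IntFamily":  true,
	"OidFamily":  true,
	"UuidFamily": true,
	"BoolFamily": true,
}

// usableInequality reports whether `idxCol op inputCol` passes restrictions
// (1) and (3). It assumes the caller already verified restriction (2).
func usableInequality(idx, input Family, idxDesc bool, op string) bool {
	if idx != input {
		// Restriction (1): both sides must have identical types.
		return false
	}
	if idxDesc && op == "<" {
		// Restriction (3): the bound must be decremented to build an
		// inclusive start key, so the type must be in the safe list.
		return prevSafe[idx]
	}
	return true
}

func main() {
	fmt.Println(usableInequality("IntFamily", "IntFamily", true, "<"))
	fmt.Println(usableInequality("StringFamily", "StringFamily", true, "<"))
	fmt.Println(usableInequality("StringFamily", "StringFamily", false, "<"))
}
```

In this sketch only the `DESC` plus `<` combination consults the family list; every other combination passes once the types match.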

Unlike the case of retrieving the next possible key (e.g., an `ASC` index with
`idxCol > inputCol`), it is not possible in general to directly obtain the
immediate previous key, because it would have an infinite number of `0xff`
bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality
bound before adding it to the start key.

Additionally, this commit allows lookup joins to be planned without equality
filters when the following conditions are met:
  1. There is an inequality filter between an index column and an input column
     that can be used to perform lookups.
  2. Either the input has only one row or the join has a LOOKUP hint.

These restrictions ensure that planning lookup joins in more cases does not
lead to performance regressions, since the current execution logic does not
fully de-duplicate spans when inequalities are used.
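
The planning conditions above amount to a simple gate, sketched here with hypothetical names (`canPlanWithoutEquality` and its parameters are not taken from the commit). The sketch assumes the input's row count is known exactly, whereas the optimizer would presumably rely on cardinality bounds.

```go
package main

import "fmt"

// canPlanWithoutEquality is a hypothetical restatement of the two conditions:
// a usable inequality between an index column and an input column must exist,
// and either the input has exactly one row or the join has a LOOKUP hint.
func canPlanWithoutEquality(hasUsableInequality bool, inputRows int, hasLookupHint bool) bool {
	if !hasUsableInequality {
		return false
	}
	return inputRows == 1 || hasLookupHint
}

func main() {
	fmt.Println(canPlanWithoutEquality(true, 1, false))   // single-row input
	fmt.Println(canPlanWithoutEquality(true, 100, true))  // LOOKUP hint
	fmt.Println(canPlanWithoutEquality(true, 100, false)) // neither condition
}
```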

Informs cockroachdb#51576

Release note (performance improvement): The execution engine can now perform
lookup joins in more cases. This can significantly improve join performance
when there is a large table with an index that conforms to the join ON
conditions, as well as allow joins to halt early in the presence of a limit.
DrewKimball committed Aug 4, 2022
1 parent 0fbf5f0 commit 4bfa330
Showing 16 changed files with 1,272 additions and 341 deletions.
205 changes: 205 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/lookup_join
@@ -916,3 +916,208 @@ x y z u v w
2 1 5 2 1 5
2 1 6 2 1 4
2 1 6 2 1 5

# Test inequality lookup joins.
# Case with idxCol <= inputCol.
query IIIIII
SELECT a, b, c, d, e, f FROM abc INNER LOOKUP JOIN def ON f <= a ORDER BY a, b, c, d, e, f
----
1 1 2 NULL 2 1
1 1 2 2 1 1
2 NULL 2 NULL 2 1
2 NULL 2 1 1 2
2 NULL 2 2 1 1
2 1 1 NULL 2 1
2 1 1 1 1 2
2 1 1 2 1 1

# Case with idxCol >= inputCol (same output as last test).
query IIIIII
SELECT a, b, c, d, e, f FROM def INNER LOOKUP JOIN abc ON a >= f ORDER BY a, b, c, d, e, f
----
1 1 2 NULL 2 1
1 1 2 2 1 1
2 NULL 2 NULL 2 1
2 NULL 2 1 1 2
2 NULL 2 2 1 1
2 1 1 NULL 2 1
2 1 1 1 1 2
2 1 1 2 1 1

# Case with idxCol < inputCol.
query IIIIII
SELECT a, b, c, d, e, f FROM abc INNER LOOKUP JOIN def ON f < a ORDER BY a, b, c, d, e, f
----
2 NULL 2 NULL 2 1
2 NULL 2 2 1 1
2 1 1 NULL 2 1
2 1 1 2 1 1

# Case with idxCol > inputCol (same output as last test).
query IIIIII
SELECT a, b, c, d, e, f FROM def INNER LOOKUP JOIN abc ON a > f ORDER BY a, b, c, d, e, f
----
2 NULL 2 NULL 2 1
2 NULL 2 2 1 1
2 1 1 NULL 2 1
2 1 1 2 1 1

# Case where input column used as bound has null.
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a >= d
----
1 1 2 1 1 2
1 1 2 2 1 1
1 1 2 2 NULL 2
2 1 1 2 1 1
2 1 1 2 NULL 2

# Case where input column used as bound has null (idxCol > inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a > d
----
1 1 2 2 1 1
1 1 2 2 NULL 2

# Case where input column used as bound has null (idxCol >= inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a >= d
----
1 1 2 1 1 2
1 1 2 2 1 1
1 1 2 2 NULL 2
2 1 1 2 1 1
2 1 1 2 NULL 2

# Case where input column used as bound has null (idxCol < inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a < d
----
2 1 1 1 1 2

# Case where input column used as bound has null (idxCol <= inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a <= d
----
2 1 1 1 1 2
2 1 1 2 1 1
2 1 1 2 NULL 2
1 1 2 1 1 2

# Case with two inequalities.
query IIIIII
SELECT a, b, c, d, e, f FROM abc INNER LOOKUP JOIN def ON f < a AND f >= b ORDER BY a, b, c, d, e, f
----
2 1 1 NULL 2 1
2 1 1 2 1 1

# Case with two inequalities (same output as last test).
query IIIIII
SELECT a, b, c, d, e, f FROM def INNER LOOKUP JOIN abc ON f < a AND f >= b ORDER BY a, b, c, d, e, f
----
2 1 1 NULL 2 1
2 1 1 2 1 1

# Case with two inequalities, one is a constant.
query IIIIII rowsort
SELECT * FROM abc INNER LOOKUP JOIN def ON f < 2 AND f >= b
----
1 1 2 2 1 1
2 1 1 2 1 1
1 1 2 NULL 2 1
2 1 1 NULL 2 1

# Case with two inequalities, one is a constant.
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN abc ON a < 2 AND a >= d
----
1 1 2 1 1 2

# Case with equality prefix.
query IIIIII rowsort
SELECT * FROM abc INNER LOOKUP JOIN def ON f = c AND e >= b
----
2 1 1 2 1 1
2 1 1 NULL 2 1
1 1 2 1 1 2

# Case with equality prefix.
query IIIIII
SELECT a, b, c, d, e, f FROM abc INNER LOOKUP JOIN def ON f = a AND e >= c ORDER BY a, b, c, d, e, f
----
1 1 2 NULL 2 1
2 1 1 1 1 2

# Case with equality prefix (same output as last test).
query IIIIII
SELECT a, b, c, d, e, f FROM def INNER LOOKUP JOIN abc ON f = a AND e >= c ORDER BY a, b, c, d, e, f
----
1 1 2 NULL 2 1
2 1 1 1 1 2

# Case with descending index column (idxCol < inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN def_e_desc AS def2 ON def.f = def2.f AND def2.e < def.d
----
2 1 1 2 1 1

# Case with descending index column (idxCol <= inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN def_e_desc AS def2
ON def.f = def2.f AND def2.e <= def.d ORDER BY def.d, def.e, def.f, def2.d, def2.e, def2.f
----
1 1 2 1 1 2
2 1 1 NULL 2 1
2 1 1 2 1 1

# Case with descending index column (idxCol > inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN def_e_desc AS def2 ON def.f = def2.f AND def2.e > def.d
----

# Case with descending index column (idxCol >= inputCol).
query IIIIII rowsort
SELECT * FROM def INNER LOOKUP JOIN def_e_desc AS def2 ON def.f = def2.f AND def2.e >= def.d
----
2 1 1 NULL 2 1
1 1 2 1 1 2

# Case with maximum and minimum integer bounds.
query IIII rowsort
SELECT * FROM (SELECT * FROM (VALUES (-9223372036854775807::INT), (9223372036854775807::INT))) v(x)
LEFT LOOKUP JOIN abc ON a < x
----
9223372036854775807 1 1 2
9223372036854775807 2 1 1
9223372036854775807 2 NULL 2
-9223372036854775807 NULL NULL NULL

# Case with maximum and minimum integer bounds.
query IIII rowsort
SELECT * FROM (SELECT * FROM (VALUES (-9223372036854775807::INT), (9223372036854775807::INT))) v(x)
LEFT LOOKUP JOIN abc ON a > x
----
-9223372036854775807 1 1 2
-9223372036854775807 2 1 1
-9223372036854775807 2 NULL 2
9223372036854775807 NULL NULL NULL

# Case with maximum and minimum integer bounds on descending column.
query IIII rowsort
SELECT * FROM (SELECT * FROM (VALUES (-9223372036854775807::INT), (9223372036854775807::INT))) v(x)
LEFT LOOKUP JOIN def_e_desc ON f IN (1, 2) AND e < x
----
9223372036854775807 NULL 2 1
9223372036854775807 2 1 1
9223372036854775807 1 1 2
-9223372036854775807 NULL NULL NULL

# Case with maximum and minimum integer bounds on descending column.
query IIII rowsort
SELECT * FROM (SELECT * FROM (VALUES (-9223372036854775807::INT), (9223372036854775807::INT))) v(x)
LEFT LOOKUP JOIN def_e_desc ON f IN (1, 2) AND e > x
----
-9223372036854775807 NULL 2 1
-9223372036854775807 2 1 1
-9223372036854775807 1 1 2
9223372036854775807 NULL NULL NULL
6 changes: 3 additions & 3 deletions pkg/sql/logictest/testdata/logic_test/lookup_join_spans
@@ -425,7 +425,7 @@ WHERE
name='cpu'
ORDER BY value
----
2 2020-01-01 00:01:01 +0000 UTC -11 4 2 1 cpu
2 2020-01-01 00:01:01 +0000 UTC -11 4 2 1 cpu

# Test NULL values in <= unbounded lookup span.
query ITIIIIT
@@ -452,8 +452,8 @@ WHERE
name='cpu'
ORDER BY value
----
2 2020-01-01 00:01:01 +0000 UTC -11 4 2 1 cpu
2 2020-01-01 00:01:02 +0000 UTC -10 5 2 1 cpu
2 2020-01-01 00:01:01 +0000 UTC -11 4 2 1 cpu
2 2020-01-01 00:01:02 +0000 UTC -10 5 2 1 cpu

# Test NULL values in WHERE equality conditions.
query ITIIIIT