sql: enable lookup joins to lookup spans, not just individual keys #51576

Closed
andy-kimball opened this issue Jul 18, 2020 · 13 comments · Fixed by #66002
Labels
A-sql-execution Relating to SQL execution. A-sql-optimizer SQL logical planning and optimizations. C-wishlist A wishlist feature. T-sql-queries SQL Queries Team

Comments

@andy-kimball
Contributor

andy-kimball commented Jul 18, 2020

Imagine you have tables like this:

CREATE TABLE metrics (
	id   SERIAL PRIMARY KEY,
	name STRING,
	INDEX name_index (name)
);
CREATE TABLE metric_values (
	metric_id INT8,
	time      TIMESTAMPTZ,
	value     INT8,
	PRIMARY KEY (metric_id, time)
);

Now say I want to look up a 10-minute interval of metric values for a particular metric:

SELECT *
FROM metric_values
INNER JOIN metrics
ON metric_id=id
WHERE
	time BETWEEN '2020-01-01 00:00:00+00:00' AND '2020-01-01 00:10:00+00:00' AND
	name='cpu.percent'
;

This is the resulting plan:

inner-join (lookup metric_values)
 ├── scan metrics@name_index
 │    └── constraint: /6/5: [/'cpu.percent' - /'cpu.percent']
 └── filters
      └── (time >= '2020-01-01 00:00:00+00:00') AND (time <= '2020-01-01 00:10:00+00:00')

Notice that the time filter, which is actually the most important thing about this query, is not part of the lookup join criteria. That's because lookup joins only allow equality conditions for lookup. But what if lookup joins allowed (multi-column) spans? It'd make dynamic lookup cases like this vastly more efficient. In this example, what we really want is for the lookup join to operate on spans like this:

/id/time
[/1/'2020-01-01 00:00:00+00:00' - /1/'2020-01-01 00:10:00+00:00']
[/5/'2020-01-01 00:00:00+00:00' - /5/'2020-01-01 00:10:00+00:00']
[/6/'2020-01-01 00:00:00+00:00' - /6/'2020-01-01 00:10:00+00:00']
...

rather than on simple point lookups, as it does today:

/id
[/1 - /1]
[/5 - /5]
[/6 - /6]

Jira issue: CRDB-4031
Epic: CRDB-14091

gz#11736

@andy-kimball andy-kimball added A-sql-optimizer SQL logical planning and optimizations. A-sql-execution Relating to SQL execution. labels Jul 18, 2020
@andy-kimball andy-kimball added the C-wishlist A wishlist feature. label Jul 18, 2020
@awoods187
Contributor

This is a cool enhancement idea!

@RaduBerinde
Member

A solution inside the optimizer would be for LookupJoin to contain a Constraint that is similar to an index constraint, but applies to the index columns after the equality columns. I think the opt part wouldn't be hard (we can reuse code from index constraints), but we'd also need execution support.

@cockroachdb cockroachdb deleted a comment from blathers-crl bot Jul 20, 2020
@andy-kimball
Contributor Author

andy-kimball commented Jul 20, 2020

Note that if we don't support this, then it puts the burden on the user to do 2 separate queries for cases where the lookup keys are not known beforehand. For example, the user could manually issue these two queries:

  1. Query #1: Fetch the list of interesting metrics and store it in a temporary app variable:
SELECT id FROM metrics WHERE name='cpu.percent';
  2. Query #2: Pass the list gathered in step #1 as a parameter to a query that gets the needed time ranges:
SELECT *
FROM metric_values
WHERE
	time BETWEEN '2020-01-01 00:00:00+00:00' AND '2020-01-01 00:10:00+00:00' AND
	metric_id = ANY ($1)
;

The difference in cost between doing it with a lookup join vs. doing two manual queries is substantial. In one experiment, the lookup join's cost is a whopping 204,149, whereas the combined manual queries cost only 27,343, roughly an order of magnitude difference.

@andy-kimball
Contributor Author

A solution inside the optimizer would be for LookupJoin to contain a Constraint that is similar to an index constraint, but applies to the index columns after the equality columns. I think the opt part wouldn't be hard (we can reuse code from index constraints), but we'd also need execution support.

Adding support for this would definitely require a partnership between the SQL Execution and Optimizer teams. I don't think it'd be terribly hard and would bring a lot of "bang for the buck".

CC @jordanlewis

@jordanlewis
Member

Cool, this seems like a great idea. So what would the API look like in the JoinReader proto definition? An "extra constraint column set", and a pair of constant values that fill in that set?

I suppose it'd also be possible to make this dynamic, if you wanted to construct the span range based on values that you find in the data... maybe this is a marginal use case in this example, but I bet we could find some real world ones as well.

CREATE TABLE time_spans(
   id INT PRIMARY KEY,
   "start" TIMESTAMPTZ,
   "end" TIMESTAMPTZ
);

SELECT *
FROM time_spans,
metric_values
INNER JOIN metrics
ON metric_id = metrics.id
WHERE
	time BETWEEN "start" AND "end" AND
	name='cpu.percent' AND
	time_spans.id = $1
;

@RaduBerinde
Member

I think we handle constant columns today (by projecting the values before the lookup join).

It would be a list of pseudo-spans, i.e. spans with key pieces that get appended to the keys in each span generated in the lookup joiner. In the example above, the pseudo-spans would be [/'2020-01-01 00:00:00+00:00' - /'2020-01-01 00:10:00+00:00']. The lookup joiner would take a span like [/1 - /1] (note that all spans generated by the lookup joiner have equal start and end keys) and append the pseudo-spans to get [/1/'2020-01-01 00:00:00+00:00' - /1/'2020-01-01 00:10:00+00:00'].

It's a little trickier with multiple pseudo-spans - we would effectively be breaking up one span [/1 - /1] into multiple spans, e.g. [/1/1 - /1/5] [/1/10 - /1/15] [/1/20 - /1/25]. This might require a bit more bookkeeping in the lookup joiner. I think we would benefit even from supporting just one pseudo-span at first, though.

Conceptually, if we have an index on (a,b,c,d) and a lookup join with equality columns (a,b), we think of each lookup as a scan into an index on (c,d) (and the pseudo-spans allow us to constrain that scan).
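As a rough illustration of the pseudo-span mechanism described above, here is a sketch in Go (the `Span` type and `applyPseudoSpans` are hypothetical names for this example, not CockroachDB's actual joinReader code):

```go
package main

import "fmt"

// Span is a simplified key span with inclusive start and end boundaries,
// each modeled as a list of encoded key parts.
type Span struct {
	Start, End []string
}

// applyPseudoSpans appends each pseudo-span's key pieces to an
// equality-generated span (whose start and end keys are equal) to produce
// the constrained lookup spans described in the comment above.
func applyPseudoSpans(eqSpan Span, pseudo []Span) []Span {
	out := make([]Span, 0, len(pseudo))
	for _, p := range pseudo {
		out = append(out, Span{
			Start: append(append([]string{}, eqSpan.Start...), p.Start...),
			End:   append(append([]string{}, eqSpan.End...), p.End...),
		})
	}
	return out
}

func main() {
	eq := Span{Start: []string{"/1"}, End: []string{"/1"}}
	pseudo := []Span{{
		Start: []string{"/'2020-01-01 00:00:00+00:00'"},
		End:   []string{"/'2020-01-01 00:10:00+00:00'"},
	}}
	// One equality span plus one pseudo-span yields one constrained span:
	// [/1/'2020-01-01 00:00:00+00:00' - /1/'2020-01-01 00:10:00+00:00']
	fmt.Println(applyPseudoSpans(eq, pseudo))
}
```

With multiple pseudo-spans, the same equality span fans out into several disjoint lookup spans, which is the extra bookkeeping mentioned above.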

@jordanlewis
Member

Flagging as a potential project for @helenmhe if she has time after her current work on join reader.

@nvanbenschoten
Member

This would be a nice win for TPC-C. It shows up in the Stock-Level transaction. Right now, we issue two read-only statements in a read-only transaction. If we were able to combine them into a single statement, we could use an implicit transaction instead of an explicit one. Doing so would not only avoid one statement but would actually drop the transaction from 6 statements (BEGIN; SAVEPOINT; SELECT ... district; SELECT ... order_line; RELEASE SAVEPOINT; COMMIT) to 1 statement (SELECT ... (SELECT district) ... stock). As we keep seeing, one of the optimizations with the biggest bang for the buck for workloads that bottleneck on CPU is minimizing the statement count.

For reference:

-- Currently
root@:26257/tpcc> EXPLAIN SELECT d_next_o_id
FROM district
WHERE d_w_id = 0 AND d_id = 3;
  tree |        field        |   description
-------+---------------------+-------------------
       | distribution        | local
       | vectorized          | false
  scan |                     |
       | estimated row count | 1
       | table               | district@primary
       | spans               | [/0/3 - /0/3]

root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
FROM order_line
JOIN stock
ON s_w_id = 0 AND s_i_id = ol_i_id
WHERE ol_w_id = 0
  AND ol_d_id = 3
  AND ol_o_id BETWEEN 3095 - 20 AND 3095 - 1
  AND s_quantity < 15;
            tree           |         field         |                    description
---------------------------+-----------------------+-----------------------------------------------------
                           | distribution          | full
                           | vectorized            | false
  group (scalar)           |                       |
   └── distinct            |                       |
        │                  | distinct on           | s_i_id
        └── lookup join    |                       |
             │             | table                 | stock@primary
             │             | equality              | (project_const_col_@14, ol_i_id) = (s_w_id,s_i_id)
             │             | equality cols are key |
             │             | pred                  | s_quantity < 15
             └── render    |                       |
                  └── scan |                       |
                           | estimated row count   | 117
                           | table                 | order_line@primary
                           | spans                 | [/0/3/3094 - /0/3/3075]


-- With Subquery (today)
root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
  FROM (SELECT d_w_id, d_id, d_next_o_id FROM district WHERE d_w_id = 0 AND d_id = 3)
  JOIN order_line ON  ol_w_id = d_w_id
                  AND ol_d_id = d_id
                  AND ol_o_id BETWEEN (d_next_o_id - 20) AND (d_next_o_id - 1)
  JOIN stock ON s_w_id = d_w_id AND s_i_id = ol_i_id
 WHERE s_quantity < 15;

             tree            |         field         |                            description
-----------------------------+-----------------------+---------------------------------------------------------------------
                             | distribution          | full
                             | vectorized            | false
  group (scalar)             |                       |
   └── distinct              |                       |
        │                    | distinct on           | s_i_id
        └── lookup join      |                       |
             │               | table                 | stock@primary
             │               | equality              | (d_w_id, ol_i_id) = (s_w_id,s_i_id)
             │               | equality cols are key |
             │               | pred                  | s_quantity < 15
             └── lookup join |                       |
                  │          | table                 | order_line@primary
                  │          | equality              | (d_w_id, d_id) = (ol_w_id,ol_d_id)
                  │          | pred                  | (ol_o_id >= (d_next_o_id - 20)) AND (ol_o_id <= (d_next_o_id - 1))
                  └── scan   |                       |
                             | estimated row count   | 1
                             | table                 | district@primary
                             | spans                 | [/0/3 - /0/3]

-- With Subquery (in future with this issue addressed)
root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
  FROM (SELECT d_w_id, d_id, d_next_o_id FROM district WHERE d_w_id = 0 AND d_id = 3)
  JOIN order_line ON  ol_w_id = d_w_id
                  AND ol_d_id = d_id
                  AND ol_o_id BETWEEN (d_next_o_id - 20) AND (d_next_o_id - 1)
  JOIN stock ON s_w_id = d_w_id AND s_i_id = ol_i_id
 WHERE s_quantity < 15;

             tree            |         field         |                            description
-----------------------------+-----------------------+---------------------------------------------------------------------
                             | distribution          | full
                             | vectorized            | false
  group (scalar)             |                       |
   └── distinct              |                       |
        │                    | distinct on           | s_i_id
        └── lookup join      |                       |
             │               | table                 | stock@primary
             │               | equality              | (d_w_id, ol_i_id) = (s_w_id,s_i_id)
             │               | equality cols are key |
             │               | pred                  | s_quantity < 15
             └── lookup join |                       |
                  │          | table                 | order_line@primary
                  │          | equality              | (d_w_id, d_id) = (ol_w_id,ol_d_id)
                  │          | inequality            | (ol_o_id >= (d_next_o_id - 20)) AND (ol_o_id <= (d_next_o_id - 1))
                  └── scan   |                       |
                             | estimated row count   | 1
                             | table                 | district@primary
                             | spans                 | [/0/3 - /0/3]

@rytaft
Collaborator

rytaft commented May 3, 2021

There are really two different issues here:

  1. Support lookup joins to ranges of keys where the range is known at optimization time. This is what is described in @andy-kimball's issue description with the predicate time BETWEEN '2020-01-01 00:00:00+00:00' AND '2020-01-01 00:10:00+00:00'. This should be relatively straightforward to add as an extension of the work done in sql,opt: add support for lookup joins with multiple spans #60302 -- we just need to add support for inequality operators (>, >=, <, <=) in addition to the support that already exists for = and IN.
  2. Support lookup joins where the join condition is an inequality between a column in the index and an expression over one or more columns in the input. This is what is described in @nvanbenschoten's comment about TPC-C. The joinReader will need to learn how to evaluate expressions to produce spans, and the optimizer will need to be able to detect that a join condition fits the requirements. In the case of TPC-C, this is as simple as noting that ol_o_id is a single variable on the left-hand side, while d_next_o_id is the only outer column on the right-hand side. However, we could also support more complex predicates such as CASE WHEN b THEN a > c ELSE a > d END where a is a column in the index and b, c, and d are columns in the input. The optimizer would need to be able to do some sort of constant-folding simulation to determine that this is a valid lookup join predicate.
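The detection step in case 2 boils down to a column-membership check, which can be sketched as follows (a hedged illustration; `Ineq` and `canUseForLookup` are invented names for this sketch and do not correspond to actual optimizer code):

```go
package main

import "fmt"

// Ineq represents a candidate join inequality, decomposed into the column
// it compares and the set of columns its bound expression references.
type Ineq struct {
	CompareCol string   // column on one side of the inequality
	BoundCols  []string // columns referenced by the bound expression
}

// canUseForLookup checks the requirement described above: the compared
// column must be an index column, and the bound expression may reference
// only columns produced by the join input.
func canUseForLookup(iq Ineq, indexCols, inputCols map[string]bool) bool {
	if !indexCols[iq.CompareCol] {
		return false
	}
	for _, c := range iq.BoundCols {
		if !inputCols[c] {
			return false
		}
	}
	return true
}

func main() {
	indexCols := map[string]bool{"ol_w_id": true, "ol_d_id": true, "ol_o_id": true}
	inputCols := map[string]bool{"d_w_id": true, "d_id": true, "d_next_o_id": true}
	// ol_o_id >= d_next_o_id - 20: index column ol_o_id compared against an
	// expression whose only outer column is d_next_o_id, so it qualifies.
	fmt.Println(canUseForLookup(Ineq{"ol_o_id", []string{"d_next_o_id"}}, indexCols, inputCols)) // true
}
```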

@michae2
Collaborator

michae2 commented Mar 15, 2022

This is blocking an important POC and I think we should prioritize it.

@michae2
Collaborator

michae2 commented Mar 15, 2022

Here's a demonstration of a kind of query that would benefit:

CREATE TABLE t (i PRIMARY KEY) AS SELECT generate_series(0, 999999);
CREATE TABLE u (j PRIMARY KEY, k) AS SELECT n, n FROM generate_series(0, 999999) AS s(n);
SELECT (SELECT k FROM u WHERE j > i ORDER BY j LIMIT 1) FROM t WHERE i > 999997 ORDER BY i LIMIT 1;

With > in the subquery, we have to use a cross join over a full scan of u:

demo@127.0.0.1:26257/defaultdb> EXPLAIN SELECT (SELECT k FROM u WHERE j > i ORDER BY j LIMIT 1) FROM t WHERE i > 999997 ORDER BY i LIMIT 1;
                                                   info
-----------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ estimated row count: 1
  │
  └── • top-k
      │ estimated row count: 1
      │ order: +i
      │ k: 1
      │
      └── • distinct
          │ distinct on: i
          │
          └── • sort
              │ estimated row count: 67
              │ order: +j
              │
              └── • cross join (right outer)
                  │ estimated row count: 67
                  │ pred: j > i
                  │
                  ├── • scan
                  │     estimated row count: 1,000,000 (100% of the table; stats collected 2 minutes ago)
                  │     table: u@u_pkey
                  │     spans: FULL SCAN
                  │
                  └── • scan
                        estimated row count: 0 (<0.01% of the table; stats collected 2 minutes ago)
                        table: t@t_pkey
                        spans: [/999998 - ]
(31 rows)


Time: 2ms total (execution 2ms / network 0ms)

But we really want to use a lookup join, which we can see if we change the > in the subquery to =:

demo@127.0.0.1:26257/defaultdb> EXPLAIN SELECT (SELECT k FROM u WHERE j = i ORDER BY j LIMIT 1) FROM t WHERE i > 999997 ORDER BY i LIMIT 1;
                                          info
-----------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • render
  │ estimated row count: 1
  │
  └── • lookup join (left outer)
      │ estimated row count: 1
      │ table: u@u_pkey
      │ equality: (i) = (j)
      │ equality cols are key
      │
      └── • scan
            estimated row count: 0 (<0.01% of the table; stats collected 3 minutes ago)
            table: t@t_pkey
            spans: [/999998 - ]
            limit: 1
(17 rows)


Time: 2ms total (execution 1ms / network 1ms)

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 4, 2022
Previously, it was possible to perform lookup joins using inequality
conditions between index columns and constant values. This commit allows
lookup joins to also use inequalities between index columns and input columns.

There are restrictions on when an inequality can be used in a lookup join:
  1. The left and right sides of the inequality must have identical types.
  2. The inequality is between an index column and input column (or constant).
  3. If the index column is `DESC` and the inequality is of the form
     `idxCol < inputCol`, the column type must support `Datum.Prev` without
     any chance of failing other than for the minimum value for that type.

Condition (3) is satisfied when the type of the column is one of `IntFamily`,
`OidFamily`, `UuidFamily` or `BoolFamily`. It is necessary because when the
index column is `DESC`, the `idxCol < inputCol` filter will be used in
forming the start key of each span. The spans are expected to be inclusive,
so the value of inputCol will have to be decremented to the value that orders
immediately before it.

Unlike the case of retrieving the next possible key (ex: `ASC` index with
`idxCol > inputCol`) it is not possible in general to directly obtain the
immediate previous key, because it would have an infinite number of `0xff`
bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality
bound before adding it to the start key.

Additionally, this commit allows lookup joins to be planned without equality
filters when the following conditions are met:
  1. There is an inequality filter between an index column and an input column
     that can be used to perform lookups.
  2. Either the input has only one row or the join has a LOOKUP hint.

These restrictions ensure that planning lookup joins in more cases does not
lead to performance regressions, since the current execution logic does not
fully de-duplicate spans when inequalities are used.

Informs cockroachdb#51576

Release note (performance improvement): The execution engine can now perform
lookup joins in more cases. This can significantly improve join performance
when there is a large table with an index that conforms to the join ON
conditions, as well as allow joins to halt early in the presence of a limit.
craig bot pushed a commit that referenced this issue Aug 10, 2022
85597: opt/rowexec: support range lookup joins on input columns r=DrewKimball a=DrewKimball

**opt/rowexec: support range lookup joins on input columns**

Previously, it was possible to perform lookup joins using inequality
conditions between index columns and constant values. This commit allows
lookup joins to also use inequalities between index columns and input columns.

There are restrictions on when an inequality can be used in a lookup join:
  1. The left and right sides of the inequality must have identical types.
  2. The inequality is between an index column and input column (or constant).
  3. If the index column is `DESC` and the inequality is of the form
     `idxCol < inputCol`, the column type must support `Datum.Prev` without
     any chance of failing other than for the minimum value for that type.

Condition (3) is necessary because when the index column is `DESC`, the
`idxCol < inputCol` filter will be used in forming the start key of each span.
The spans are expected to be inclusive, so the value of inputCol will have to
be decremented to the value that orders immediately before it.

Unlike the case of retrieving the next possible key (ex: `ASC` index with
`idxCol > inputCol`) it is not possible in general to directly obtain the
immediate previous key, because it would have an infinite number of `0xff`
bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality
bound before adding it to the start key.

Additionally, this commit allows lookup joins to be planned without equality
filters when the following conditions are met:
  1. There is an inequality filter between an index column and an input column
     that can be used to perform lookups.
  2. Either the input has only one row or the join has a LOOKUP hint.

These restrictions ensure that planning lookup joins in more cases does not
lead to performance regressions, since the current execution logic does not
fully de-duplicate spans when inequalities are used.

Fixes #51576

Release note (performance improvement): The execution engine can now perform
lookup joins in more cases. This can significantly improve join performance
when there is a large table with an index that conforms to the join ON
conditions, as well as allow joins to halt early in the presence of a limit.

**opt: extend ExtractJoinEqualities to handle inequalities**

Previously, `ExtractJoinEqualities` would match equalities between
non-constant, non-variable expressions and project the expressions so that
the comparison would be between variables instead. This may allow rules
like `GenerateLookupJoins` to use the equalities later.

This commit modifies `ExtractJoinEqualities` (now `ExtractJoinComparisons`)
so that it also matches inequalities. This is useful because lookup joins
can now use inequalities for lookups, and converting inequalties to
reference variables instead of complex expressions increases the likelihood
that an inequality can be used in a join.

Release note: None

@DrewKimball
Collaborator

Closing as fixed by #85597.
