-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: enable lookup joins to lookup spans, not just individual keys #51576
Comments
This is a cool enhancement idea! |
A solution inside the optimizer would be for LookupJoin to contain a Constraint that is similar to an index constraint, but applies to the index columns after the equality columns. I think the opt part wouldn't be hard (we can reuse code from index constraints), but we'd also need execution support. |
Note that if we don't support this, then it puts the burden on the user to do 2 separate queries for cases where the lookup keys are not known beforehand. For example, the user could manually issue these two queres:
The difference in cost between doing it with lookup join vs. doing two manual queries is substantial. In one experiment, the lookup join costs is a whopping |
Adding support for this would definitely require a partnership between the SQL Execution and Optimizer teams. I don't think it'd be terribly hard and would bring a lot of "bang for the buck". CC @jordanlewis |
Cool, this seems like a great idea. So what would the API look like in the JoinReader proto definition? An "extra constraint column set", and a pair of constant values that fill in that set? I suppose it'd also be possible to make this dynamic, if you wanted to construct the span range based on values that you find in the data... maybe this is a marginal use case in this example, but I bet we could find some real world ones as well.
|
I think we handle constant columns today (by projecting the values before the lookup join). It would be a list of pseudo-spans, i.e. spans with key pieces that get appended to the keys in each span generated in the lookup joiner. In the example above, the pseudo-spans would be It's a little more tricky with multiple pseudo-spans - we would effectively be breaking up one span Conceptually, if we have an index on |
Flagging as a potential project for @helenmhe if she has time after her current work on join reader. |
This would be a nice win for TPC-C. It shows up in the Stock-Level transaction. Right now, we issue two read-only statements in a read-only transaction. If we were able to combine them into a single statement then we'd be able to use an implicit transaction instead of an explicit transaction. So doing so would not only avoid one statement but would actually drop the transaction down from 6 statements ( For reference: -- Currently
root@:26257/tpcc> EXPLAIN SELECT d_next_o_id
FROM district
WHERE d_w_id = 0 AND d_id = 3;
tree | field | description
-------+---------------------+-------------------
| distribution | local
| vectorized | false
scan | |
| estimated row count | 1
| table | district@primary
| spans | [/0/3 - /0/3]
root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
FROM order_line
JOIN stock
ON s_w_id = 0 AND s_i_id = ol_i_id
WHERE ol_w_id = 0
AND ol_d_id = 3
AND ol_o_id BETWEEN 3095 - 20 AND 3095 - 1
AND s_quantity < 15;
tree | field | description
---------------------------+-----------------------+-----------------------------------------------------
| distribution | full
| vectorized | false
group (scalar) | |
└── distinct | |
│ | distinct on | s_i_id
└── lookup join | |
│ | table | stock@primary
│ | equality | (project_const_col_@14, ol_i_id) = (s_w_id,s_i_id)
│ | equality cols are key |
│ | pred | s_quantity < 15
└── render | |
└── scan | |
| estimated row count | 117
| table | order_line@primary
| spans | [/0/3/3094 - /0/3/3075]
-- With Subquery (today)
root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
FROM (SELECT d_w_id, d_id, d_next_o_id FROM district WHERE d_w_id = 0 AND d_id = 3)
JOIN order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
AND ol_o_id BETWEEN (d_next_o_id - 20) AND (d_next_o_id - 1)
JOIN stock ON s_w_id = d_w_id AND s_i_id = ol_i_id
WHERE s_quantity < 15;
tree | field | description
-----------------------------+-----------------------+---------------------------------------------------------------------
| distribution | full
| vectorized | false
group (scalar) | |
└── distinct | |
│ | distinct on | s_i_id
└── lookup join | |
│ | table | stock@primary
│ | equality | (d_w_id, ol_i_id) = (s_w_id,s_i_id)
│ | equality cols are key |
│ | pred | s_quantity < 15
└── lookup join | |
│ | table | order_line@primary
│ | equality | (d_w_id, d_id) = (ol_w_id,ol_d_id)
│ | pred | (ol_o_id >= (d_next_o_id - 20)) AND (ol_o_id <= (d_next_o_id - 1))
└── scan | |
| estimated row count | 1
| table | district@primary
| spans | [/0/3 - /0/3]
-- With Subquery (in future with this issue addressed)
root@:26257/tpcc> EXPLAIN SELECT count(DISTINCT s_i_id)
FROM (SELECT d_w_id, d_id, d_next_o_id FROM district WHERE d_w_id = 0 AND d_id = 3)
JOIN order_line ON ol_w_id = d_w_id
AND ol_d_id = d_id
AND ol_o_id BETWEEN (d_next_o_id - 20) AND (d_next_o_id - 1)
JOIN stock ON s_w_id = d_w_id AND s_i_id = ol_i_id
WHERE s_quantity < 15;
tree | field | description
-----------------------------+-----------------------+---------------------------------------------------------------------
| distribution | full
| vectorized | false
group (scalar) | |
└── distinct | |
│ | distinct on | s_i_id
└── lookup join | |
│ | table | stock@primary
│ | equality | (d_w_id, ol_i_id) = (s_w_id,s_i_id)
│ | equality cols are key |
│ | pred | s_quantity < 15
└── lookup join | |
│ | table | order_line@primary
│ | equality | (d_w_id, d_id) = (ol_w_id,ol_d_id)
│ | inequality | (ol_o_id >= (d_next_o_id - 20)) AND (ol_o_id <= (d_next_o_id - 1))
└── scan | |
| estimated row count | 1
| table | district@primary
| spans | [/0/3 - /0/3] |
There are really two different issues here:
|
This is blocking an important POC and I think we should prioritize it. |
Here's a demonstration of a kind of query that would benefit: CREATE TABLE t (i PRIMARY KEY) AS SELECT generate_series(0, 999999);
CREATE TABLE u (j PRIMARY KEY, k) AS SELECT n, n FROM generate_series(0, 999999) AS s(n);
SELECT (SELECT k FROM u WHERE j > i ORDER BY j LIMIT 1) FROM t WHERE i > 999997 ORDER BY i LIMIT 1; With > in the subquery, we have to use a cross join over a full scan of u:
But we really want to use a lookup join, which we can see if we change the > in the subquery to =:
|
Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing. Condition (3) is satisfied when the type of the column is one of `IntFamily`, `OidFamily`, `UuidFamily` or `BoolFamily`. It is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Informs cockroachdb#51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit.
Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing other than for the minimum value for that type. Condition (3) is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Informs cockroachdb#51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit.
Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing other than for the minimum value for that type. Condition (3) is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Fixes cockroachdb#51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit.
Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing other than for the minimum value for that type. Condition (3) is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Fixes cockroachdb#51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit.
85339: insights: crdb_internal.cluster_execution_insights r=matthewtodd a=matthewtodd Here we introduce a new virtual table for a cluster-wide view of "insights," a subsystem of sqlstats that is currently disabled by default but that will identify slow- and slower-than-usual statement executions, along with other potentially problematic behaviors we will be building support for. This table will back the upcoming insights UI over the new SQL-over-HTTP endpoint. Release note (sql change): A new `crdb_internal` virtual table, `cluster_execution_insights`, was introduced, offering a cluster-wide view of the same node-local information available in `node_execution_insights`. The insights subsystem is, as of this commit, still under development and disabled by default, so there will not yet be much to see here. 85576: storage: use block-property filters for MVCC range tombstone masking r=nicktrav,erikgrinaker a=jbowens Use block-property filters to aid in skipping keys that are deleted by a MVCC range tombstone. Release note: None 85597: opt/rowexec: support range lookup joins on input columns r=DrewKimball a=DrewKimball **opt/rowexec: support range lookup joins on input columns** Previously, it was possible to perform lookup joins using inequality conditions between index columns and constant values. This commit allows lookup joins to also use inequalities between index columns and input columns. There are restrictions on when an inequality can be used in a lookup join: 1. The left and right sides of the inequality must have identical types. 2. The inequality is between an index column and input column (or constant). 3. If the index column is `DESC` and the inequality is of the form `idxCol < inputCol`, the column type must support `Datum.Prev` without any chance of failing other than for the minimum value for that type. Condition (3) is necessary because when the index column is `DESC`, the `idxCol < inputCol` filter will be used in forming the start key of each span. The spans are expected to be inclusive, so the value of inputCol will have to be decremented to the value that orders immediately before it. Unlike the case of retrieving the next possible key (ex: `ASC` index with `idxCol > inputCol`) it is not possible in general to directly obtain the immediate previous key, because it would have an infinite number of `0xff` bytes appended to it. Thus, we have to use `Datum.Prev` on the inequality bound before adding it to the start key. Additionally, this commit allows lookup joins to be planned without equality filters when the following conditions are met: 1. There is an inequality filter between an index column and an input column that can be used to perform lookups. 2. Either the input has only one row or the join has a LOOKUP hint. These restrictions ensure that planning lookup joins in more cases does not lead to performance regressions, since the current execution logic does not fully de-duplicate spans when inequalities are used. Fixes #51576 Release note (performance improvement): The execution engine can now perform lookup joins in more cases. This can significantly improve join performance when there is a large table with an index that conforms to the join ON conditions, as well as allow joins to halt early in the presence of a limit. **opt: extend ExtractJoinEqualities to handle inequalities** Previously, `ExtractJoinEqualities` would match equalities between non-constant, non-variable expressions and project the expressions so that the comparison would be between variables instead. This may allow rules like `GenerateLookupJoins` to use the equalities later. This commit modifies `ExtractJoinEqualities` (now `ExtractJoinComparisons`) so that it also matches inequalities. This is useful because lookup joins can now use inequalities for lookups, and converting inequalties to reference variables instead of complex expressions increases the likelihood that an inequality can be used in a join. Release note: None 85718: sql: implement drop function in legacy schema changer r=ajwerner a=chengxiong-ruan There are 5 commits: (1) have function resolver return `ErrFunctionUndefined` error. (2) implement drop function in legacy schema changer. (3) support drop cascade of objects depended on by UDFs. (4) disallow UDF usages from tables (injected the function resolver into declarative schema changer). (5) disallow UDF usages from views and UDF. 85823: kvserver: lower priority level for mvcc gc work r=irfansharif a=irfansharif GC could be expected to be LowPri, so that it does not impact user-facing traffic when resources (e.g. CPU, write capacity of the store) are scarce. However long delays in GC can slow down user-facing traffic due to more versions in the store, and can increase write amplification of the store since there is more live data. Ideally, we should adjust this priority based on how far behind we are with respect to GC-ing a particular range. Keeping the priority level static at NormalPri proved disruptive when a large volume of MVCC GC work is suddenly accrued (if an old protected timestamp record was just released for ex. following a long paused backup job being completed/canceled, or just an old, long running backup job finishing). After dynamic priority adjustment, it's not yet clear whether we need additional pacing mechanisms to provide better latency isolation, similar to ongoing work for backups. MVCC GC work is CPU intensive: \#82955. This patch is also speculative in nature and in response to observed incidents where NormalPri proved too disruptive. Fuller treatment would entail working off of reliable reproductions of this behaviour. We also added a cluster setting (kv.mvcc_gc.queue_interval) that controls how long the MVCC GC queue waits between processing replicas. It was previously hardcoded to 1s (which is the default value), but changing it would've come in handy in support incidents as a form of manual pacing of MVCC GC work (we have a similar useful knob for the merge queue). Release note (performance improvement): Previously if there was sudden increase in the volume of pending MVCC GC work, there was an impact on foreground latencies. These sudden increases commonly occurred when: - gc.ttlseconds was reduced dramatically over tables/indexes that accrue a lot of MVCC garbage (think "rows being frequently deleted") - a paused backup job from a while ago (think > 1 day) was canceled/failed - a backup job that started a while ago (think > 1 day) just finished Indicators of a large increase in the volume of pending MVCC GC work is a steep climb in the "GC Queue" graph found in the DB console page, when navigating to 'Metrics', and selecting the 'Queues' dashboard. With this patch, the effect on foreground latencies as a result of this sudden build up should be reduced. 85869: colexecbase: fix bpchar to bpchar cast r=yuzefovich a=yuzefovich This commit fixes an identity cast between `bpchar`s or arrays of `bpchar`s. These types require special handling of trimming trailing whitespace which wasn't done in the "vanilla" identity cast. I believe that these casts don't get planned by the optimizer ever, so there should be no impact, but it fixes a test flake and addresses the corresponding TODO. Fixes: #85375. Release note: None Co-authored-by: Matthew Todd <todd@cockroachlabs.com> Co-authored-by: Jackson Owens <jackson@cockroachlabs.com> Co-authored-by: DrewKimball <drewk@cockroachlabs.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Closing as fixed by #85597. |
Imagine you have tables like this:
Now say I want to look up a 10 minute interval of metric values for a particular metric:
This is the resulting plan:
Notice that the time filter, which is actually the most important thing about this query, is not part of the lookup join criteria. That's because lookup joins only allow equality conditions for lookup. But what if lookup joins allowed (multi-column) spans? It'd make dynamic lookup cases like this vastly more efficient. In this example, what we really want is for the lookup join to operate on spans like this:
rather than simply as it does today:
Jira issue: CRDB-4031
Epic: CRDB-14091
gz#11736
The text was updated successfully, but these errors were encountered: