opt, sql: LEFT OUTER and ANTI spatial joins cannot be index-accelerated #53576

rytaft · 2020-08-27T21:22:30Z

Inner and semi joins with a spatial ON condition such as ST_Intersects(g1, g2) can take advantage of an inverted index on either g1 or g2 to perform an inverted join and get better performance than the alternative approach of performing a cross join followed by a filter.

For example, running explain on the the following inner join query shows that it will use an inverted join:

> EXPLAIN SELECT *
FROM nyc_census_blocks AS c
JOIN nyc_neighborhoods AS n
ON ST_Covers(c.geom, n.geom);
         tree        |         field         |                 description
---------------------+-----------------------+-----------------------------------------------
                     | distribution          | full
                     | vectorized            | false
  lookup join        |                       |
   │                 | table                 | nyc_census_blocks@primary
   │                 | equality              | (gid) = (gid)
   │                 | equality cols are key |
   │                 | pred                  | st_covers(geom, geom)
   └── inverted join |                       |
        │            | table                 | nyc_census_blocks@nyc_census_blocks_geom_idx
        └── scan     |                       |
                     | estimated row count   | 129
                     | table                 | nyc_neighborhoods@primary
                     | spans                 | FULL SCAN
(13 rows)

Time: 1.599ms

However, this is not the case for left outer or anti joins. For example, running explain on the following left join query shows that it performs an inefficient cross join:

> EXPLAIN SELECT *
FROM nyc_census_blocks AS c
LEFT JOIN nyc_neighborhoods AS n
ON ST_Covers(c.geom, n.geom);
           tree           |        field        |        description
--------------------------+---------------------+----------------------------
                          | distribution        | full
                          | vectorized          | true
  cross join (left outer) |                     |
   │                      | pred                | st_covers(geom, geom)
   ├── scan               |                     |
   │                      | estimated row count | 38794
   │                      | table               | nyc_census_blocks@primary
   │                      | spans               | FULL SCAN
   └── scan               |                     |
                          | estimated row count | 129
                          | table               | nyc_neighborhoods@primary
                          | spans               | FULL SCAN
(12 rows)

Time: 2.357ms

The reason left joins cannot use the inverted index is because as shown in the first explain, the inverted join must be wrapped in a lookup join with the primary index of the table. This is necessary because inverted joins for spatial predicates produce some false positives. To eliminate the false positives, we need the original geometry or geography column. This column is not stored in the inverted index, so we always need to fetch it from the primary index in order to run the original predicate function and remove the false positives.

In order for this to work correctly for a left join, the inverted join would need to return all of the rows returned by an inner join, plus any rows from the left side that didn't match, with the right columns NULL-extended. Then the lookup join would also need to return all of the rows returned by an inner join, plus any rows from the original left side that didn't match after removal of false positives.

The problem is that the lookup join join currently has no way of knowing that the rows passed in from the inverted join actually represent one or more copies of each of the original left side rows. As a result, it incorrectly returns some rows that should have been eliminated as false positives.

For example, suppose we have the following two tables, with an inverted index on g2:

      t1                        t2

 k1 |      g1               k2 |      g2 
----+-------------------   ----+------------------
  1 | POINT(1.0 1.0)        10 | POINT(1.2 1.2)
  2 | POINT(1.5 1.5)        20 | POINT(1.5 1.5)
  3 | POINT(40.0 40.0)      30 | POINT(40.0 40.0)
  4 | POINT(-10.0 40.0)

We want to execute the following query:

SELECT k1, k2 FROM t1 LEFT JOIN t2 ON ST_Intersects(g1, g2);

A left inverted join would return the following rows:

 k1 |        g1         |  k2
----+-------------------+------
  1 | POINT(1.0 1.0)    |   10   // false positive
  1 | POINT(1.0 1.0)    |   20   // false positive
  2 | POINT(1.5 1.5)    |   10   // false positive
  2 | POINT(1.5 1.5)    |   20
  3 | POINT(40.0 40.0)  |   30
  4 | POINT(-10.0 40.0) | NULL

The lookup join (prior to projecting away the unneeded columns) would then return the following rows:

 k1 |        g1         |  k2  |        g2
----+-------------------+------+------------------
  1 | POINT(1.0 1.0)    |   10 | NULL 
  1 | POINT(1.0 1.0)    |   20 | NULL 
  2 | POINT(1.5 1.5)    |   10 | NULL 
  2 | POINT(1.5 1.5)    |   20 | POINT(1.5 1.5)
  3 | POINT(40.0 40.0)  |   30 | POINT(40.0 40.0)
  4 | POINT(-10.0 40.0) | NULL | NULL

This is incorrect. What it should be returning is:

 k1 |        g1         |  k2  |        g2
----+-------------------+------+------------------
  1 | POINT(1.0 1.0)    | NULL | NULL 
  2 | POINT(1.5 1.5)    |   20 | POINT(1.5 1.5)
  3 | POINT(40.0 40.0)  |   30 | POINT(40.0 40.0)
  4 | POINT(-10.0 40.0) | NULL | NULL

However, there is no way to get that result without adding additional bookkeeping to the inverted joiner and/or the lookup joiner.

Workarounds

Left join queries like the one described above return correct results (because we do not use the inverted index), but they may be very slow. An alternative is to rewrite the query so that it first performs an inner join followed by a left join. For example, the above left join can be rewritten as follows:

> EXPLAIN SELECT inr.*
FROM nyc_census_blocks as c_outer LEFT JOIN (SELECT * FROM nyc_census_blocks AS c
JOIN nyc_neighborhoods AS n ON ST_Covers(c.geom, n.geom)) AS inr ON c_outer.gid = inr.gid;
           tree           |         field         |                 description
--------------------------+-----------------------+-----------------------------------------------
                          | distribution          | full
                          | vectorized            | false
  hash join (right outer) |                       |
   │                      | equality              | (gid) = (gid)
   │                      | right cols are key    |
   ├── lookup join        |                       |
   │    │                 | table                 | nyc_census_blocks@primary
   │    │                 | equality              | (gid) = (gid)
   │    │                 | equality cols are key |
   │    │                 | pred                  | st_covers(geom, geom)
   │    └── inverted join |                       |
   │         │            | table                 | nyc_census_blocks@nyc_census_blocks_geom_idx
   │         └── scan     |                       |
   │                      | estimated row count   | 129
   │                      | table                 | nyc_neighborhoods@primary
   │                      | spans                 | FULL SCAN
   └── scan               |                       |
                          | estimated row count   | 38794
                          | table                 | nyc_census_blocks@primary
                          | spans                 | FULL SCAN
(20 rows)

Time: 2.362ms

Notice that it now uses an inverted join, but also requires an additional scan of the nyc_census_blocks table. Depending on the query and data, this may or may not be faster than the cross join.

Possible Solutions

One possible solution would be to create an exploration rule that performs the transformation described in the previous section: convert the original left join to an inner join, then wrap it in an additional left join. We could also do something similar in which the left input is buffered, so it doesn't need to be re-scanned. This would be like moving the left input to a CTE.

Another alternative is to create a new execution operator or augment the existing joinReader and invertedJoiner operators. These operators would need to do some extra bookkeeping to keep track of how many rows had been output from the left side, and ensure that each row with no match on the right side was only output once.

Once left joins are supported, anti joins can be supported by converting them to a left join wrapped in a Select operator followed by a Project.

The text was updated successfully, but these errors were encountered:

sumeerbhola · 2020-08-31T13:10:13Z

augment the existing joinReader and invertedJoiner operators. These operators would need to do some extra bookkeeping to keep track of how many rows had been output from the left side, and ensure that each row with no match on the right side was only output once.

I am leaning towards this (as I mentioned in-person) -- @yuzefovich and @asubiotto do you have concerns about adding another mode to joinReader (I can prototype something to show you). The invertedJoiner would add a count column to its output, representing the size of a group, since it knows all the matches for a left row before outputting anything for a left row. So using the earlier example, the invertedJoiner would output

 k1 |        g1         |  k2   | count
----+-------------------+-------+----
  1 | POINT(1.0 1.0)    |   10  | 2  // false positive
  1 | POINT(1.0 1.0)    |   20  | 2  // false positive
  2 | POINT(1.5 1.5)    |   10  | 2  // false positive
  2 | POINT(1.5 1.5)    |   20  | 2
  3 | POINT(40.0 40.0)  |   30  | 1
  4 | POINT(-10.0 40.0) | NULL  | 1

The joinReader would be configured with the knowledge that k1, g1 are the "real left" columns, and that count is a special book-keeping column. The group of input rows for a "real left" row will be consecutive (I am assuming that when distributing, both an invertedJoiner and joinReader are in the same node and exactly one invertedJoiner is feeding a joinReader), but they can be split across multiple batches (hence the count book-keeping column). The joinReader can continue outputting rows that match the predicate as usual (all the joinReaderStrategy implementations are applicable), and would keep track of whether anything has matched for the current group. This tracking of whether anything matched would extend beyond the batch boundary if the group was not complete -- there would be at most one such group. Only when all the rows of a group were done would it potentially output the NULL-extended row.

yuzefovich · 2020-08-31T19:27:48Z

I think as long as the performance of the joinReader is not impacted in other cases, it's fine to add another mode for this use case.

asubiotto · 2020-09-01T08:55:12Z

I agree, but it makes me slightly uncomfortable that we'll be introducing a dependency on the invertedJoiner. Is there a way to fix this issue by only changing the invertedJoiner behavior?

Release justification: bug fixes and low-risk updates to new functionality This commit adds an exploration rule ConvertLeftToInnerJoin, which converts a left join to an inner join with the same ON condition, and then wraps the expression in another left join with the original left side. In order to avoid computing the left side of the join twice, we create a With expression for the left side, and then reference it with two WithScans. For example (assuming x is the primary key of a): SELECT a.x, b.y FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom); is converted to: WITH a_buf AS ( SELECT * FROM a ) SELECT a_buf.x, inr.y FROM a_buf LEFT JOIN ( SELECT * FROM a_buf JOIN b ON ST_Intersects(a_buf.geom, b.geom) ) AS inr ON a_buf.x = inr.x; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This tranformation allows us to index-accelerate spatial left joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): left outer spatial joins can now be index-accelerated, which can lead to performance improvements in some cases.

Release justification: bug fixes and low-risk updates to new functionality ConvertAntiToLeftJoin converts an anti join to a left join with the same ON condition, wraps the expression in a Select to remove rows that matched the ON condition, and then projects out the left side columns. For example (assuming x is a not-null column in b): SELECT * FROM a WHERE NOT EXISTS ( SELECT * FROM b WHERE ST_Intersects(a.geom, b.geom) ); is converted to: SELECT a.* FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom) WHERE b.x IS NULL; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This tranformation allows us to index-accelerate spatial anti joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): spatial anti joins can now be index-accelerated, which can lead to performance improvements in some cases.

Release justification: bug fixes and low-risk updates to new functionality This commit adds an exploration rule ConvertLeftToInnerJoin, which converts a left join to an inner join with the same ON condition, and then wraps the expression in another left join with the original left side. In order to avoid computing the left side of the join twice, we create a With expression for the left side, and then reference it with two WithScans. For example (assuming x is the primary key of a): SELECT a.x, b.y FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom); is converted to: WITH a_buf AS ( SELECT * FROM a ) SELECT a_buf.x, inr.y FROM a_buf LEFT JOIN ( SELECT * FROM a_buf JOIN b ON ST_Intersects(a_buf.geom, b.geom) ) AS inr ON a_buf.x = inr.x; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial left joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): left outer spatial joins can now be index-accelerated, which can lead to performance improvements in some cases.

53976: opt: add transformation rule to convert left join to inner join r=rytaft a=rytaft Release justification: bug fixes and low-risk updates to new functionality This commit adds an exploration rule `ConvertLeftToInnerJoin`, which converts a left join to an inner join with the same `ON` condition, and then wraps the expression in another left join with the original left side. In order to avoid computing the left side of the join twice, we create a `With` expression for the left side, and then reference it with two `WithScans`. For example (assuming `x` is the primary key of `a`): ``` SELECT a.x, b.y FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom); ``` is converted to: ``` WITH a_buf AS ( SELECT * FROM a ) SELECT a_buf.x, inr.y FROM a_buf LEFT JOIN ( SELECT * FROM a_buf JOIN b ON ST_Intersects(a_buf.geom, b.geom) ) AS inr ON a_buf.x = inr.x; ``` Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial left joins, which was not possible before. Informs #53576 Release note (performance improvement): left outer spatial joins can now be index-accelerated, which can lead to performance improvements in some cases. Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

Release justification: bug fixes and low-risk updates to new functionality This commit adds an exploration rule ConvertLeftToInnerJoin, which converts a left join to an inner join with the same ON condition, and then wraps the expression in another left join with the original left side. In order to avoid computing the left side of the join twice, we create a With expression for the left side, and then reference it with two WithScans. For example (assuming x is the primary key of a): SELECT a.x, b.y FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom); is converted to: WITH a_buf AS ( SELECT * FROM a ) SELECT a_buf.x, inr.y FROM a_buf LEFT JOIN ( SELECT * FROM a_buf JOIN b ON ST_Intersects(a_buf.geom, b.geom) ) AS inr ON a_buf.x = inr.x; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial left joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): left outer spatial joins can now be index-accelerated, which can lead to performance improvements in some cases.

Release justification: bug fixes and low-risk updates to new functionality ConvertAntiToLeftJoin converts an anti join to a left join with the same ON condition, wraps the expression in a Select to remove rows that matched the ON condition, and then projects out the left side columns. For example (assuming x is a not-null column in b): SELECT * FROM a WHERE NOT EXISTS ( SELECT * FROM b WHERE ST_Intersects(a.geom, b.geom) ); is converted to: SELECT a.* FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom) WHERE b.x IS NULL; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial anti joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): spatial anti joins can now be index-accelerated, which can lead to performance improvements in some cases.

53539: sql/catalog, backupccl: support offline descriptors, add all new restored descriptors in offline state r=lucy-zhang a=lucy-zhang Closes #53375. ---- backupccl: fix bug in writing database schema map Release justification: Bug fixes to new functionality Release note (bug fix): Fixes a bug introduced in a 20.2 alpha release where incorrect caching could incur extra writes to the store during RESTORE on user-defined schemas. ---- backupccl: reset Version and ModificationTime on new descriptors The `Version` and `ModificationTime` fields are now reset to 1 and the empty timestamp, respectively, on new descriptors created in RESTORE, to make them behave like other newly created descriptors. Release justification: Low risk, high benefit changes to existing functionality Release note: None ---- sql/catalog: add descriptor API support for offline state This change adds methods for the offline state to the descriptor interfaces and makes the state-related API more uniform. Release justification: Low risk, high benefit changes to existing functionality Release note: None ---- sql/catalog: handle dropped and offline descriptors in lookups This change updates the name resolution and descriptor lookup implementations to respect the state of dropped and offline descriptors: All lookups now read the `IncludeOffline` and `IncludeDropped` flags in `tree.CommonLookupFlags` and filter results accordingly. Tests in the context of RESTORE are forthcoming in the following commit. Release justification: Bug fixes and low-risk updates to new functionality Release note: None ---- backupccl: add all new restored descriptors in offline state This change updates the RESTORE job to add all new descriptors prior to restoring data in the OFFLINE state, and update their state to PUBLIC after data has been restored. This makes all descriptors behave like tables. Release justification: Low risk, high benefit changes to existing functionality (for database descriptors; other changes apply only to new functionality) Release note (sql change): Databases being restored will now be in the offline state, invisible to users, until the data has been restored. This is the same as the existing behavior for restored tables. (This change is also applied to enums and user-defined schemas being restored, which is a change relative to only the 20.2 alpha releases.) 54081: opt: add transformation rule to convert anti join to left join r=rytaft a=rytaft Release justification: bug fixes and low-risk updates to new functionality `ConvertAntiToLeftJoin` converts an anti join to a left join with the same `ON` condition, wraps the expression in a `Select` to remove rows that matched the `ON` condition, and then projects out the left side columns. For example (assuming `x` is a not-null column in `b`): ``` SELECT * FROM a WHERE NOT EXISTS ( SELECT * FROM b WHERE ST_Intersects(a.geom, b.geom) ); ``` is converted to: ``` SELECT a.* FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom) WHERE b.x IS NULL; ``` Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial anti joins, which was not possible before. Informs #53576 Release note (performance improvement): spatial anti joins can now be index-accelerated, which can lead to performance improvements in some cases. Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

Release justification: bug fixes and low-risk updates to new functionality ConvertAntiToLeftJoin converts an anti join to a left join with the same ON condition, wraps the expression in a Select to remove rows that matched the ON condition, and then projects out the left side columns. For example (assuming x is a not-null column in b): SELECT * FROM a WHERE NOT EXISTS ( SELECT * FROM b WHERE ST_Intersects(a.geom, b.geom) ); is converted to: SELECT a.* FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom) WHERE b.x IS NULL; Note that this transformation is not desirable in the general case, but it is useful if there is a possibility of creating an inverted join (such as in the above example). For this reason, we only perform this transformation if it is possible to generate an inverted join. This transformation allows us to index-accelerate spatial anti joins, which was not possible before. Informs cockroachdb#53576 Release note (performance improvement): spatial anti joins can now be index-accelerated, which can lead to performance improvements in some cases.

The paired joiners are used to accomplish left {outer,semi,anti} joins when the first joiner will produce false positives (as known to the optimizer). Currently, only the invertedJoiner can function as this first joiner but there is wording in the spec describing how this could also be useful for join expressions that can't be fully evaluated on one non-inverted index. The first joiner outputs an additional bool column representing a continuation value. This is used to demarcate groups of consecutive rows output by the first joiner that represent the same original left row. The first joiner needs to preserve input row order (the invertedJoiner always does this). The second joiner is a lookup join and handles these groups in a special manner. The second join does not need to be order preserving. Informs cockroachdb#53576 Prior to this, the way to do: - a left outer join with an inverted join is to do an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then wrap the expression in another left join with the original left side. - a left anti join with an inverted join is to map it to a left outer join (previous bullet). - a left semi join is to map it to an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then project, sort (or make the lookup join order preserving) and dedup. We expect that the alternative outlined in this PR (it excludes the optimizer changes) will be more efficient since it is simply a pairing of inverted joiner and lookup join (and the latter does not need to be order preserving). Release note: None

sumeerbhola · 2020-09-22T00:49:31Z

Is there a way to fix this issue by only changing the invertedJoiner behavior?

I can't think of any. The problem here is that the inverted joiner (the first joiner in the pair of joins, in the PR I just sent for review), is one that is producing false positives. So the second joiner, which has perfect information (but we used the first joiner for efficiency, because it had an index that was suited to the join expression), has to view the stream of input rows as groups corresponding to an original left row.

The paired joiners are used to accomplish left {outer,semi,anti} joins when the first joiner will produce false positives (as known to the optimizer). Currently, only the invertedJoiner can function as this first joiner but there is wording in the spec describing how this could also be useful for join expressions that can't be fully evaluated on one non-inverted index. The first joiner outputs an additional bool column representing a continuation value. This is used to demarcate groups of consecutive rows output by the first joiner that represent the same original left row. The first joiner needs to preserve input row order (the invertedJoiner always does this). The second joiner is a lookup join and handles these groups in a special manner. The second join does not need to be order preserving. Informs cockroachdb#53576 Prior to this, the way to do: - a left outer join with an inverted join is to do an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then wrap the expression in another left join with the original left side. - a left anti join with an inverted join is to map it to a left outer join (previous bullet). - a left semi join is to map it to an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then project, sort (or make the lookup join order preserving) and dedup. We expect that the alternative outlined in this PR (it excludes the optimizer changes) will be more efficient since it is simply a pairing of inverted joiner and lookup join (and the latter does not need to be order preserving). Release note: None

54041: bulkio: Count corrupt schedules. r=miretskiy a=miretskiy Differentiate between "bad schedules" -- schedules we could not execute, and corrupt ones (schedules we couldn't even parse). Release Notes: None Release Justification: No impact metrics change. 54639: rowexec: paired joiners to accomplish left joins r=sumeerbhola a=sumeerbhola The paired joiners are used to accomplish left {outer,semi,anti} joins when the first joiner will produce false positives (as known to the optimizer). Currently, only the invertedJoiner can function as this first joiner but there is wording in the spec describing how this could also be useful for join expressions that can't be fully evaluated on one non-inverted index. The first joiner outputs an additional bool column representing a continuation value. This is used to demarcate groups of consecutive rows output by the first joiner that represent the same original left row. The first joiner needs to preserve input row order (the invertedJoiner always does this). The second joiner is a lookup join and handles these groups in a special manner. The second join does not need to be order preserving. Informs #53576 Prior to this, the way to do: - a left outer join with an inverted join is to do an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then wrap the expression in another left join with the original left side. - a left anti join with an inverted join is to map it to a left outer join (previous bullet). - a left semi join is to map it to an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then project, sort (or make the lookup join order preserving) and dedup. We expect that the alternative outlined in this PR (it excludes the optimizer changes) will be more efficient since it is simply a pairing of inverted joiner and lookup join (and the latter does not need to be order preserving). Release note: None 54655: backupccl: add opt-in last-backup-time metric r=dt a=dt This adds a metric for the time captured by the last completed backup run by a schedule designated as updating this metric. This can be used to specify a schedule as maintaining your RPO (say you hourly cluster backup) and thus have a timeseries metric tracking that RPO which can then be monitored. Note: if an operator wants granular tracking of separate schedules to monitor the RPO of one database or another, they should use SHOW against those specific schedules or some other means, as we do not want to add an unbounded number of user-defined timeseries to our metrics. Instead, this adds only one metric, and the operator can decide, which, if any, schedules completing will update that metric. Fixes #54412. Release note (sql change): Add an option to scheduled backups to maintain a timeseries metric for last backed up timestamp. 54760: docs: Update BACKUP diagram r=lnhsingh a=lnhsingh Excluding BACKUP INTO from the syntax diagram for now. Release justification: low-risk update to SQL diagram for docs Release notes: none 54774: roachprod: Add support for io2 SSD for AWS r=miretskiy a=miretskiy Support io2 SSDs for aws. Support aws-image-ami override. Release Notes: None Release Justification: No user changes; N/A 54853: cli: fix default value in help text of max-disk-temp-storage flag r=knz a=DuskEagle Previously, in the output of `cockroach start --help`, the `--max-disk-temp-storage` flag section reported that the default value for the flag was "0 B". ``` $ cockroach start --help ... --max-disk-temp-storage bytes ... (default 0 B) ``` This is inaccurate, as the actual default value is dependent on whether the store is a disk store or an in-memory store. To fix this, this patch changes the BytesValue.String() method when the underlying value is nil. Previously, it would return "0 B", which caused our flags library to think the value was a non-zero value. By changing the method to return `"<nil>"`, our flags library recognizes the value as a zero value, and does not print an incorrect default value for the flag in our help text. Release note (cli change): The --help text for --max-disk-temp-storage now properly reports the default value. 54952: backupccl: update default backup telemetry source to ".manual" r=dt a=pbardea This commit updates the telemetry for backup to always include the source (either ".manual" or ".scheduled"). This allows for folks to search by a prefix to get the telemetry for all backups, and the specific source can be appened to filter the telemetry between manual and scheduled backups. Release note: None Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com> Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: Lauren <lauren@cockroachlabs.com> Co-authored-by: Joel Kenny <joel@cockroachlabs.com> Co-authored-by: Paul Bardea <pbardea@gmail.com>

The paired joiners are used to accomplish left {outer,semi,anti} joins when the first joiner will produce false positives (as known to the optimizer). Currently, only the invertedJoiner can function as this first joiner but there is wording in the spec describing how this could also be useful for join expressions that can't be fully evaluated on one non-inverted index. The first joiner outputs an additional bool column representing a continuation value. This is used to demarcate groups of consecutive rows output by the first joiner that represent the same original left row. The first joiner needs to preserve input row order (the invertedJoiner always does this). The second joiner is a lookup join and handles these groups in a special manner. The second join does not need to be order preserving. Informs #53576 Prior to this, the way to do: - a left outer join with an inverted join is to do an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then wrap the expression in another left join with the original left side. - a left anti join with an inverted join is to map it to a left outer join (previous bullet). - a left semi join is to map it to an inner join with the same ON condition, which is a pair of inverted join and lookup join, and then project, sort (or make the lookup join order preserving) and dedup. We expect that the alternative outlined in this PR (it excludes the optimizer changes) will be more efficient since it is simply a pairing of inverted joiner and lookup join (and the latter does not need to be order preserving). Release note: None

sumeerbhola · 2020-10-27T19:10:28Z

With #55216, #55742, #55986, we are now using paired joins for left outer/semi/anti joins.

rytaft added the docs-known-limitation label Aug 27, 2020

rytaft mentioned this issue Sep 4, 2020

opt: add transformation rule to convert left join to inner join #53976

Merged

rytaft mentioned this issue Sep 9, 2020

opt: add transformation rule to convert anti join to left join #54081

Merged

rytaft mentioned this issue Sep 9, 2020

release-20.2: opt: add transformation rule to convert left join to inner join #54110

Merged

rytaft mentioned this issue Sep 16, 2020

release-20.2: opt: add transformation rule to convert anti join to left join #54471

Merged

sumeerbhola mentioned this issue Sep 22, 2020

rowexec: paired joiners to accomplish left joins #54639

Merged

sumeerbhola closed this as completed Oct 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt, sql: LEFT OUTER and ANTI spatial joins cannot be index-accelerated #53576

opt, sql: LEFT OUTER and ANTI spatial joins cannot be index-accelerated #53576

rytaft commented Aug 27, 2020

sumeerbhola commented Aug 31, 2020

yuzefovich commented Aug 31, 2020

asubiotto commented Sep 1, 2020

sumeerbhola commented Sep 22, 2020

sumeerbhola commented Oct 27, 2020

opt, sql: LEFT OUTER and ANTI spatial joins cannot be index-accelerated #53576

opt, sql: LEFT OUTER and ANTI spatial joins cannot be index-accelerated #53576

Comments

rytaft commented Aug 27, 2020

sumeerbhola commented Aug 31, 2020

yuzefovich commented Aug 31, 2020

asubiotto commented Sep 1, 2020

sumeerbhola commented Sep 22, 2020

sumeerbhola commented Oct 27, 2020