Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Update YB Bitmap Scan cost model to match Yugabyte details #20573

Closed
1 task done
timothy-e opened this issue Jan 11, 2024 · 0 comments
Closed
1 task done

[YSQL] Update YB Bitmap Scan cost model to match Yugabyte details #20573

timothy-e opened this issue Jan 11, 2024 · 0 comments
Assignees
Labels
2024.1.1_blocker area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@timothy-e
Copy link
Contributor

timothy-e commented Jan 11, 2024

Jira Link: DB-9574

Description

In the initial bitmap scan implementation, the YB Bitmap Table Scan cost model is the same as the Postgres Bitmap Heap Scan cost. Update this to match the changes made for Yugabyte.

This includes accounting for remote pushdown - there are cases where BitmapAnd is not necessary.

Issue Type

kind/enhancement

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@timothy-e timothy-e added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jan 11, 2024
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jan 11, 2024
@timothy-e timothy-e self-assigned this Feb 22, 2024
timothy-e added a commit that referenced this issue Mar 12, 2024
Summary:
== Bitmap Scans ==
Bitmap Scans use multiple indexes to answer a query, with only one scan of the main table. Each index produces a “bitmap” indicating which rows of the main table are interesting. Multiple bitmaps can be combined with AND or OR operators to create a final bitmap which is used to collect rows from the main table.

In this diff, initial support for Yugabyte bitmap scans is added. Yugabyte bitmap scans:
* **have two nodes:** `Bitmap Index Scan` shared with PG, and `YB Bitmap Table Scan`.
* **support exact results only**: When PG approaches work_mem, it switches to a lossy bitmap that identifies specific pages of the heap relation to access. When YB approaches work_mem, it simply switches to a full sequential scan.
* **support arbitary nesting of logical operands and index conditions**. Assuming the same indexes and support for conditions, any arrangement of ANDs, ORs, and index scans that Postgres supports are also supported by Yugabyte because the code is reused. Note that Yugabyte is currently extremely unlikely to use an `AND` condition because the planner doesn't account for the cost of each bitmap index access being half the cost of a normal index access. (Bitmap index scans only need to access the ybctids, but normal index scans access ybctids + use them to query the main table.)
* **allow conditions with `AND` to be used as a filter**. This currently happens for all `AND` conditions, see the above bullet point. But with a hack to the planner, `Bitmap And` and filters are both valid options.
* **are controlled by `enable_bitmapscan`**
* **are controlled by pg hint plan's `/*+ BitmapScan(tab) */` syntax **
* ** only print the recheck condition when recheck is required**. This is contrary to PG behavior, but since fetching unnecessary rows has a higher cost in YB, it’s better to clearly communicate that.
* **show distributed storage stats when `EXPLAIN (ANALYZE, DIST)` is run**. `Storage Index Read Requests` are shown under `Bitmap Index Scans` on a secondary index, `Storage Table Read Requests` on`Bitmap Index Scans` on a primary index or `YB Bitmap Table Scans`.
* **do not use unbounded memory**. Similar to Postgres, each bitmap is bounded by `work_mem` (although PG follows this bound loosely, i.e. if it can't lossify the bitmap enough it will exceed it). In the current draft, if Yugabyte exceeds work_mem, we stop collecting ybctids and run a full table scan. There is some output in the explain plan to indicate this occurred.
* **output average ybctid size when `EXPLAIN (ANALYZE, DEBUG)` is run**. This can be used to get an estimate of how much memory a bitmap scan might take up.

Yugabyte bitmap scans do **not** currently:
* **have an accurate cost model:** [[ #20573 | #20573 ]]
* **spill to disk when exceeding work_mem:** [[ #20576 | #20576 ]]
* **parallelise bitmap table scans**: [[ #20575 | #20575 ]]
* **support GIN indexes**: [[ #20574 | #20574 ]]
* **skip fetching ybctids from YB Bitmap Table Scan**: [[ #21036 | #21036 ]]
* **parallelise bitmap index scans**: [[ #21037 | #21037 ]]

See the design document here: https://docs.google.com/document/d/1pRUsXZDywzrZTQpTfXZNTZckh-8tTtkfOcurUoybTr0/edit?usp=sharing
Jira: DB-2547

Test Plan:
```lang=sh
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
```

Stress test on a 10 million row table:
```lang=sql
CREATE TABLE large (a INT, b INT);
CREATE INDEX ON large (a ASC);
CREATE INDEX ON large (b ASC);
INSERT INTO large SELECT i, i FROM generate_series(1, 10000000) i;
```

Bitmap Scan:
```
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
NOTICE:  exceeded work_mem, switching to full table scan
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=233.920..10990.324 rows=10000000 loops=1)
   Storage Table Read Requests: 9767
   Storage Table Read Execution Time: 6288.534 ms
   Storage Table Rows Scanned: 10000000
   Exceeded work_mem: true
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=231.783..231.783 rows=0 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=231.771..231.771 rows=0 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 171
               Storage Index Read Execution Time: 199.258 ms
               Storage Index Rows Scanned: 175104
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (b > 0)
 Planning Time: 10.735 ms
 Execution Time: 15493.938 ms
 Storage Read Requests: 9938
 Storage Read Execution Time: 6487.792 ms
 Storage Rows Scanned: 10175104
 Storage Write Requests: 0
 Catalog Read Requests: 8
 Catalog Read Execution Time: 6.969 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 6494.761 ms
 Peak Memory Usage: 100 kB
(25 rows)
```

Sequential Scan:
```
EXPLAIN (ANALYZE, DIST) SELECT * FROM large WHERE a > 0 OR b > 0;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on large  (cost=0.00..105.00 rows=1000 width=8) (actual time=2.721..9896.584 rows=10000000 loops=1)
   Storage Filter: ((a > 0) OR (b > 0))
   Storage Table Read Requests: 4885
   Storage Table Read Execution Time: 7578.189 ms
   Storage Table Rows Scanned: 10000000
 Planning Time: 0.082 ms
 Execution Time: 14737.837 ms
 Storage Read Requests: 4885
 Storage Read Execution Time: 7578.189 ms
 Storage Rows Scanned: 10000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7578.189 ms
 Peak Memory Usage: 24 kB
(16 rows)
```

Bitmap scan with high work_mem:
```
SET work_mem TO '1GB';
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=76779.066..162561.046 rows=10000000 loops=1)
   Storage Table Read Requests: 9766
   Storage Table Read Execution Time: 78226.718 ms
   Storage Table Rows Scanned: 10000000
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=73603.559..73603.559 rows=10000000 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36774.895..36774.895 rows=10000000 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11136.148 ms
               Storage Index Rows Scanned: 10000000
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36828.655..36828.655 rows=10000000 loops=1)
               Index Cond: (b > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11419.616 ms
               Storage Index Rows Scanned: 10000000
 Planning Time: 0.153 ms
 Execution Time: 169818.045 ms
 Storage Read Requests: 29298
 Storage Read Execution Time: 100782.482 ms
 Storage Rows Scanned: 30000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 100782.482 ms
 Peak Memory Usage: 1217840 kB
(26 rows)
```

Reviewers: tnayak, amartsinchyk

Reviewed By: tnayak

Subscribers: smishra, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D32015
timothy-e added a commit that referenced this issue Mar 15, 2024
Summary:
To avoid any regressions with CBO on, disable bitmap scan
until we have better defaults for costing Bitmap Scans with
CBO. Tracked by #20573
Jira: DB-10362

Test Plan:
Jenkins: urgent, test regex: .*TestPgRegress.*

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressContribPgTrgm'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressGin'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressThirdPartyExtensionsPgHintPlan'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPgSelect'
```

Reviewers: smishra

Reviewed By: smishra

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D33161
timothy-e added a commit that referenced this issue Mar 15, 2024
Summary:
Original commit: b038851 / D33161
To avoid any regressions with CBO on, disable bitmap scan
until we have better defaults for costing Bitmap Scans with
CBO. Tracked by #20573
Jira: DB-10362

Test Plan:
Jenkins: urgent, test regex: .*TestPgRegress.*

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressContribPgTrgm'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressGin'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressThirdPartyExtensionsPgHintPlan'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPgSelect'
```

Reviewers: smishra

Reviewed By: smishra

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33187
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
Summary:
== Bitmap Scans ==
Bitmap Scans use multiple indexes to answer a query, with only one scan of the main table. Each index produces a “bitmap” indicating which rows of the main table are interesting. Multiple bitmaps can be combined with AND or OR operators to create a final bitmap which is used to collect rows from the main table.

In this diff, initial support for Yugabyte bitmap scans is added. Yugabyte bitmap scans:
* **have two nodes:** `Bitmap Index Scan` shared with PG, and `YB Bitmap Table Scan`.
* **support exact results only**: When PG approaches work_mem, it switches to a lossy bitmap that identifies specific pages of the heap relation to access. When YB approaches work_mem, it simply switches to a full sequential scan.
* **support arbitary nesting of logical operands and index conditions**. Assuming the same indexes and support for conditions, any arrangement of ANDs, ORs, and index scans that Postgres supports are also supported by Yugabyte because the code is reused. Note that Yugabyte is currently extremely unlikely to use an `AND` condition because the planner doesn't account for the cost of each bitmap index access being half the cost of a normal index access. (Bitmap index scans only need to access the ybctids, but normal index scans access ybctids + use them to query the main table.)
* **allow conditions with `AND` to be used as a filter**. This currently happens for all `AND` conditions, see the above bullet point. But with a hack to the planner, `Bitmap And` and filters are both valid options.
* **are controlled by `enable_bitmapscan`**
* **are controlled by pg hint plan's `/*+ BitmapScan(tab) */` syntax **
* ** only print the recheck condition when recheck is required**. This is contrary to PG behavior, but since fetching unnecessary rows has a higher cost in YB, it’s better to clearly communicate that.
* **show distributed storage stats when `EXPLAIN (ANALYZE, DIST)` is run**. `Storage Index Read Requests` are shown under `Bitmap Index Scans` on a secondary index, `Storage Table Read Requests` on`Bitmap Index Scans` on a primary index or `YB Bitmap Table Scans`.
* **do not use unbounded memory**. Similar to Postgres, each bitmap is bounded by `work_mem` (although PG follows this bound loosely, i.e. if it can't lossify the bitmap enough it will exceed it). In the current draft, if Yugabyte exceeds work_mem, we stop collecting ybctids and run a full table scan. There is some output in the explain plan to indicate this occurred.
* **output average ybctid size when `EXPLAIN (ANALYZE, DEBUG)` is run**. This can be used to get an estimate of how much memory a bitmap scan might take up.

Yugabyte bitmap scans do **not** currently:
* **have an accurate cost model:** [[ yugabyte#20573 | yugabyte#20573 ]]
* **spill to disk when exceeding work_mem:** [[ yugabyte#20576 | yugabyte#20576 ]]
* **parallelise bitmap table scans**: [[ yugabyte#20575 | yugabyte#20575 ]]
* **support GIN indexes**: [[ yugabyte#20574 | yugabyte#20574 ]]
* **skip fetching ybctids from YB Bitmap Table Scan**: [[ yugabyte#21036 | yugabyte#21036 ]]
* **parallelise bitmap index scans**: [[ yugabyte#21037 | yugabyte#21037 ]]

See the design document here: https://docs.google.com/document/d/1pRUsXZDywzrZTQpTfXZNTZckh-8tTtkfOcurUoybTr0/edit?usp=sharing
Jira: DB-2547

Test Plan:
```lang=sh
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressYbBitmapScans'
```

Stress test on a 10 million row table:
```lang=sql
CREATE TABLE large (a INT, b INT);
CREATE INDEX ON large (a ASC);
CREATE INDEX ON large (b ASC);
INSERT INTO large SELECT i, i FROM generate_series(1, 10000000) i;
```

Bitmap Scan:
```
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
NOTICE:  exceeded work_mem, switching to full table scan
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=233.920..10990.324 rows=10000000 loops=1)
   Storage Table Read Requests: 9767
   Storage Table Read Execution Time: 6288.534 ms
   Storage Table Rows Scanned: 10000000
   Exceeded work_mem: true
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=231.783..231.783 rows=0 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=231.771..231.771 rows=0 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 171
               Storage Index Read Execution Time: 199.258 ms
               Storage Index Rows Scanned: 175104
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (b > 0)
 Planning Time: 10.735 ms
 Execution Time: 15493.938 ms
 Storage Read Requests: 9938
 Storage Read Execution Time: 6487.792 ms
 Storage Rows Scanned: 10175104
 Storage Write Requests: 0
 Catalog Read Requests: 8
 Catalog Read Execution Time: 6.969 ms
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 6494.761 ms
 Peak Memory Usage: 100 kB
(25 rows)
```

Sequential Scan:
```
EXPLAIN (ANALYZE, DIST) SELECT * FROM large WHERE a > 0 OR b > 0;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on large  (cost=0.00..105.00 rows=1000 width=8) (actual time=2.721..9896.584 rows=10000000 loops=1)
   Storage Filter: ((a > 0) OR (b > 0))
   Storage Table Read Requests: 4885
   Storage Table Read Execution Time: 7578.189 ms
   Storage Table Rows Scanned: 10000000
 Planning Time: 0.082 ms
 Execution Time: 14737.837 ms
 Storage Read Requests: 4885
 Storage Read Execution Time: 7578.189 ms
 Storage Rows Scanned: 10000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 7578.189 ms
 Peak Memory Usage: 24 kB
(16 rows)
```

Bitmap scan with high work_mem:
```
SET work_mem TO '1GB';
EXPLAIN (ANALYZE, DIST) /*+ BitmapScan(large) */ SELECT * FROM large WHERE a > 0 OR b > 0;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 YB Bitmap Table Scan on large  (cost=6.91..11.21 rows=10 width=8) (actual time=76779.066..162561.046 rows=10000000 loops=1)
   Storage Table Read Requests: 9766
   Storage Table Read Execution Time: 78226.718 ms
   Storage Table Rows Scanned: 10000000
   ->  BitmapOr  (cost=6.91..6.91 rows=20 width=0) (actual time=73603.559..73603.559 rows=10000000 loops=1)
         ->  Bitmap Index Scan on large_a_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36774.895..36774.895 rows=10000000 loops=1)
               Index Cond: (a > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11136.148 ms
               Storage Index Rows Scanned: 10000000
         ->  Bitmap Index Scan on large_b_idx  (cost=0.00..3.45 rows=10 width=0) (actual time=36828.655..36828.655 rows=10000000 loops=1)
               Index Cond: (b > 0)
               Storage Index Read Requests: 9766
               Storage Index Read Execution Time: 11419.616 ms
               Storage Index Rows Scanned: 10000000
 Planning Time: 0.153 ms
 Execution Time: 169818.045 ms
 Storage Read Requests: 29298
 Storage Read Execution Time: 100782.482 ms
 Storage Rows Scanned: 30000000
 Storage Write Requests: 0
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 0
 Storage Execution Time: 100782.482 ms
 Peak Memory Usage: 1217840 kB
(26 rows)
```

Reviewers: tnayak, amartsinchyk

Reviewed By: tnayak

Subscribers: smishra, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D32015
asrinivasanyb pushed a commit to asrinivasanyb/yugabyte-db that referenced this issue Mar 18, 2024
Summary:
To avoid any regressions with CBO on, disable bitmap scan
until we have better defaults for costing Bitmap Scans with
CBO. Tracked by yugabyte#20573
Jira: DB-10362

Test Plan:
Jenkins: urgent, test regex: .*TestPgRegress.*

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressContribPgTrgm'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressGin'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressThirdPartyExtensionsPgHintPlan'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressPgSelect'
```

Reviewers: smishra

Reviewed By: smishra

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D33161
timothy-e added a commit that referenced this issue May 13, 2024
Summary:
Implemented CBO for Bitmap Scans, using the same approach as PG: store a partially calculated index cost and use that to compute bitmap index access.

If Bitmap Index Scans or Bitmap Ors are predicted to exceed work_mem, they'll have `disable_cost / 2` added to their cost. This discourages the planner from using them,
but still costs them lower than other actually disabled scan types. Without this change, `/*+ BitmapScan(test) */ SELECT * FROM t1000000 ...` will use a sequential scan because it's cheaper, even though the sequential scan has been disabled by the hint.

To better determine if a Bitmap Or will exceed work_mem, more work has been done to determine an estimate of how many rows a Bitmap Or will return.

Also, a slight refactor was done: I put estimated seeks, nexts, width into a YbPlanInfo struct to reduce code duplication of printing / assigning those three values.
Jira: DB-9574

Test Plan:
Jenkins: compile only

Latest: [[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/231/artifact/taqo/report/20240501-185328/index_yb_vs_yb_cost-validation_.html | TAQO YB CBO with Bitmap Scans enabled / disabled ]]
* Very few regressions in the default case - highest regression was 1.26 on query e57fd1caa18f59c4e40138d377ec2a01. I checked each regression above 1.1, and they were all run-to-run variances using the same execution plan.

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/224/artifact/taqo/report/20240418-134354/index_yb_vs_yb_cost-validation_bitmap_scan_yb_vs_pg.html | TAQO Run vs PG ]]

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/225/artifact/taqo/report/20240419-131907/index_yb_vs_yb_cost-validation_bitmap_scan_ybstatsonly_vs_pg.html | TAQO Run PG vs YB with only stats ]]

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgBaseScansCostModel'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgCostModelSeekNextEstimation'
```

Reviewers: gkukreja, mtakahara, amartsinchyk, tnayak

Reviewed By: gkukreja

Subscribers: gkukreja, yql

Differential Revision: https://phorge.dev.yugabyte.com/D33861
svarnau pushed a commit that referenced this issue May 25, 2024
Summary:
Implemented CBO for Bitmap Scans, using the same approach as PG: store a partially calculated index cost and use that to compute bitmap index access.

If Bitmap Index Scans or Bitmap Ors are predicted to exceed work_mem, they'll have `disable_cost / 2` added to their cost. This discourages the planner from using them,
but still costs them lower than other actually disabled scan types. Without this change, `/*+ BitmapScan(test) */ SELECT * FROM t1000000 ...` will use a sequential scan because it's cheaper, even though the sequential scan has been disabled by the hint.

To better determine if a Bitmap Or will exceed work_mem, more work has been done to determine an estimate of how many rows a Bitmap Or will return.

Also, a slight refactor was done: I put estimated seeks, nexts, width into a YbPlanInfo struct to reduce code duplication of printing / assigning those three values.
Jira: DB-9574

Test Plan:
Jenkins: compile only

Latest: [[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/231/artifact/taqo/report/20240501-185328/index_yb_vs_yb_cost-validation_.html | TAQO YB CBO with Bitmap Scans enabled / disabled ]]
* Very few regressions in the default case - highest regression was 1.26 on query e57fd1caa18f59c4e40138d377ec2a01. I checked each regression above 1.1, and they were all run-to-run variances using the same execution plan.

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/224/artifact/taqo/report/20240418-134354/index_yb_vs_yb_cost-validation_bitmap_scan_yb_vs_pg.html | TAQO Run vs PG ]]

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/225/artifact/taqo/report/20240419-131907/index_yb_vs_yb_cost-validation_bitmap_scan_ybstatsonly_vs_pg.html | TAQO Run PG vs YB with only stats ]]

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgBaseScansCostModel'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgCostModelSeekNextEstimation'
```

Reviewers: gkukreja, mtakahara, amartsinchyk, tnayak

Reviewed By: gkukreja

Subscribers: gkukreja, yql

Differential Revision: https://phorge.dev.yugabyte.com/D33861
svarnau pushed a commit that referenced this issue May 29, 2024
…of Bitmap Scan CBO

Summary:
Original commits:
* D33861 / eac0c26
* D35044 / ca5b8e6
* D35045 / 49a8e47

Changes:
* There are some differences in actual number of seqs and nexts in the table scan. Fix these by just adding `_IgnoreActualResults` to the tests in `testSeekNextEstimationBitmapScan`

== Summary from D33861 ==
Implemented CBO for Bitmap Scans, using the same approach as PG: store a partially calculated index cost and use that to compute bitmap index access.

If Bitmap Index Scans or Bitmap Ors are predicted to exceed work_mem, they'll have `disable_cost / 2` added to their cost. This discourages the planner from using them,
but still costs them lower than other actually disabled scan types. Without this change, `/*+ BitmapScan(test) */ SELECT * FROM t1000000 ...` will use a sequential scan because it's cheaper, even though the sequential scan has been disabled by the hint.

To better determine if a Bitmap Or will exceed work_mem, more work has been done to determine an estimate of how many rows a Bitmap Or will return.

Also, a slight refactor was done: I put estimated seeks, nexts, width into a YbPlanInfo struct to reduce code duplication of printing / assigning those three values.
Jira: DB-9574

== Summary from D35044 ==
[#22387] YSQL: Fix Bitmap Scan CBO tests

2338431 / D34646 introduced a flag `yb_enable_bitmapscan = false`,
so YB bitmap scans are disabled by default, even when `enable_bitmapscan` is true.

At the same time, eac0c26 / D33861 introduced CBO for bitmap
scans. These tests passed before `yb_enable_bitmapscan` was added, but once it was added, they
started failing because they stopped using bitmap scans.

Fix the tests by setting `yb_enable_bitmapscan = true`.
Jira: DB-11284

== Summary from D35045 ==
[##22388] YSQL: Change name of bitmap_exceeded_work_mem_cost

Because `bitmap_exceeded_work_mem_cost` doesn't contain the phrase `disable_cost`, it's easy to
miss when someone try to find all the places that inflates the cost values, e.g.
investigate into cost overflow, precision losses with complex plans, etc.

Change the name to `bitmap_exceeded_work_mem_disable_cost`.
Jira: DB-11285

Test Plan:
Latest: [[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/231/artifact/taqo/report/20240501-185328/index_yb_vs_yb_cost-validation_.html | TAQO YB CBO with Bitmap Scans enabled / disabled ]]
* Very few regressions in the default case - highest regression was 1.26 on query e57fd1caa18f59c4e40138d377ec2a01. I checked each regression above 1.1, and they were all run-to-run variances using the same execution plan.

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/224/artifact/taqo/report/20240418-134354/index_yb_vs_yb_cost-validation_bitmap_scan_yb_vs_pg.html | TAQO Run vs PG ]]

[[ https://jenkins.dev.yugabyte.com/job/optimizer/job/generate-report/225/artifact/taqo/report/20240419-131907/index_yb_vs_yb_cost-validation_bitmap_scan_ybstatsonly_vs_pg.html | TAQO Run PG vs YB with only stats ]]

```
./yb_build.sh --java-test 'org.yb.pgsql.TestPgBaseScansCostModel'
./yb_build.sh --java-test 'org.yb.pgsql.TestPgCostModelSeekNextEstimation'
```

Reviewers: gkukreja, mtakahara, amartsinchyk, tnayak

Reviewed By: mtakahara

Subscribers: yql, gkukreja

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35136
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024.1.1_blocker area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

2 participants