Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use like to build range when new collation is enabled #48181

Closed
King-Dylan opened this issue Nov 1, 2023 · 7 comments · Fixed by #48522
Closed

Can't use like to build range when new collation is enabled #48181

King-Dylan opened this issue Nov 1, 2023 · 7 comments · Fixed by #48522
Assignees
Labels
affects-6.1 affects-6.5 affects-7.1 affects-7.5 found/gs found by gs report/customer Customers have encountered this bug. sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.

Comments

@King-Dylan
Copy link
Contributor

Bug Report

After pr 31278 , we can't use like to build range when new collation is enabled. This doesn't make sense and will cause bad sql performance.
Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

CREATE TABLE `t1` (
  `id` varchar(40) COLLATE utf8mb4_general_ci NOT NULL,
  `name` varchar(100) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `password` varchar(100) COLLATE utf8mb4_general_ci DEFAULT NULL,
  KEY `ix_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

2. What did you expect to see? (Required)

mysql> explain SELECT * FROM t1 cm use index(ix_name) WHERE cm.name LIKE  '18620391343%';
+-------------------------------+---------+-----------+--------------------------------------------------------+---------------------------------------------------------------------+
| id                            | estRows | task      | access object                                          | operator info                                                       |
+-------------------------------+---------+-----------+--------------------------------------------------------+---------------------------------------------------------------------+
| IndexLookUp_7                 | 250.00  | root      |                                                        |                                                                     |
| ├─IndexRangeScan_5(Build) | 250.00  | cop[tikv] | table:cm, index:ix_name(name) | range:["18620391343","18620391344"), keep order:false, stats:pseudo |
| └─TableRowIDScan_6(Probe) | 250.00  | cop[tikv] | table:cm                                               | keep order:false, stats:pseudo                                      |
+-------------------------------+---------+-----------+--------------------------------------------------------+---------------------------------------------------------------------+
3 rows in set (0.01 sec)

3. What did you see instead (Required)

mysql> explain SELECT * FROM t1 cm use index(ix_name) WHERE cm.name LIKE  '18620391343%';
+-------------------------------+----------+-----------+--------------------------------------------------------+---------------------------------------------------------+
| id                            | estRows  | task      | access object                                          | operator info                                           |
+-------------------------------+----------+-----------+--------------------------------------------------------+---------------------------------------------------------+
| IndexLookUp_8                 | 8000.00  | root      |                                                        |                                                         |
| ├─Selection_7(Build)      | 8000.00  | cop[tikv] |                                                        | like(test.company_member.logonname, "18620391343%", 92) |
| │ └─IndexFullScan_5     | 10000.00 | cop[tikv] | table:cm, index:ix_name(name) | keep order:false, stats:pseudo                          |
| └─TableRowIDScan_6(Probe) | 8000.00  | cop[tikv] | table:cm                                               | keep order:false, stats:pseudo                          |
+-------------------------------+----------+-----------+--------------------------------------------------------+---------------------------------------------------------+
4 rows in set (0.01 sec)

4. What is your TiDB version? (Required)

v6.5.3

@King-Dylan King-Dylan added the type/bug The issue is confirmed as a bug. label Nov 1, 2023
@jackysp
Copy link
Member

jackysp commented Nov 1, 2023

/found gs

@time-and-fate
Copy link
Member

Converting this issue to an enhancement since it's a known limitation, as said in #31278, and needs extra work for implementation and testing.

@time-and-fate
Copy link
Member

/type enhancement

@ti-chi-bot ti-chi-bot bot added the type/enhancement The issue or PR belongs to an enhancement. label Nov 10, 2023
@time-and-fate
Copy link
Member

/remove-type bug

@ti-chi-bot ti-chi-bot bot removed the type/bug The issue is confirmed as a bug. label Nov 10, 2023
@time-and-fate
Copy link
Member

Background

To implement this feature, the key is to build the range for the wildcard case correctly. In brief, our target is to build sortKey(chars) as the start key and sortKey(chars)+1 as the end key where the chars is the part in the pattern before the wildcard.
This sounds simple, but currently, we don't convert the value to the sort key in this stage (newBuildFromPatternLike of "points builder" in util/ranger), we do this conversion in the executor.
Therefore, we need to restructure many code paths to enable us to safely convert the values to the sort key in the ranger, specifically, in newBuildFromPatternLike.

Implementation

  1. Currently, cutting prefix for the prefix index happens after newBuildFromPatternLike. Cutting prefix requires the original value (because prefix length is calculated by characters instead of bytes), so we must first move the cutting prefix logic into "points builder".
  2. Convert the string values to the sort key in the "points builder", and correctly build the range for the LIKE function.
  3. Some usages of the "points builder" require the original value instead of the sort key, so we need to provide an option to the "points builder" so the caller can choose.
  4. After the "points builder", there are some type conversion works, we must make them correctly handle the newly introduced sort key case.
  5. Remove the current limitation that we can't use LIKE function on new collation column to build ranges in the "checker".

Tests

Construct tables and data, and check the execution plan and query result for the following cases:

Charset

  • ascii
  • latin1
  • gbk
  • utf8*

Collation

  • _bin
  • *_ci

Index

  • Normal index
  • Prefix index
  • Multi-column index
  • Unique key
  • Primary key (clustered)
  • Primary key (nonclustered)

LIKE pattern

  • Empty / All spaces
  • Contains / Not contain wildcards
  • Starts / Ends with wildcards
  • Wildcards: % / _
  • Mix spaces in non-space characters

time-and-fate added a commit to time-and-fate/tidb that referenced this issue Dec 5, 2023
time-and-fate added a commit to time-and-fate/tidb that referenced this issue Dec 5, 2023
ti-chi-bot bot pushed a commit that referenced this issue Dec 6, 2023
ti-chi-bot bot pushed a commit that referenced this issue Dec 6, 2023
ti-chi-bot bot pushed a commit that referenced this issue Dec 8, 2023
time-and-fate added a commit to time-and-fate/tidb that referenced this issue Feb 19, 2024
@seiya-annie
Copy link

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.1 affects-6.5 affects-7.1 affects-7.5 found/gs found by gs report/customer Customers have encountered this bug. sig/planner SIG: Planner type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
7 participants