Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition by RANGE COLUMNS on varchar column with collate _bin does not adjust pruning if query includes other collation #32749

Closed
mjonss opened this issue Mar 2, 2022 · 6 comments · Fixed by #33742
Assignees
Labels
affects-6.0 affects-6.1 component/tablepartition This issue is related to Table Partition of TiDB. feature/developing the related feature is in development severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@mjonss
Copy link
Contributor

mjonss commented Mar 2, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table t (a varchar(255) charset utf8mb4 collate utf8mb4_bin) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
insert into t values ("aa");
explain select * from t where a = "aa" collate utf8mb4_general_ci;
select * from t where a = "aa" collate utf8mb4_general_ci;

2. What did you expect to see? (Required)

Skip pruning (or have some way to take into consideration the given collation), like MySQL:

mysql> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_bin) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,02 sec)

mysql> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

mysql> explain select * from t where a = "aa" collate utf8mb4_general_ci;
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | puppera,plowera | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

mysql> select * from t where a = "aa" collate utf8mb4_general_ci;
+------+
| a    |
+------+
| aa   |
+------+
1 row in set (0,00 sec)

3. What did you see instead (Required)

Using the same pruning as if the collate was not given in the select query:

tidb> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_bin) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,00 sec)

tidb> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

tidb> explain select * from t where a = "aa" collate utf8mb4_general_ci;
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| id                      | estRows  | task      | access object              | operator info                  |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| TableReader_8           | 8000.00  | root      |                            | data:Selection_7               |
| └─Selection_7           | 8000.00  | cop[tikv] |                            | eq(test.t.a, "aa")             |
|   └─TableFullScan_6     | 10000.00 | cop[tikv] | table:t, partition:puppera | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
3 rows in set (0,00 sec)

tidb> select * from t where a = "aa" collate utf8mb4_general_ci;
Empty set (0,00 sec)

4. What is your TiDB version? (Required)

tidb_version(): Release Version: v5.5.0-alpha-442-gfbfe36ca7d
Edition: Community
Git Commit Hash: fbfe36ca7dd3a78a714872d468cebfce34e50116
Git Branch: master
UTC Build Time: 2022-03-02 11:21:12
GoVersion: go1.17
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
@mjonss mjonss added the type/bug The issue is confirmed as a bug. label Mar 2, 2022
@mjonss
Copy link
Contributor Author

mjonss commented Mar 2, 2022

Similar with the other way around (column defined with utf8mb4_general_ci but queried with utf8mb4_bin), notice that MySQL does pruning on the correct partition instead of loosening the pruning to all partitions!

mysql> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_general_ci) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,01 sec)

mysql> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

mysql> explain select * from t where a = "aa" collate utf8mb4_bin;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | puppera    | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

mysql> select * from t where a = "aa" collate utf8mb4_bin;
+------+
| a    |
+------+
| aa   |
+------+
1 row in set (0,00 sec)

^^ MySQL vs TiDB vv

tidb> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_general_ci) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,00 sec)

tidb> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

tidb> explain select * from t where a = "aa" collate utf8mb4_bin;
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| id                      | estRows  | task      | access object              | operator info                  |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| TableReader_8           | 8000.00  | root      |                            | data:Selection_7               |
| └─Selection_7           | 8000.00  | cop[tikv] |                            | eq(test.t.a, "aa")             |
|   └─TableFullScan_6     | 10000.00 | cop[tikv] | table:t, partition:plowera | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
3 rows in set (0,00 sec)

tidb> select * from t where a = "aa" collate utf8mb4_bin;
Empty set (0,01 sec)

@mjonss
Copy link
Contributor Author

mjonss commented Mar 2, 2022

/component tablepartition

@ti-chi-bot ti-chi-bot added the component/tablepartition This issue is related to Table Partition of TiDB. label Mar 2, 2022
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. labels Mar 3, 2022
@seiya-annie seiya-annie added sig/sql-infra SIG: SQL Infra and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. labels Mar 3, 2022
@zimulala zimulala added the feature/developing the related feature is in development label Mar 3, 2022
@mjonss mjonss self-assigned this Mar 31, 2022
@bb7133
Copy link
Member

bb7133 commented Mar 31, 2022

Hi @mjonss , some updates from me:

  1. "aa" is inserted into plowera, which is expected("AAA" < "aa" < "aaaa" for utf8mb4_bin).
  2. When doing the partition pruning, the pruning function partitionRangeForExpr follows the collation rules and make comparison based on collation utf8mb4_general_ci, which gives "aa" < "AAA" < "aaaa" for utf8mb4_general_ci, so puppera is chosen but it is empty(wrong result).

The collation-aware comparison pruning sounds reasonable but it leads to the wrong result...I don't know how it should be fixed.

BTW, the pruning was collation-aware after #20879.

PTAL @xiongjiwei and @wjhuang2016 , any idea on this?

@xiongjiwei
Copy link
Contributor

xiongjiwei commented Apr 1, 2022

if the used collation of query is not the same as partition column collation, I think there is no way to prune unless comparing them one by one.

e.g.

tidb> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_bin) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,00 sec)

tidb> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

tidb> explain select * from t where a = "aa" collate utf8mb4_general_ci;

compare aa and AAA using utf8mb4_general_ci and then aaaa and collect the qualified partition as the pruning result.

@bb7133
Copy link
Member

bb7133 commented Apr 1, 2022

Similar with the other way around (column defined with utf8mb4_general_ci but queried with utf8mb4_bin), notice that MySQL does pruning on the correct partition instead of loosening the pruning to all partitions!

mysql> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_general_ci) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,01 sec)

mysql> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

mysql> explain select * from t where a = "aa" collate utf8mb4_bin;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | puppera    | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0,00 sec)

mysql> select * from t where a = "aa" collate utf8mb4_bin;
+------+
| a    |
+------+
| aa   |
+------+
1 row in set (0,00 sec)

^^ MySQL vs TiDB vv

tidb> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_general_ci) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"));
Query OK, 0 rows affected (0,00 sec)

tidb> insert into t values ("aa");
Query OK, 1 row affected (0,00 sec)

tidb> explain select * from t where a = "aa" collate utf8mb4_bin;
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| id                      | estRows  | task      | access object              | operator info                  |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
| TableReader_8           | 8000.00  | root      |                            | data:Selection_7               |
| └─Selection_7           | 8000.00  | cop[tikv] |                            | eq(test.t.a, "aa")             |
|   └─TableFullScan_6     | 10000.00 | cop[tikv] | table:t, partition:plowera | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+----------------------------+--------------------------------+
3 rows in set (0,00 sec)

tidb> select * from t where a = "aa" collate utf8mb4_bin;
Empty set (0,01 sec)

Yes, IMO I think your case is challenge for a database to do the 'correct prune'. BTW are you using MySQL 8? I found that for MySQL 5.7, it did "loosening the pruning to all partitions":

mysql> explain select * from t where a = "aa" collate utf8mb4_bin;
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions      | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | puppera,plowera | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+-----------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.7.35-debug |
+--------------+
1 row in set (0.00 sec)

So some enhancements have been done in MySQL 8.

@xiongjiwei
Copy link
Contributor

mysql> create table t (a varchar(255) charset utf8mb4 collate utf8mb4_bin) partition by range columns (a) (partition puppera values less than ("AAA"), partition plowera values less than ("aaaa"), partition pp values less than ('hhhh'));
Query OK, 0 rows affected (0.09 sec)

mysql> insert into t values ("aa");
Query OK, 1 row affected (0.01 sec)

mysql> explain select * from t where a = "aa" collate utf8mb4_general_ci;
+----+-------------+-------+--------------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions         | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+--------------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | puppera,plowera,pp | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where |
+----+-------------+-------+--------------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

even though we know partition pp can be pruned, MySQL still accesses it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.0 affects-6.1 component/tablepartition This issue is related to Table Partition of TiDB. feature/developing the related feature is in development severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants