Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The result of selecting with hexadecimal is wrong when character set utf8 #23507

Closed
wan1y opened this issue Mar 24, 2021 · 8 comments
Closed
Assignees
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@wan1y
Copy link

wan1y commented Mar 24, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

mysql> CREATE TABLE `t1`  (
  `COL1` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,PRIMARY KEY (`COL1`(5)) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
mysql> insert into t1 values("ý忑辦孈策炠槝衧魮與");
mysql> insert into t1 values("ǎ傦眢否畬傮Ȕ炏芭裪");

2. What did you expect to see? (Required)

mysql> select * from t1 where col1 between 0xC78EE582A6E79CA2E590A6E795ACE582AEC894E7828FE88AADE8A3AA and 0xC3BDE5BF91E8BEA6E5AD88E7AD96E782A0E6A79DE8A1A7E9ADAEE88887;
+---------------------+
| COL1                |
+---------------------+
| ǎ傦眢否畬傮Ȕ炏芭裪  |
| ý忑辦孈策炠槝衧魮與 |
+---------------------+
2 rows in set (0.07 sec)

3. What did you see instead (Required)

mysql> select * from t1 where col1 between 0xC78EE582A6E79CA2E590A6E795ACE582AEC894E7828FE88AADE8A3AA and 0xC3BDE5BF91E8BEA6E5AD88E7AD96E782A0E6A79DE8A1A7E9ADAEE88887;
Empty set

4. What is your TiDB version? (Required)

Release Version: v4.0.0-beta.2-2390-gfd706ab76
Edition: Community
Git Commit Hash: fd706ab76bd09ac859aa0a4de7fe9e07da3c5508
Git Branch: master
UTC Build Time: 2021-03-17 11:37:12
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
@wan1y wan1y added the type/bug The issue is confirmed as a bug. label Mar 24, 2021
@kennytm
Copy link
Contributor

kennytm commented Mar 24, 2021

This is likely just the issue that "new collation" (utf8_general_ci) is not enabled.

select 'ǎ' collate utf8mb4_general_ci < 'ý' collate utf8mb4_general_ci;
-- returns 1 on MySQL 8.0.
-- returns 0 on TiDB nightly with old collation.

The empty output is because x between y and z is equivalent to x <= y and y <= z and if y > z this will always return false. Simplified:

CREATE TABLE `t1`  (
  `COL1` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,PRIMARY KEY (`COL1`(5)) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
insert into t1 values("ý");
insert into t1 values("ǎ");
select * from t1 where col1 between 0xC78E and 0xC3BD;
-- Not empty on MySQL 8
-- Empty on TiDB + old_collation

@aytrack aytrack closed this as completed Mar 24, 2021
@ti-srebot
Copy link
Contributor

Please edit this comment or add a new comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make Sure that 'component', and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

@wan1y
Copy link
Author

wan1y commented Mar 24, 2021

This is likely just the issue that "new collation" (utf8_general_ci) is not enabled.

select 'ǎ' collate utf8mb4_general_ci < 'ý' collate utf8mb4_general_ci;
-- returns 1 on MySQL 8.0.
-- returns 0 on TiDB nightly with old collation.

The empty output is because x between y and z is equivalent to x <= y and y <= z and if y > z this will always return false. Simplified:

CREATE TABLE `t1`  (
  `COL1` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,PRIMARY KEY (`COL1`(5)) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
insert into t1 values("ý");
insert into t1 values("ǎ");
select * from t1 where col1 between 0xC78E and 0xC3BD;
-- Not empty on MySQL 8
-- Empty on TiDB + old_collation
mysql> SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| True           |
+----------------+
1 row in set (0.08 sec)

@kennytm
Copy link
Contributor

kennytm commented Mar 24, 2021

what if you make the collation utf8mb4_general_ci

@wan1y
Copy link
Author

wan1y commented Mar 24, 2021

collation utf8mb4_general_ci

There is also this problem on collation utf8mb4_general_ci

@xiongjiwei
Copy link
Contributor

/assign

@xiongjiwei
Copy link
Contributor

it related to #23506.

@github-actions
Copy link

Please check whether the issue should be labeled with 'affects-x.y' or 'backport-x.y.z',
and then remove 'needs-more-info' label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants