charset: Support parsing CHARSET=utf8mb3 #37084

Closed
wants to merge 1 commit into from

Conversation

k0kubun

@k0kubun k0kubun commented Aug 13, 2022

What problem does this PR solve?

Issue Number: close #26226 close #31790

Problem Summary: Support parsing CHARSET=utf8mb3

What is changed and how it works?

utf8mb3 charset support is added.

In MySQL 8, a table declared with CHARSET=utf8 is reported as CHARSET=utf8mb3 by statements such as SHOW CREATE TABLE. I wrote this patch against the following definition:

mysql> SHOW COLLATION WHERE Charset = 'utf8mb3' and `Default` = 'Yes';
+--------------------+---------+----+---------+----------+---------+---------------+
| Collation          | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+----+---------+----------+---------+---------------+
| utf8mb3_general_ci | utf8mb3 | 33 | Yes     | Yes      |       1 | PAD SPACE     |
+--------------------+---------+----+---------+----------+---------+---------------+
1 row in set (0.00 sec)

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Support parsing CHARSET=utf8mb3

@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.) and size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) labels Aug 13, 2022
@sre-bot
Contributor

sre-bot commented Aug 13, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot
Member

Welcome @k0kubun!

It looks like this is your first PR to pingcap/tidb 🎉.

I'm the bot that helps you request reviewers, add labels, and more. See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/tidb. 😃

@sre-bot
Contributor

sre-bot commented Aug 13, 2022

Please follow PR Title Format:

  • pkg [, pkg2, pkg3]: what is changed

Or, if more than 3 packages are mainly changed, use

  • *: what is changed

After you have formatted the title, you can leave a comment /run-check_title to recheck it.

@k0kubun k0kubun changed the title from "Support parsing CHARSET=utf8mb3" to "parser: Support parsing CHARSET=utf8mb3" Aug 13, 2022
@k0kubun k0kubun changed the title from "parser: Support parsing CHARSET=utf8mb3" to "charset: Support parsing CHARSET=utf8mb3" Aug 13, 2022
@k0kubun
Author

k0kubun commented Aug 13, 2022

/run-check_title

@k0kubun
Author

k0kubun commented Aug 13, 2022

/cc @WizardXiao

@ti-chi-bot ti-chi-bot requested a review from WizardXiao August 13, 2022 08:40
@ti-chi-bot ti-chi-bot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) label and removed the size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) label Aug 13, 2022
@k0kubun
Author

k0kubun commented Aug 13, 2022

https://ci.pingcap.net/blue/organizations/jenkins/tidb_ghpr_mysql_test/detail/tidb_ghpr_mysql_test/21812/pipeline/89
/home/jenkins/agent/workspace/tidb_ghpr_mysql_test/go/src/github.com/pingcap/tidb-test/mysql_test

I can't find the test code in this repository, and the test output is too cryptic for me to figure out what's happening.

[2022-08-13T16:59:06.036Z] time="2022-08-14T00:59:05+08:00" level=error msg="run test [index_merge_sqlgen_exprs_orandor_1_no_out_trans] err: sql:select /*+ use_index_merge( tbl_8 ) */ col_81,col_89,col_83,col_88 from tbl_8 where ( tbl_8.col_81 not in ( -9222 , -26107 , -27964 , -27895 , 25669 ) or IsNull( tbl_8.col_83 ) ) and ( tbl_8.col_83 between 'iXNfDRX' and 'xyKWlZxuVz' or IsNull( tbl_8.col_83 ) ) and 120 order by tbl_8.col_80,tbl_8.col_81,tbl_8.col_82,tbl_8.col_83,tbl_8.col_84,tbl_8.col_85,tbl_8.col_86,tbl_8.col_87,tbl_8.col_88,tbl_8.col_89 ;: failed to run query 
\"select /*+ use_index_merge( tbl_8 ) */ col_81,col_89,col_83,col_88 from tbl_8 where ( tbl_8.col_81 not in ( -9222 , -26107 , -27964 , -27895 , 25669 ) or IsNull( tbl_8.col_83 ) ) and ( tbl_8.col_83 between 'iXNfDRX' and 'xyKWlZxuVz' or IsNull( tbl_8.col_83 ) ) and 120 order by tbl_8.col_80,tbl_8.col_81,tbl_8.col_82,tbl_8.col_83,tbl_8.col_84,tbl_8.col_85,tbl_8.col_86,tbl_8.col_87,tbl_8.col_88,tbl_8.col_89 ;\" 
 around line 470, 
we need(260):
col_81	col_89	col_83	col_88
-16576	-7890442	TgDiEHssSokZ	1242042198
-9451	-4468397	xyKWlZxuVz	-622612825
-27807	-3208676	tEYE	1467882115
-6601	6068424	OHdhjAaVrRYTkH	842412332
22049	-4427135	KuyWvs	1920239943
-399	-7906592	U	-1741181598
-18820	451273	sXROwGN	-
but got(260):
col_81	col_89	col_83	col_88
-9451	-4468397	xyKWlZxuVz	-622612825
-27807	-3208676	tEYE	1467882115
-18820	451273	sXROwGN	-403782619
-24816	5344815	NULL	14471039
-25155	5133030	lUovOrDeGvsTGbiGx	239165119
24325	7889453	sJSamyVp	1157277154
8406	7186100	m	21434219

If you can fix the remaining test failures, that'd be appreciated.

@lance6716
Contributor

lance6716 commented Aug 15, 2022

@bb7133 @wjhuang2016 PTAL (with some guidance)

@xiongjiwei
Contributor

I suggest we add utf8mb3 to the parser and map it to utf8 (including the corresponding collation), but not add a real utf8mb3 charset to the AST. TiDB cannot handle the utf8mb3 charset.
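
For illustration, the mapping suggested here could look roughly like the Go sketch below. It is only a sketch: `charsetAliases`, `normalizeCharset`, and `normalizeCollation` are hypothetical names that do not come from TiDB's parser, and it assumes that every `utf8mb3_*` collation has a `utf8_*` counterpart, which would need to be checked against the collations TiDB actually supports.

```go
package main

import (
	"fmt"
	"strings"
)

// charsetAliases maps an alias charset name to the name the server handles
// internally. Hypothetical table for illustration; not TiDB's actual code.
var charsetAliases = map[string]string{
	"utf8mb3": "utf8",
}

// normalizeCharset lower-cases a charset name and resolves known aliases,
// so the rest of the system never sees the alias itself.
func normalizeCharset(name string) string {
	lower := strings.ToLower(name)
	if canonical, ok := charsetAliases[lower]; ok {
		return canonical
	}
	return lower
}

// normalizeCollation rewrites a collation whose charset prefix is an alias,
// for example utf8mb3_general_ci becomes utf8_general_ci.
// Assumption: the suffix after the charset prefix is valid for the target.
func normalizeCollation(name string) string {
	lower := strings.ToLower(name)
	for alias, canonical := range charsetAliases {
		if strings.HasPrefix(lower, alias+"_") {
			return canonical + strings.TrimPrefix(lower, alias)
		}
	}
	return lower
}

func main() {
	fmt.Println(normalizeCharset("utf8mb3"))
	fmt.Println(normalizeCollation("utf8mb3_general_ci"))
	fmt.Println(normalizeCharset("utf8mb4"))
}
```

Resolving the alias at parse time means the AST only ever carries utf8, which matches the point above that a real utf8mb3 charset should not reach the rest of TiDB.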

@k0kubun
Author

k0kubun commented Aug 15, 2022

Agreed with your suggestion. I cannot fix this patch immediately, so I'll close my pull request. Thank you!

@k0kubun k0kubun closed this Aug 15, 2022
@k0kubun k0kubun deleted the utf8mb3 branch August 15, 2022 14:46
@Fanduzi

Fanduzi commented Apr 19, 2023

> Agreed with your suggestion. I cannot fix this patch immediately, so I'll close my pull request. Thank you!

Where is the PR that maps utf8mb3 to utf8, please? Was it merged? I couldn't find it, and I'm asking because we're currently running into this problem too.

@k0kubun
Author

k0kubun commented Apr 19, 2023

I wasn't able to fix it. Please feel free to take over my patch in this branch.

Labels
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't restore from mysql 8.0 logical dump if charset is utf8
Support utf8mb3 charset
6 participants