ddl: allow more charset/collation modifications for database/table #10958

Merged
merged 5 commits into from
Jul 4, 2019

Conversation

Member

@bb7133 bb7133 commented Jun 27, 2019

What problem does this PR solve?

Allow modifying collations of databases/tables when their charsets are utf8/utf8mb4.

For example, some TiDB users want to do the following things:

tidb> create table t(a int);
Query OK, 0 rows affected (0.01 sec)

tidb> show create table t;
+-------+-----------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                              |
+-------+-----------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

tidb> alter table t default charset utf8mb4 collate utf8mb4_unicode_ci;

Before this PR, this statement returned an error:

ERROR 1105 (HY000): unsupported modify collate from utf8mb4_bin to utf8mb4_unicode_ci

This PR fixes this error.

What is changed and how it works?

Some restriction checks are loosened: changing the collation is now allowed when the charset stays utf8 or utf8mb4.

Check List

Tests

  • Integration test

Code changes

  • Has exported function/method change

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Contributor

@winkyao winkyao left a comment

LGTM

codecov bot commented Jun 27, 2019

Codecov Report

Merging #10958 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #10958   +/-   ##
===========================================
  Coverage   81.0421%   81.0421%           
===========================================
  Files           419        419           
  Lines         89662      89662           
===========================================
  Hits          72664      72664           
  Misses        11750      11750           
  Partials       5248       5248

Member

coocood commented Jun 27, 2019

But we don't support case insensitive collate.

- if toCharset == charset.CharsetUTF8MB4 && origCharset == charset.CharsetUTF8 {
+ if (origCharset == charset.CharsetUTF8 && toCharset == charset.CharsetUTF8MB4) ||
+ 	(origCharset == charset.CharsetUTF8 && toCharset == charset.CharsetUTF8) ||
+ 	(origCharset == charset.CharsetUTF8MB4 && toCharset == charset.CharsetUTF8MB4) {
Contributor

@Deardrops Deardrops Jun 27, 2019

This change is unnecessary, because L2353 already checks the case where toCharset is the same as origCharset.

Member Author

Hi @Deardrops, L2353 is only used to report the error message. If the check (L2346 ~ L2348) passes, nil is returned; this change is what allows changing the collation when the charset is unchanged.

Contributor

Sorry, I pointed to the wrong place before. It is not L2353 but L2361: there is also a return nil there for the case where toCharset is the same as origCharset.

Member Author

The intent of the code is to allow changing the collation while keeping the charset unchanged, provided the charset is utf8 or utf8mb4.

Please check the test case https://github.com/pingcap/tidb/pull/10958/files/e23ba9e480618ed994f176cd4f7fdbb2f02b850d#diff-703ae6b7872b425273d1832c198598c8R1751 for this logic. @Deardrops
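For readers following the exchange above, here is a minimal, self-contained Go sketch of the loosened rule. It is an illustration under stated assumptions, not the code in ddl/ddl_api.go: the function name checkModifiable and the local charset constants are hypothetical, the error text is modelled on the message quoted in the PR description, and any validation of the charset/collation pair is left out. It shows why the two extra same-charset conditions matter: without them, a collation change within utf8mb4 would fall through to the collation comparison and be rejected.

```go
package main

import "fmt"

// Local stand-ins for the charset names (hypothetical; not imported from TiDB).
const (
	charsetUTF8    = "utf8"
	charsetUTF8MB4 = "utf8mb4"
)

// checkModifiable is a hypothetical helper that mirrors the rule discussed
// in this thread. It returns nil when the change is allowed.
func checkModifiable(origCharset, origCollate, toCharset, toCollate string) error {
	// Allowed: utf8 -> utf8mb4, or any collation change while the charset
	// stays utf8 or stays utf8mb4.
	if (origCharset == charsetUTF8 && toCharset == charsetUTF8MB4) ||
		(origCharset == charsetUTF8 && toCharset == charsetUTF8) ||
		(origCharset == charsetUTF8MB4 && toCharset == charsetUTF8MB4) {
		return nil
	}
	// Any other charset change is rejected.
	if toCharset != origCharset {
		return fmt.Errorf("unsupported modify charset from %s to %s", origCharset, toCharset)
	}
	// Same charset (not utf8/utf8mb4) but a different collation is rejected.
	if toCollate != origCollate {
		return fmt.Errorf("unsupported modify collate from %s to %s", origCollate, toCollate)
	}
	// Nothing changes at all.
	return nil
}

func main() {
	// The case from the PR description: now allowed.
	fmt.Println(checkModifiable("utf8mb4", "utf8mb4_bin", "utf8mb4", "utf8mb4_unicode_ci"))
	// Downgrading utf8mb4 to utf8 is still rejected.
	fmt.Println(checkModifiable("utf8mb4", "utf8mb4_bin", "utf8", "utf8_bin"))
	// Collation changes for other charsets are still rejected.
	fmt.Println(checkModifiable("latin1", "latin1_bin", "latin1", "latin1_swedish_ci"))
}
```

In this sketch the final return nil is reached only when neither the charset nor the collation changes, so it cannot stand in for the new conditions.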

Member Author

bb7133 commented Jun 27, 2019

> But we don't support case insensitive collate.

Some users complained that in TiDB, creating a table with some collations like utf8mb4_unicode_ci is supported but the collation cannot be altered.

Contributor

@crazycs520 crazycs520 left a comment

LGTM

@zimulala zimulala added the status/LGT2 label Jun 27, 2019
Contributor

tangenta commented Jul 2, 2019

> > But we don't support case insensitive collate.
>
> Some users complained that in TiDB, creating a table with some collations like utf8mb4_unicode_ci is supported but the collation cannot be altered.

I am not familiar with TiDB's charset. Since TiDB does not support case insensitive collation, would it be better to disallow the creation of utf8mb4_unicode_ci tables?

Member Author

bb7133 commented Jul 3, 2019

> > > But we don't support case insensitive collate.
> >
> > Some users complained that in TiDB, creating a table with some collations like utf8mb4_unicode_ci is supported but the collation cannot be altered.
>
> I am not familiar with TiDB's charset. Since TiDB does not support case insensitive collation, would it be better to disallow the creation of utf8mb4_unicode_ci tables?

Some TiDB users need this syntax, but they don't really care whether the collation is actually case-insensitive.
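To make the concern in this exchange concrete, the following Go sketch contrasts a byte-wise comparison (the behaviour of a _bin collation) with a case-insensitive one. It assumes, as the thread implies, that TiDB at this point accepts the utf8mb4_unicode_ci name without implementing case-insensitive comparison, so comparisons would keep behaving like the first function. The helper names are hypothetical, and strings.EqualFold is only a rough stand-in: it applies Unicode simple case folding, not the actual utf8mb4_unicode_ci algorithm.

```go
package main

import (
	"fmt"
	"strings"
)

// equalBinary compares two strings byte for byte, as a *_bin collation does.
func equalBinary(a, b string) bool {
	return a == b
}

// equalFold compares two strings ignoring case. It is a rough stand-in for
// a *_ci collation, not the real utf8mb4_unicode_ci algorithm.
func equalFold(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(equalBinary("TiDB", "tidb")) // false
	fmt.Println(equalFold("TiDB", "tidb"))   // true
}
```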

ddl/ddl_api.go Outdated
- if toCharset == charset.CharsetUTF8MB4 && origCharset == charset.CharsetUTF8 {
+ if (origCharset == charset.CharsetUTF8 && toCharset == charset.CharsetUTF8MB4) ||
+ 	(origCharset == charset.CharsetUTF8 && toCharset == charset.CharsetUTF8) ||
+ 	(origCharset == charset.CharsetUTF8MB4 && toCharset == charset.CharsetUTF8MB4) {
  	// TiDB only allow utf8 to be changed to utf8mb4.
Contributor

Do we need to update this comment?

Member Author

Updated, PTAL @zimulala

Contributor

@tangenta tangenta left a comment

LGTM

Member Author

bb7133 commented Jul 4, 2019

/rebuild

Member Author

bb7133 commented Jul 4, 2019

/run-all-tests

Labels
sig/sql-infra (SIG: SQL Infra), status/LGT2 (Indicates that a PR has LGTM 2.)
8 participants