Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DDL: fix a bug in column charset and collate when create table and modify column #11300

Merged
merged 7 commits into from
Jul 24, 2019

Conversation

AilinKid
Copy link
Contributor

@AilinKid AilinKid commented Jul 17, 2019

What problem does this PR solve?

This PR tries to fix the following bugs when creating a table and modify the columns:

tidb> create table t (a char(1) collate utf8_bin) charset utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.13 sec)

tidb> show create table t;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                     |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` char(1) CHARACTER SET utf8mb4 COLLATE utf8_bin DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Column a should use utf8_bin as collate and derive utf8 as it's charset as we expected.

tidb> create table t (a char(1) charset utf8 collate utf8_bin) charset utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.12 sec)

tidb> alter table t modify column a char(1) collate utf8_general_ci;
ERROR 1115 (42000): Unknown character set: 'utf8mb4', collation: 'utf8_general_ci'

The reason for the unexpected error is that: the second sql will use the table's charset and collate, the new col is consist of utf8mb4 charset from the table and utf8_general_ci collate from the specified one before it is assigned to old col.

What is changed and how it works?

When creating tables and modifying columns, the specified collate info is in colDef.Option rather colDef.collate.
So we should take this collate info into consideration when we set them or judge conflict or derive the corresponding charset.

Check List

Tests

  • Unit test
  • Integration test

Code changes

Add some logic code to handle collate info in colDef.Option and remove the option when we done to prevent double check.

@codecov
Copy link

codecov bot commented Jul 17, 2019

Codecov Report

Merging #11300 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11300   +/-   ##
===========================================
  Coverage   82.1169%   82.1169%           
===========================================
  Files           424        424           
  Lines         93636      93636           
===========================================
  Hits          76891      76891           
  Misses        11441      11441           
  Partials       5304       5304

ddl/ddl_api.go Outdated
if err := setCharsetCollationFlenDecimal(colDef.Tp, tblCharset, dbCharset); err != nil {
// The specified collate is in colDef.Options, but the charset is in colDef.Tp.
specifiedCollate := ""
for _, op := range colDef.Options {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colDef.options is iterated in the https://github.com/pingcap/tidb/blob/master/ddl/ddl_api.go#L2062, Could we eliminate the iteration here?

Copy link
Contributor Author

@AilinKid AilinKid Jul 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colDef.options is iterated in the https://github.com/pingcap/tidb/blob/master/ddl/ddl_api.go#L2062, Could we eliminate the iteration here?

I've checked it!
The buildColumnAndConstraint func is called by two different logic: one is create table, the other is modify column.
For the later there is an option loop indeed before call this func, if we try to make use of loop, we need pass parameter, it's ok. Consequently, for the former that's means we have no option loop neither in the func or before call it, it will cause an error.
So we still need the loop here.

@winkyao
Copy link
Contributor

winkyao commented Jul 20, 2019

Rest LGTM

ddl/ddl_api.go Outdated
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line.

Roger that!

@AilinKid AilinKid force-pushed the fix_column_charset_and_collate branch from 00d307c to 7d52034 Compare July 22, 2019 09:02
@AilinKid
Copy link
Contributor Author

/run-all-tests

@AilinKid
Copy link
Contributor Author

/run-common-test

ddl/ddl_api.go Outdated
func extractCollateFromOption(def *ast.ColumnDef) string {
specifiedCollate := ""
collateIndex := -1
for i, op := range def.Options {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you may see, it is possible that we encounter multiple ast.ColumnOptionCollate, we should choose the last one as the final result:

For example, in MySQL 8.0:

mysql> create table tm(a varchar(20) collate utf8_unicode_ci collate utf8_general_ci) default charset=utf8mb4;
Query OK, 0 rows affected, 2 warnings (0.08 sec)

mysql> show create table tm;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                     |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tm    | CREATE TABLE `tm` (
  `a` varchar(20) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Notice that collation of column a is utf8_general_ci.

Please also notice that the when we resolve multiple collations, their charsets must be the same:

mysql> create table tm(a varchar(20) collate ascii_bin collate utf8_general_ci) default charset=utf8mb4;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'ascii'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can follow similar existing logics to handle multiple ColumnOptionCollate:

case ast.DatabaseOptionCollate:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can follow similar existing logics to handle multiple ColumnOptionCollate:

case ast.DatabaseOptionCollate:

That's great !!!
From mysql syntax, charset is unique specified, but collate can be multiple.
That's means I need to follow mysql's behavior when handling multiple collate.

And this reference code do me a great favor!!!

@AilinKid AilinKid force-pushed the fix_column_charset_and_collate branch from a396a72 to 3866f0c Compare July 22, 2019 13:57

// Test charset and multiple collate modification in column
tk.MustExec("drop table if exists modify_column_multiple_collate;")
tk.MustExec("create table modify_column_multiple_collate (a char(1) collate utf8_bin collate utf8_general_ci) charset utf8mb4 collate utf8mb4_bin")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some cases for ErrCollationCharsetMismatch

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// When handle charset and collate of a column, we should take collates(may multiple) in option into
// consideration rather than handling it separately. Then the following option processing logic should
// ignores this cause we will delete them from option here
func extractCollateFromOption(def *ast.ColumnDef) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the golang comment needs start with the function name, // extractCollateFromOption ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the golang comment needs start with the function name, // extractCollateFromOption ....

ok. got it.

@AilinKid AilinKid force-pushed the fix_column_charset_and_collate branch from 900cfba to a730428 Compare July 24, 2019 02:22
Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@winkyao winkyao added the status/LGT2 Indicates that a PR has LGTM 2. label Jul 24, 2019
@bb7133
Copy link
Member

bb7133 commented Jul 24, 2019

/run-all-tests

Copy link
Member

@bb7133 bb7133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ddl/ddl_api.go Outdated Show resolved Hide resolved
@bb7133 bb7133 added status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Jul 24, 2019
@AilinKid AilinKid merged commit 29970ca into pingcap:master Jul 24, 2019
@winkyao
Copy link
Contributor

winkyao commented Jul 24, 2019

/run-cherry-picker

@sre-bot
Copy link
Contributor

sre-bot commented Jul 24, 2019

cherry pick to release-3.0 failed

@sre-bot
Copy link
Contributor

sre-bot commented Jul 24, 2019

cherry pick to release-2.1 failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/sql-infra SIG: SQL Infra status/LGT3 The PR has already had 3 LGTM.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants