collation: cast charset according to the function's resulting charset #29029
/run-check_dev_2
@@ -654,23 +654,11 @@ func TestDeriveCollation(t *testing.T) {
false,
&ExprCollation{CoercibilitySysconst, UNICODE, charset.CharsetUTF8MB4, charset.CollationUTF8MB4},
},
{
We assume all casts are implicit and keep the collation-related fields at their original values, so this test is meaningless.
@@ -1218,7 +1218,12 @@ func convertUint(val []byte) (*Constant, error) {
func convertString(val []byte, tp *tipb.FieldType) (*Constant, error) {
var d types.Datum
d.SetBytesAsString(val, protoToCollation(tp.Collate), uint32(tp.Flen))
return &Constant{Value: d, RetType: types.NewFieldType(mysql.TypeVarString)}, nil
return &Constant{Value: d, RetType: &types.FieldType{
Converting a pb value to a string expression should use the charset information carried in the pb.
@@ -1180,7 +1180,7 @@ func (s *testIntegrationSuite2) TestStringBuiltin(c *C) {

// for insert
result = tk.MustQuery(`select insert("中文", 1, 1, cast("aaa" as binary)), insert("ba", -1, 1, "aaa"), insert("ba", 1, 100, "aaa"), insert("ba", 100, 1, "aaa");`)
result.Check(testkit.Rows("aaa文 ba aaa ba"))
result.Check(testkit.Rows("aaa\xb8\xad文 ba aaa ba"))
This is compatible with MySQL versions before 8.0.24.
Why did this change? Is it because of the implicit cast?
No. Before 8.0.24, MySQL used the 1st and 4th arguments to determine the resulting charset; from 8.0.24 on, it uses only the 1st argument. In this case the resulting charset is binary for the former and utf8mb4 for the latter. A length of 1 means one byte in the binary charset but one character in utf8mb4.
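The byte-versus-character difference in the reply above can be reproduced outside TiDB. The following is a minimal sketch, not the TiDB implementation: insertBytes and insertRunes are hypothetical helpers that mimic INSERT(str, pos, len, newstr) with 1-based positions, measuring pos and len in bytes (binary charset) or in characters (utf8mb4) respectively. The out-of-range handling is an assumption chosen to match the expected rows in the test above.

```go
package main

import "fmt"

// insertBytes is a hypothetical helper: it replaces `length` BYTES of s,
// starting at 1-based byte position pos, with repl (binary charset view).
func insertBytes(s string, pos, length int, repl string) string {
	b := []byte(s)
	if pos < 1 || pos > len(b) {
		return s // position out of range: return the original string
	}
	end := pos - 1 + length
	if length < 0 || end > len(b) {
		end = len(b) // replace through the end of the string
	}
	return string(b[:pos-1]) + repl + string(b[end:])
}

// insertRunes is the same operation measured in CHARACTERS (utf8mb4 view).
func insertRunes(s string, pos, length int, repl string) string {
	r := []rune(s)
	if pos < 1 || pos > len(r) {
		return s
	}
	end := pos - 1 + length
	if length < 0 || end > len(r) {
		end = len(r)
	}
	return string(r[:pos-1]) + repl + string(r[end:])
}

func main() {
	// "中" is the three bytes E4 B8 AD in UTF-8, so replacing one byte
	// leaves the trailing B8 AD behind.
	fmt.Printf("%q\n", insertBytes("中文", 1, 1, "aaa"))
	fmt.Printf("%q\n", insertRunes("中文", 1, 1, "aaa"))
}
```

With the binary result charset, only the first byte of "中" is replaced, which is where the stray \xb8\xad in the new expected row comes from.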
@xiongjiwei: PR needs rebase.
@Defined2014: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.
// if value is NULL or binary string, just skip it.
if isNull || types.IsBinaryStr(c.GetType()) {
continue
}
We can move types.IsBinaryStr(c.GetType()) to the beginning of this loop to avoid an unnecessary EvalString.
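The suggested reordering can be sketched as follows. This is an illustration only, not the PR's code: the arg type, its evalString method, and collectStrings are hypothetical stand-ins for the expression constant, EvalString, and the loop under review. The cheap type check runs first, so binary-string arguments are skipped without being evaluated.

```go
package main

import "fmt"

// arg is a hypothetical stand-in for a constant expression argument.
type arg struct {
	isBinary bool   // plays the role of types.IsBinaryStr(c.GetType())
	isNull   bool   // whether evaluation yields NULL
	val      string // the evaluated string value
}

// evalString stands in for the comparatively expensive EvalString call.
func (a arg) evalString() (string, bool) {
	return a.val, a.isNull
}

// collectStrings returns the non-NULL, non-binary values and the number
// of evaluations that were actually performed.
func collectStrings(args []arg) (vals []string, evals int) {
	for _, a := range args {
		// Cheap check first: binary strings are skipped before evaluation.
		if a.isBinary {
			continue
		}
		evals++
		s, isNull := a.evalString()
		if isNull {
			continue
		}
		vals = append(vals, s)
	}
	return vals, evals
}

func main() {
	args := []arg{{isBinary: true, val: "x"}, {isNull: true}, {val: "a"}}
	vals, evals := collectStrings(args)
	fmt.Println(vals, evals)
}
```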
@xiongjiwei Please address this comment.
Closing because there is another implementation: #29905
What problem does this PR solve?
Issue Number: close #28356
Some functions, such as concat and eq, may receive arguments with different charsets. We infer the resulting charset and collation from the arguments, so if the resulting charset differs from an argument's charset, we need to cast that argument to the resulting charset. For example, if a uses the gbk charset, we should convert 0x31 to the gbk charset. If the conversion is impossible, for example because 0x81 is not a valid gbk character, we return an error.
Check List
Tests
Side effects
Documentation
Release note
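To illustrate the conversion rule in the PR description, here is a minimal sketch of a structural validity check for gbk bytes. It is not the TiDB implementation: validateGBK and errInvalidGBK are hypothetical names. The check only looks at byte ranges (single bytes 0x00 to 0x7F, lead bytes 0x81 to 0xFE followed by a trail byte 0x40 to 0xFE except 0x7F) and does not consult the full mapping table, so unassigned code points inside those ranges are accepted. Treating 0x80 and 0xFF as invalid is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidGBK is a hypothetical error value for this sketch.
var errInvalidGBK = errors.New("invalid gbk byte sequence")

// validateGBK checks that b is structurally well-formed gbk.
// It is a simplified range check, not a full table lookup.
func validateGBK(b []byte) error {
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c <= 0x7F:
			// Single-byte ASCII character.
			i++
		case c >= 0x81 && c <= 0xFE:
			// Lead byte of a two-byte character: a trail byte must follow.
			if i+1 >= len(b) {
				return errInvalidGBK
			}
			t := b[i+1]
			if t < 0x40 || t > 0xFE || t == 0x7F {
				return errInvalidGBK
			}
			i += 2
		default:
			// 0x80 and 0xFF are treated as invalid here (assumption).
			return errInvalidGBK
		}
	}
	return nil
}

func main() {
	fmt.Println(validateGBK([]byte{0x31})) // ASCII '1' is valid gbk
	fmt.Println(validateGBK([]byte{0x81})) // lone lead byte is rejected
}
```

Under this check, 0x31 passes as a single-byte character, while a lone 0x81 is a lead byte with no trail byte and is rejected, which matches the two cases named in the description.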