tables: fix prefix index, when the charset is utf8, truncate it from runes #7109
LGTM
/run-all-tests
LGTM
table/tables/index.go
Outdated
rs := bytes.Runes(val)
truncateStr := string(rs[:ic.Length])
// truncate value and limit its length
v.SetBytes([]byte(truncateStr))
SetString can save a memory allocation.
ic := c.idxInfo.Columns[i]
if ic.Tp.Charset == charset.CharsetUTF8 || ic.Tp.Charset == charset.CharsetUTF8MB4 {
val := v.GetBytes()
if ic.Length != types.UnspecifiedLength && utf8.RuneCount(val) > ic.Length {
I think we can use utf8.RuneCountInString() instead, and thus eliminate the usage of val.
But RuneCountInString needs the bytes converted to a string first, so it's not worth it.
I see.
@coocood @birdstorm PTAL
LGTM
LGTM
What have you changed? (mandatory)
Fixes #7104. Before this PR, the prefix index length was counted in bytes. When the charset is utf8 or utf8mb4, the length should be counted in runes instead. This PR fixes that.
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
UT
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
NO
Does this PR affect tidb-ansible update? (mandatory)
NO
Does this PR need to be added to the release notes? (mandatory)
release note:
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
Add a few positive/negative examples (optional)