collation: fix tidb panic when compare string with collation #23760

Closed
wants to merge 7 commits into from

xiongjiwei
Contributor

What problem does this PR solve?

Issue Number: close #23506

Problem Summary:

please see #23506 (comment)

Check List

Tests

  • Unit test

Side effects

  • Performance regression
    • Consumes more CPU

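Because we now have to check whether each string is valid, we lose roughly 20%-25% performance in the Key benchmarks for the general_ci and unicode_ci collations, and more in some Compare benchmarks. The binary collation is essentially unchanged.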

Benchmark                                 before (iterations, time) | after (iterations, time)
BenchmarkUtf8mb4Bin_CompareShort-8        162474816      6.57 ns/op | 179831926      6.58 ns/op
BenchmarkUtf8mb4GeneralCI_CompareShort-8    3879333       303 ns/op |   3253285       395 ns/op
BenchmarkUtf8mb4UnicodeCI_CompareShort-8    3134563       344 ns/op |   2195716       530 ns/op
BenchmarkUtf8mb4Bin_CompareMid-8          178230032      7.07 ns/op | 162510152      6.54 ns/op
BenchmarkUtf8mb4GeneralCI_CompareMid-8        44064     26646 ns/op |     33656     33831 ns/op
BenchmarkUtf8mb4UnicodeCI_CompareMid-8        39220     31158 ns/op |     26828     43504 ns/op
BenchmarkUtf8mb4Bin_CompareLong-8         181071398      6.59 ns/op | 182394769      6.76 ns/op
BenchmarkUtf8mb4GeneralCI_CompareLong-8          40  29085788 ns/op |        30  37835873 ns/op
BenchmarkUtf8mb4UnicodeCI_CompareLong-8          33  34574309 ns/op |        24  47649571 ns/op
BenchmarkUtf8mb4Bin_KeyShort-8             29053438      38.4 ns/op |  30192324      37.8 ns/op
BenchmarkUtf8mb4GeneralCI_KeyShort-8        4607634       237 ns/op |   3934675       290 ns/op
BenchmarkUtf8mb4UnicodeCI_KeyShort-8        4929477       242 ns/op |   3906382       296 ns/op
BenchmarkUtf8mb4Bin_KeyMid-8                1829863       609 ns/op |   1790952       623 ns/op
BenchmarkUtf8mb4GeneralCI_KeyMid-8            70951     16994 ns/op |     54698     21047 ns/op
BenchmarkUtf8mb4UnicodeCI_KeyMid-8            61971     19005 ns/op |     52916     22115 ns/op
BenchmarkUtf8mb4Bin_KeyLong-8                  3148    364115 ns/op |      4012    347909 ns/op
BenchmarkUtf8mb4GeneralCI_KeyLong-8              64  18180270 ns/op |        52  21789504 ns/op
BenchmarkUtf8mb4UnicodeCI_KeyLong-8              60  20579898 ns/op |        54  23992131 ns/op
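
To read the table as relative overhead, a small helper can turn a before/after pair of ns/op values into a percentage. This is only a sketch for interpreting the numbers above; the function name slowdownPercent is made up for illustration and is not part of the PR.

```go
package main

import "fmt"

// slowdownPercent returns how much slower "after" is than "before",
// expressed as a percentage of the "before" time (both in ns/op).
// Hypothetical helper, not part of the PR.
func slowdownPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// A few ns/op pairs copied from the benchmark table above.
	rows := []struct {
		name          string
		before, after float64
	}{
		{"GeneralCI_CompareShort", 303, 395},
		{"UnicodeCI_CompareShort", 344, 530},
		{"GeneralCI_KeyShort", 237, 290},
		{"UnicodeCI_KeyMid", 19005, 22115},
	}
	for _, r := range rows {
		fmt.Printf("%-24s %+.1f%%\n", r.name, slowdownPercent(r.before, r.after))
	}
}
```

Running it over every row shows which benchmarks stay within the quoted range and which exceed it.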

Release note

  • Fix a TiDB panic when comparing strings with a collation

@xiongjiwei xiongjiwei requested a review from a team as a code owner March 31, 2021 12:38
@xiongjiwei xiongjiwei requested review from lzmhhh123 and removed request for a team March 31, 2021 12:38
@ti-chi-bot
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@ti-chi-bot ti-chi-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 31, 2021
@ichn-hu ichn-hu mentioned this pull request Mar 31, 2021
@xiongjiwei
Contributor Author

/run-check_dev_2

@xiongjiwei
Contributor Author

/cc @eurekaka @wshwsh12

@wjhuang2016
Member

/bench

r1, r2 := rune(0), rune(0)
ai, bi := 0, 0
for ai < len(a) && bi < len(b) {
	r1, ai = decodeRune(a, ai)
Member

Well, is it possible to keep decodeRune with some additional validation checks?

Contributor Author

That would then be the same as the Go standard library; decodeRune is faster only because it has no validation check.

@mahjonp
Contributor

mahjonp commented Apr 9, 2021

/build

tk.MustQuery(`select * from t where a > 0x80;`).Check(testkit.Rows("一 一"))
tk.MustQuery(`select * from t where b > 0x80;`).Check(testkit.Rows("a a", "一 一"))

// uncomment this when #23759 fix
Contributor

#23759 has been fixed, so please uncomment these tests.

r2, bsize := utf8.DecodeRuneInString(b)

// Incorrect string, compare bytewise
if (r1 == utf8.RuneError && asize == 1) || (r2 == utf8.RuneError && bsize == 1) {
Contributor

Why do we need to check size == 1?
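
For context on this question: Go's utf8.DecodeRuneInString returns utf8.RuneError (U+FFFD) both for malformed input and for a correctly encoded U+FFFD character, and only the returned size tells the two apart. The sketch below demonstrates that behaviour; the helper name isInvalidAt is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isInvalidAt reports whether s begins with a byte sequence that is not
// valid UTF-8. Hypothetical helper for illustration.
func isInvalidAt(s string) bool {
	r, size := utf8.DecodeRuneInString(s)
	// Malformed input yields (RuneError, 1). A correctly encoded U+FFFD
	// yields (RuneError, 3), and an empty string yields (RuneError, 0).
	return r == utf8.RuneError && size == 1
}

func main() {
	fmt.Println(isInvalidAt("\x80"))   // lone continuation byte
	fmt.Println(isInvalidAt("\uFFFD")) // the replacement character itself
	fmt.Println(isInvalidAt("a"))      // plain ASCII
}
```

Without the size check, a string that legitimately contains U+FFFD would be treated as invalid.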

for len(str) > 0 {
	r, size := utf8.DecodeRuneInString(str)
	if r == utf8.RuneError && size == 1 {
		return buf
Contributor

Is there any test covering this change? Or is this path never taken? I doubt that returning buf directly is correct...

@lzmhhh123 lzmhhh123 removed their request for review June 21, 2021 09:43
@wjhuang2016 wjhuang2016 removed their request for review June 22, 2021 03:00
@wjhuang2016 wjhuang2016 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 22, 2021
@xiongjiwei xiongjiwei closed this Aug 11, 2021
@xiongjiwei xiongjiwei deleted the fix-collation-panic branch September 23, 2022 16:10
Labels
component/expression do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tidb panic while query a table which was set collation
6 participants