planner: address collation ambiguity in scalar function construction during predicate simplification. #57049
if equalValueCollation != otherValueCollation {
	return false
}
equalValue.GetArgs()[1].GetType(evalCtx).SetCollate(equalValueCollation)
Why should we set it back? It has not changed, has it?
In the WHERE clause `WHERE c1 BETWEEN 'string1' AND 'string2' AND (c1 = 'string3' OR IsNull(c1))`, 'string1' and 'string2' are given the column's collation (utf8mb4_unicode_ci) when BETWEEN ... AND is rewritten, but 'string3' keeps the connection-level collation (utf8mb4_general_ci). To avoid this collation mismatch, I explicitly reset the collations of the string constants before 'string1' and 'string3' are compared here.
We have deriveCollation(ctx, funcName, args, retType, retType) inside, which handles collation mismatches, but it is embedded in the cast function. So I guess the better way here is to build a wrapper cast function with BuildCastFunction as the new child at L237. I'm not sure, though; you can give it a try.
Thanks for the clarification! I’ll check out deriveCollation and try building a wrapper cast function like BuildCastFunction as you suggested.
Sorry, after discussing with @time-and-fate, it seems we could go with his suggested approach, which is clearer.
@dash12653 Hi, thanks for your contribution. Would you mind updating this pull request soon? We are launching a planner-related issue resolution campaign.
Considering that the problem in this case is actually an error returned by NewFunction that gets ignored, I would suggest changing the existing NewFunctionInternal call to NewFunction and handling the error correctly.
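To illustrate the suggestion, here is a minimal, self-contained sketch of the pattern. It does not use TiDB's real types or signatures: `arg`, `scalarFunc`, `newFunction`, `newFunctionInternal`, and `trySimplify` are all hypothetical stand-ins, and skipping the simplification on error is just one plausible way to handle it.

```go
package main

import "fmt"

// arg is a hypothetical stand-in for a string constant with a collation.
type arg struct{ value, collation string }

// scalarFunc is a hypothetical, minimal expression node.
type scalarFunc struct {
	name string
	args [2]arg
}

// newFunction is a toy constructor that reports collation conflicts as an error.
func newFunction(name string, a, b arg) (*scalarFunc, error) {
	if a.collation != b.collation {
		return nil, fmt.Errorf("cannot build %s: illegal mix of collations %s and %s",
			name, a.collation, b.collation)
	}
	return &scalarFunc{name: name, args: [2]arg{a, b}}, nil
}

// newFunctionInternal models the error-swallowing style: on failure the
// caller silently receives nil and may dereference it later.
func newFunctionInternal(name string, a, b arg) *scalarFunc {
	f, _ := newFunction(name, a, b)
	return f
}

// trySimplify checks the error and declines to simplify when the
// comparison cannot be built, leaving the original predicates untouched.
func trySimplify(a, b arg) (*scalarFunc, bool) {
	f, err := newFunction("ge", a, b)
	if err != nil {
		return nil, false
	}
	return f, true
}

func main() {
	s1 := arg{"string1", "utf8mb4_unicode_ci"}
	s3 := arg{"string3", "utf8mb4_general_ci"}

	// Error ignored: the result is nil, and using it would panic.
	fmt.Println("internal result is nil:", newFunctionInternal("ge", s1, s3) == nil)

	// Error handled: the simplification is simply skipped.
	_, ok := trySimplify(s1, s3)
	fmt.Println("simplified:", ok)
}
```

The point of the sketch is only that the failure surfaces at the call site, where the caller can decide what to do, instead of showing up later as a nil dereference.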
Thanks for letting me know! I’ll update the PR accordingly. Let me know if you need anything else.
Thanks for the feedback! Just to clarify, are you suggesting that I should only update the function use and handle the error, and that my previous changes can be discarded?
Yes.
/ok-to-test
Got it. I’ll try
Codecov Report
Additional details and impacted files

@@               Coverage Diff                @@
##             master     #57049       +/-    ##
================================================
+ Coverage   72.8597%   74.7315%   +1.8718%
================================================
  Files          1672       1717        +45
  Lines        462630     470817      +8187
================================================
+ Hits         337071     351849     +14778
+ Misses       104795      96851      -7944
- Partials      20764      22117      +1353
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AilinKid, time-and-fate
/retest
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch
What problem does this PR solve?
Issue Number: close #56479
Problem Summary:
What changed and how does it work?
We can simplify the SQL to a query whose WHERE clause is: WHERE c1 BETWEEN 'string1' AND 'string2' AND (c1 = 'string3' OR IsNull(c1))
When the BETWEEN ... AND clause is rewritten, the collations of string1 and string2 are set to c1's collation (utf8mb4_unicode_ci) rather than collation_connection.
During predicate simplification, a scalar function 'ge' is constructed with string1 and string3 as arguments. However, string1 has the collation utf8mb4_unicode_ci and string3 has the collation utf8mb4_general_ci (collation_connection), and both have a coercibility of 4, so it is ambiguous which collation to use. Constructing the new scalar function therefore fails, which leads to a panic.
There is an additional concern: if we replace BETWEEN ... AND with <= and >=, then both string1 and string3 have the collation utf8mb4_general_ci (collation_connection), so during predicate simplification they would be compared using utf8mb4_general_ci rather than the column's collation. This could produce incorrect results.
A possible fix: during predicate simplification, reset the collation of the constants (string1, string2, string3) to match the collation of column c1.
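As a rough sketch of this idea only (the review discussion above ended up preferring error handling instead), the following toy code aligns a set of constants to a column's collation. `strConst` and `alignToColumn` are hypothetical names, not TiDB types or functions.

```go
package main

import "fmt"

// strConst is a toy stand-in for a string constant with a collation.
type strConst struct {
	value     string
	collation string
}

// alignToColumn returns copies of consts with their collation set to the
// column's collation. The inputs are not mutated.
func alignToColumn(columnCollation string, consts ...strConst) []strConst {
	out := make([]strConst, len(consts))
	for i, c := range consts {
		c.collation = columnCollation // c is a copy of the element
		out[i] = c
	}
	return out
}

func main() {
	aligned := alignToColumn("utf8mb4_unicode_ci",
		strConst{"string1", "utf8mb4_unicode_ci"},
		strConst{"string2", "utf8mb4_unicode_ci"},
		strConst{"string3", "utf8mb4_general_ci"},
	)
	for _, c := range aligned {
		fmt.Printf("%s -> %s\n", c.value, c.collation)
	}
}
```

Returning copies rather than mutating shared constants avoids changing the collation seen by other parts of the plan that may still reference the originals.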