[Kernel] Change string comparing from UTF16 to UTF8 #3611
Thanks for catching this and fixing it.
@@ -17,19 +17,24 @@ package io.delta.kernel.defaults

import io.delta.golden.GoldenTableUtils.goldenTablePath
import io.delta.kernel.exceptions.{InvalidTableException, KernelException, TableNotFoundException}
import io.delta.kernel.expressions.{Literal, ScalarExpression}
unrelated changes in this file?
(
  ofString("apples"),
  ofString("oranges"),
  ofString("apples"),
  ofNull(StringType.STRING)
),
same test case as the one above?
(
  ofString("abc"),
  ofString("abcd"),
  ofString("abc"),
  ofNull(StringType.STRING)
),
(
  ofString("abc"),
  ofString("abcd"),
  ofString("abc"),
  ofNull(StringType.STRING)
),
(ofString("abc"), ofString("abcd"), ofString("abc"), ofNull(STRING)),
same for other test cases.
ofString("\uD83C\uDF3C"),
ofString("\uFFFD"),
ofNull(StringType.STRING)
// scalastyle:on nonascii
Are there more tests we could copy from Spark or Delta-Spark around other surrogate pairs and unsigned char comparison?
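On the unsigned comparison point, a small Java sketch of why signedness matters when ordering UTF-8 bytes. This is an illustration only, not code from this PR or from Spark; `UnsignedByteDemo` and its methods are hypothetical names. Java's `byte` is signed, so every lead byte of a multi-byte UTF-8 sequence (0xC2 and above) is negative and would sort before ASCII unless it is masked to 0..255 first.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Delta Kernel.
final class UnsignedByteDemo {

    // Compares the first UTF-8 byte of each string as a signed Java byte.
    static int compareFirstByteSigned(String a, String b) {
        byte x = a.getBytes(StandardCharsets.UTF_8)[0];
        byte y = b.getBytes(StandardCharsets.UTF_8)[0];
        return Integer.signum(Byte.compare(x, y));
    }

    // Compares the first UTF-8 byte of each string as an unsigned value (0..255).
    static int compareFirstByteUnsigned(String a, String b) {
        int x = a.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        int y = b.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        return Integer.signum(Integer.compare(x, y));
    }

    public static void main(String[] args) {
        String ascii = "a";        // UTF-8: 0x61
        String accented = "\u00E9"; // UTF-8: 0xC3 0xA9; (byte) 0xC3 == -61

        // Signed: 97 vs -61, so "a" wrongly sorts after U+00E9.
        System.out.println(compareFirstByteSigned(ascii, accented));   // 1
        // Unsigned: 97 vs 195, so "a" sorts before U+00E9.
        System.out.println(compareFirstByteUnsigned(ascii, accented)); // -1
    }
}
```

A test pairing an ASCII string with any non-ASCII string would exercise this path.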
Description
Changed string comparison from UTF-16 to UTF-8. This fixes ordering issues for characters that UTF-16 encodes as surrogate pairs.
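A minimal Java sketch of the difference, using the two strings from the added test case. It is not the implementation in this PR; `StringOrderDemo` and `compareUtf8` are hypothetical names. `String.compareTo` compares UTF-16 code units, so the high surrogate 0xD83C sorts below 0xFFFD, whereas the UTF-8 bytes (F0 9F 8C BC versus EF BF BD) sort in code point order.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Delta Kernel comparator.
final class StringOrderDemo {

    // Orders two strings by their UTF-8 bytes, treating each byte as unsigned.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        // All shared bytes are equal: the shorter string sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String blossom = "\uD83C\uDF3C"; // U+1F33C, a surrogate pair in UTF-16
        String replacement = "\uFFFD";   // U+FFFD, a single UTF-16 code unit

        // UTF-16 code unit order: 0xD83C < 0xFFFD, so U+1F33C sorts first.
        System.out.println(Integer.signum(blossom.compareTo(replacement)));  // -1
        // UTF-8 byte order: 0xF0 > 0xEF, so U+1F33C sorts last.
        System.out.println(Integer.signum(compareUtf8(blossom, replacement))); // 1
    }
}
```

For strings without surrogate pairs, such as "abc" and "abcd", both orderings agree.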
How was this patch tested?
Tests added to `DefaultExpressionEvaluatorSuite.scala`.