Concerns with String.toLowerCase() in default Locale #9276
Comments
Not for me to decide this, but I believe that we have basically these options
Of these I think (3) is least awesome, because Iceberg is a library and should support being embedded in various contexts. I think we should do (1).
The toLowerCase & toUpperCase calls are being fixed in #10521.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Query engine: ALL
Question
This stemmed from a discussion thread on PR 8909:
I did a quick scan of the calls to String.toLowerCase() in the Iceberg codebase, and I do see a few places where it may be a concern:
- VectorizedSupport seems like it may be a problem, but I do not really know whether the lower-case data is exposed and how.
- IcebergRecordObjectInspector converts field names to lower case, and that seems to be affected by the locale problem, as names in various cases are accepted as input parameters to its methods.
- The jQuery code uses toLowerCase() a lot, but I do not really know how it is supposed to be used.

To the best of my knowledge this kind of case conversion can be problematic only in German and Turkish locales. The German locale affects only proper German language words (so it is less of a problem), but the Turkish locale can cause English words to be converted in unexpected ways.
For example, this assertion fails:
assertThat("VIEW".toLowerCase(new Locale("TR"))).isEqualTo("view");
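The failure comes from the Turkish case mapping: under a Turkish locale, the ASCII capital `I` lowercases to the dotless `ı` (U+0131), so "VIEW" becomes "vıew". A minimal sketch of the difference, and of the usual locale-neutral alternative `Locale.ROOT`, is below. The class and method names (`LocaleCaseDemo`, `lowerRoot`, `lowerTurkish`) are hypothetical and only for illustration; this is not necessarily how Iceberg addresses the problem.

```java
import java.util.Locale;

public class LocaleCaseDemo {
    // Hypothetical helper: lower-cases with the locale-neutral root locale,
    // so the result does not depend on the JVM's default locale.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: lower-cases with Turkish rules, to reproduce
    // the behavior a JVM running with a Turkish default locale would show.
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        System.out.println(lowerRoot("VIEW"));    // view
        System.out.println(lowerTurkish("VIEW")); // vıew (dotless i, U+0131)
    }
}
```

Passing an explicit locale at each call site keeps identifier handling stable regardless of where the library is embedded, while user-facing text can still use the default locale where that is actually intended.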
Does Iceberg support using its libraries in user-defined locales?