Concerns with `String.toLowerCase()` in default Locale #9276

dimas-b · 2023-12-11T17:04:31Z

Query engine

ALL

Question

This stemmed from a discussion thread on PR 8909:

I did a quick scan of the calls to String.toLowerCase() in the Iceberg codebase, and I do see a few places where it may be a concern:

- VectorizedSupport seems like it may be a problem, but I do not really know whether the lower case data is exposed and how.
- IcebergRecordObjectInspector converts field names to lower case, and that seems to be affected by the locale problem as various case names are accepted as input parameters to its methods.
- jQuery code uses a lot of toLowerCase(), but I do not really know how it is supposed to be used.

To the best of my knowledge this kind of case conversion can be problematic only in German and Turkish locales. The German locale affects only proper German language words (so it is less of a problem), but the Turkish locale can cause English words to be converted in unexpected ways.

For example, this assertion fails: `assertThat("VIEW".toLowerCase(new Locale("TR"))).isEqualTo("view");`
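To make the failure mode concrete, here is a minimal standalone sketch using only the JDK (no AssertJ). The class and method names are hypothetical and chosen for illustration; they are not taken from the Iceberg codebase. It shows that Turkish casing rules map `I` to the dotless `ı` (U+0131), while `Locale.ROOT` gives the result an English speaker would expect.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {
    // Lower-cases using Turkish casing rules, where 'I' maps to dotless 'ı' (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    // Lower-cases using the locale-neutral root locale.
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Compare against escaped literals so the output does not depend on console encoding.
        System.out.println(lowerTurkish("VIEW").equals("view"));      // false
        System.out.println(lowerTurkish("VIEW").equals("v\u0131ew")); // true
        System.out.println(lowerRoot("VIEW").equals("view"));         // true
    }
}
```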

Does Iceberg support using its libraries in user-defined locales?

findepi · 2024-06-18T08:12:09Z

> Does Iceberg support using its libraries in user-defined locales?

Not for me to decide this, but I believe that we have basically these options:

1. make the code independent of JVM locale (Palantir's error-prone check DefaultLocale can be helpful, but seems insufficient)
2. test the code on various locales including Turkish
3. add a runtime check enforcing JVM locale

Of these I think (3) is least awesome, because Iceberg is a library and should support being embedded in various contexts. I think we should do (1).
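As a rough illustration of options (1) and (2) together, the sketch below passes an explicit `Locale.ROOT` when normalizing a name and then exercises that code with the JVM default locale temporarily switched to Turkish. `LocaleIndependentNames`, `normalize`, and `withDefaultLocale` are hypothetical names invented for this example; this is not how Iceberg or #10521 implements the fix.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class LocaleIndependentNames {
    // Option (1): always pass an explicit locale instead of relying on the JVM default.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Option (2) helper: run a piece of code under a given default locale, then restore it.
    static <T> T withDefaultLocale(Locale locale, Supplier<T> body) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(locale);
        try {
            return body.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // The no-argument overload follows the default locale and breaks under Turkish.
        System.out.println(withDefaultLocale(turkish, () -> "ID".toLowerCase()).equals("id")); // false
        // The explicit-locale version is unaffected by the default.
        System.out.println(withDefaultLocale(turkish, () -> normalize("ID")).equals("id"));    // true
    }
}
```

Note that `Locale.setDefault` changes global JVM state, so a helper like this is only suitable for single-threaded test code.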

findepi · 2024-06-18T08:15:45Z

> I did a quick scan of the calls to String.toLowerCase() in Iceberg codebase

The toLowerCase & toUpperCase calls are being fixed in #10521.
There may be other Locale-dependent APIs though.
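One well-known example of another locale-dependent JDK API is `String.format`, whose number formatting follows the locale it is given (or the default locale when none is passed). The sketch below only demonstrates the JDK behavior; whether any such calls in Iceberg are actually affected is not established in this thread, and the class name is hypothetical.

```java
import java.util.Locale;

public class LocaleDependentFormatDemo {
    // Formats a value with two decimals using the given locale's number conventions.
    static String formatWith(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(formatWith(Locale.GERMANY, 1.5)); // 1,50
        System.out.println(formatWith(Locale.ROOT, 1.5));    // 1.50
    }
}
```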

github-actions · 2025-01-16T00:15:41Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions · 2025-01-31T00:14:59Z

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

dimas-b mentioned this issue Dec 11, 2023

Nessie: Support views for NessieCatalog #8909

Merged

dimas-b changed the title ~~Concerns with String.toLoweCase() in default Locale~~ Concerns with String.toLowerCase() in default Locale Dec 11, 2023

ajantha-bhat mentioned this issue Jun 18, 2024

Fix lower/upper-case not to depend on JVM locale #10521

Merged

github-actions bot added the stale label Jan 16, 2025

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 31, 2025
