Dart String toUpperCase and toLowerCase methods are incorrect for Turkish. #28

Closed
DartBot opened this issue Oct 10, 2011 · 15 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). closed-not-planned Closed as we don't intend to take action on the reported issue

Comments

@DartBot

DartBot commented Oct 10, 2011

This issue was originally filed by @mdakin


String toLowerCase and toUpperCase do not work correctly for the Turkish dotless i (ı) and capital dotted i (İ).

Run this application (Unfortunately http://try-dart-lang.appspot.com/ loses Turkish characters after I tried to link it):

void main() {
  // Expected conversions for the Turkish alphabet.
  String trUpper = "A,B,C,Ç,D,E,F,G,Ğ,H,I,İ,J,K,L,M,N,O,Ö,P,R,S,Ş,T,U,Ü,V,Y,Z";
  String trLower = "a,b,c,ç,d,e,f,g,ğ,h,ı,i,j,k,l,m,n,o,ö,p,r,s,ş,t,u,ü,v,y,z";

  // Actual conversions produced by Dart.
  String dartTrUpper = trLower.toUpperCase();
  String dartTrLower = trUpper.toLowerCase();

  // Report any mismatch between the expected and actual results.
  if (dartTrUpper != trUpper) {
    print("Incorrect Turkish toUpper conversion. \nExpected: ${trUpper} \nFound: ${dartTrUpper}");
  }
  if (dartTrLower != trLower) {
    print("Incorrect Turkish toLower conversion. \nExpected: ${trLower} \nFound: ${dartTrLower}");
  }
}

Expected: Program does not print anything.
Actual: Prints 2 messages with outputs.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by @ahmetaa


I think that without proper locale settings (are there any?), 'I' cannot be converted to 'ı'.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by jat@google.com


String.toLowerCase/toUpperCase will not be locale aware. The plan (which I do not personally agree with) is to provide a separate i18n library that has basically the important functionality of ICU (TBD).


Added WontFix label.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by @mdakin


So in order to make Dart applications that use Strings properly in Turkish (and Azeri etc.), programmers will have to use an external library? This is a huge disappointment.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by @ahmetaa


If it is well documented, I guess it is o.k. Otherwise it would cause subtle bugs.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by jat@google.com


The i18n library will be provided, but you will have to #import it, and use its methods rather than the ones on String.

@DartBot

DartBot commented Oct 10, 2011

This comment was originally written by jat@google.com


I agree personally - I think it would be better to leave off toLowerCase/toUpperCase from String than to provide one that produces unexpected behavior for non-ASCII characters and incorrect behavior in some locales even for ASCII characters, and we should be learning from i18n mistakes in Java and elsewhere rather than repeating them.

@DartBot

DartBot commented Oct 11, 2011

This comment was originally written by mda...@google.com


If the plan for Dart is to use an external library to support basic string case operations for Turkic languages written in Latin script, will it look like this? (Let's assume an application that gets a username and lowercases it before processing it further, a common use case.)

name = get Name
locale = Get Users locale
if locale uses non Ascii Latin script
  normalized = i18n.toLowerCase(name, locale)
else
  normalized = name.toLowerCase()

I don't expect anybody to write code like this to convert cases, and as a result most applications written will be automatically broken for several Turkic languages.

I actually reported the same error for Go (http://code.google.com/p/go/issues/detail?id=703). They ended up adding ToLowerSpecial and ToUpperSpecial functions to the strings package (http://code.google.com/p/go/source/detail?r=477b3015c0). However, I am not sure this solution is any better.

Would you consider reopening the bug, maybe in a different form? What can we do to fix this problem?

@DartBot

DartBot commented Oct 11, 2011

This comment was originally written by jat@google.com


Personally, I agree with you, but I think it is going to take more than a few people to change this plan, as the VM guys do not want to have to bundle something like ICU into the VM, as that would preclude implementation on very small platforms. Personally, I think any program that doesn't consider localization is broken in this day and age, and it shouldn't even be possible to write one that doesn't.

Regarding your example, the expectation is that anyone who does care about writing internationalized apps will always use the i18n library and not use String.toLowerCase/toUpperCase, so it would look something like this:

normalized = I18n.toLowerCase(name);

There is work to be done on how locales are specified (on the browser it will almost certainly be just one locale at a time by default, the server gets a little messier but by default one locale per request), but the idea is if you care about localization you always use the localizable API.
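
As a rough sketch of that idea, and not the actual i18n library, the hypothetical `I18n` class below keeps one ambient locale and applies the Turkish/Azeri dotted and dotless i rules before delegating to the locale-independent String methods. The class name follows the comment above; the `locale` field, the two-letter locale codes, and the method bodies are assumptions made purely for illustration.

```dart
/// Hypothetical stand-in for the planned i18n library; not a real Dart API.
class I18n {
  /// Ambient locale (assumed: set once per page or once per request).
  static String locale = 'en';

  /// Locales assumed to use the dotted/dotless i distinction.
  static bool get _isTurkic => locale == 'tr' || locale == 'az';

  /// Maps İ -> i and I -> ı first, then applies the default mapping.
  static String toLowerCase(String s) => _isTurkic
      ? s.replaceAll('İ', 'i').replaceAll('I', 'ı').toLowerCase()
      : s.toLowerCase();

  /// Maps i -> İ and ı -> I first, then applies the default mapping.
  static String toUpperCase(String s) => _isTurkic
      ? s.replaceAll('i', 'İ').replaceAll('ı', 'I').toUpperCase()
      : s.toUpperCase();
}

void main() {
  I18n.locale = 'tr';
  print(I18n.toLowerCase('IŞIK')); // ışık
  print(I18n.toUpperCase('izmir')); // İZMİR
}
```

With this shape the call site stays a single line regardless of locale, which is the point being made above; how the locale actually gets set is the part that was still undecided.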

@iposva-google
Contributor

Added Area-Library label.

@DartBot

DartBot commented Oct 11, 2011

This comment was originally written by @mdakin


jat@, then, as you said before, maybe the safest path is to remove the toLowerCase and toUpperCase methods from the String class.

Would the Dart team consider removing the "WontFix" status for this bug?

@DartBot

DartBot commented Oct 11, 2011

This comment was originally written by jat@google.com


Set owner to @floitschG.
Added Area-Language label.

@DartBot

DartBot commented Oct 11, 2011

This comment was originally written by @mdakin


After thinking about it, I have one last comment on this,

If toLowerCase and toUpperCase are locale sensitive by default, this would introduce a new class of terrible bugs: Turkish case conversion (i <-> İ and ı <-> I) turns plain ASCII strings into non-ASCII ones, and this breaks other things like URLs, database column names, etc. Java's default locale-sensitive case conversion approach caused tons of issues; most applications stopped working under the Turkish locale.

The best scenario, as you already mentioned, would be providing both locale-sensitive and locale-insensitive case conversion methods in a separate core library.
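
To make the hazard concrete, here is a minimal sketch of what happens to an ASCII identifier when Turkish rules are applied to it. `turkishLower` is a hypothetical helper written only for this illustration; it is not part of any Dart library.

```dart
/// Hypothetical helper: lowercases with Turkish rules (İ -> i, I -> ı).
String turkishLower(String s) =>
    s.replaceAll('İ', 'i').replaceAll('I', 'ı').toLowerCase();

void main() {
  const column = 'USER_ID';

  // Locale-independent mapping keeps the identifier in ASCII.
  print(column.toLowerCase()); // user_id

  // Turkish rules produce a dotless ı, so the name no longer matches.
  print(turkishLower(column)); // user_ıd
  print(turkishLower(column) == 'user_id'); // false
}
```

A lookup keyed on `user_id` would silently fail in the second case, which is the kind of breakage described above for URLs and column names.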

@floitschG
Contributor

The core Dart language avoids Unicode as much as possible. It has toLower/UpperCase but not much more.
toLowerCase and toUpperCase use Unicode's (locale-independent) lowercase and uppercase mappings (see the first and fifth questions at http://unicode.org/faq/casemap_charprop.html).
String.compareTo uses the Unicode charCode and is hence locale-independent, too.

As John said: for i18n support one will need to import the i18n library and not use the toLower/UpperCase functions for locale dependent strings.
Hopefully this will be as automatic as importing the html-library.
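
A small illustration of the compareTo point, assuming only the core String API: ordering follows code unit values, so 'ç' (U+00E7) sorts after 'z' (U+007A), whereas Turkish alphabetical order places it right after 'c'.

```dart
void main() {
  // Compared by code unit: 0x7A ('z') is smaller than 0xE7 ('ç').
  print('z'.compareTo('ç') < 0); // true

  // The default sort uses compareTo, so it is locale independent as well.
  final words = ['zil', 'çay', 'cam']..sort();
  print(words); // [cam, zil, çay]
}
```

A Turkish-aware collation would be expected to give cam, çay, zil instead, which is again something left to the i18n library.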

@DartBot

DartBot commented Oct 20, 2013

This comment was originally written by @ahmetaa


Just to add some info,
There is a pub package for this now: http://pub.dartlang.org/packages/dotless_i
Also another issue is opened: https://code.google.com/p/dart/issues/detail?id=12721

copybara-service bot pushed a commit that referenced this issue Jul 21, 2023
Here's a minimal repro that this CL fixes:

`ui.dart`

```dart
library dart.ui;

import 'dart:ffi';

part 'foo.dart';
```

`foo.dart`

```dart
part of dart.ui;

@Native<Void Function()>(symbol: 'foo_func', isLeaf: true)
external void foo_func();
```

When compiling with `compile_platform.dart` with `--target=dart2wasm`, the following error appears:


```
Unhandled exception:
Verification error: Target=wasm, VerificationStage.afterModularTransformations: Invalid location with target 'wasm' on FunctionNode() (FunctionNode): RangeError (offset): Invalid value: Not in inclusive range 0..56: 91
Context: 'foo_func_$import'.
Node: 'FunctionNode()'.
#0      VerificationErrorListener.reportError (package:kernel/verifier.dart:81:5)
#1      VerifyingVisitor.problem (package:kernel/verifier.dart:222:14)
#2      VerifyingVisitor._getLocation (package:kernel/verifier.dart:1361:7)
#3      VerifyingVisitor._hasLocation (package:kernel/verifier.dart:1370:26)
#4      VerifyingVisitor.getSameLibraryLastSeenTreeNode (package:kernel/verifier.dart:1342:28)
#5      VerifyingVisitor.localContext (package:kernel/verifier.dart:1382:24)
#6      VerifyingVisitor.defaultDartType (package:kernel/verifier.dart:1491:41)
#7      Visitor.visitVoidType (package:kernel/visitor.dart:1309:37)
#8      VoidType.accept (package:kernel/ast.dart:11190:42)
#9      FunctionNode.visitChildren (package:kernel/ast.dart:3919:16)
#10     VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10)
#11     VerifyingVisitor.visitWithLocalScope (package:kernel/verifier.dart:266:5)
#12     VerifyingVisitor.visitFunctionNode (package:kernel/verifier.dart:721:5)
#13     FunctionNode.accept (package:kernel/ast.dart:3908:38)
#14     VerifyingVisitor.visitProcedure (package:kernel/verifier.dart:620:19)
#15     Procedure.accept (package:kernel/ast.dart:3311:40)
#16     visitList (package:kernel/ast.dart:14488:14)
#17     Library.visitChildren (package:kernel/ast.dart:591:5)
#18     VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10)
#19     VerifyingVisitor.defaultTreeNode (package:kernel/verifier.dart:196:5)
#20     TreeVisitor.visitLibrary (package:kernel/visitor.dart:503:35)
#21     VerifyingVisitor.visitLibrary (package:kernel/verifier.dart:367:11)
#22     Library.accept (package:kernel/ast.dart:577:38)
#23     visitList (package:kernel/ast.dart:14488:14)
#24     Component.visitChildren (package:kernel/ast.dart:14320:5)
#25     VerifyingVisitor.visitChildren (package:kernel/verifier.dart:259:10)
#26     VerifyingVisitor.visitComponent (package:kernel/verifier.dart:342:7)
#27     Component.accept (package:kernel/ast.dart:14313:38)
#28     VerifyingVisitor.check (package:kernel/verifier.dart:171:15)
#29     verifyComponent (package:kernel/verifier.dart:69:20)
...
```

The issue seems to be that after doing this native transformation, the node's `fileUri` references the enclosing library (`ui.dart` above), but the `node.location` references the actual source file (`foo.dart` above) indirectly through `node.fileOffset`.

This ends up being an issue when compiling the platform dill in Google3, but I didn't look into why `flutter build web --wasm` isn't broken.

Internal bug: b/292172146

Change-Id: I2b8d7d215b2c36354860257ce651d50168e9523d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/315360
Reviewed-by: Ömer Ağacan <omersa@google.com>
Commit-Queue: Jia Hao Goh <jiahaog@google.com>
This issue was closed.