The escape sequence \uD is not valid code point #26620

Closed
luisvt opened this issue Jun 4, 2016 · 5 comments
Labels
area-language: Dart language related items (some items might be better tracked at github.com/dart-lang/language).
type-enhancement: A request for a change that isn't a bug.

Comments


luisvt commented Jun 4, 2016

When creating a string literal that contains a Unicode escape between \uD800 and \uE000, I get the following error:

The escape sequence \uD is not valid code point

For example having this code:

    print('\uD834\uDF06');

produces an error similar to this:

Unable to spawn isolate: 'file:///file.dart': error: line 13 pos 23: invalid code point
    print('\uD834\uDF06');
                      ^

My operating system is Linux Mint 17.3 and my Dart version is 1.17.0-dev.6.4.

Member

lrhn commented Jun 5, 2016

Dart string literals only allow valid code points, not individual code units. That made more sense when Dart strings were code point sequences, but now that they are UTF-16 code unit sequences, we might want to reconsider it.

There are workarounds. For '\uD834\uDF06' the workaround is easy: it is a valid surrogate pair, so you can just write "\u{1D306}". Another workaround is new String.fromCharCodes([0xd834, 0xdf06]), which works for any code unit.
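As an illustration of the two workarounds above, here is a minimal sketch. The variable names are hypothetical, and the two-escape literal form is deliberately avoided so the example does not depend on how a given SDK version treats surrogate escapes.

```dart
// Astral code point escape: the compiler emits the surrogate pair itself.
final String viaCodePoint = '\u{1D306}';

// Built from raw UTF-16 code units; no validity check is applied here.
final String viaCodeUnits = String.fromCharCodes([0xD834, 0xDF06]);

// The constructor also accepts an unpaired surrogate.
final String loneSurrogate = String.fromCharCodes([0xD834]);

void main() {
  print(viaCodePoint == viaCodeUnits); // true
  print(viaCodePoint.length); // 2 (UTF-16 code units)
  print(viaCodePoint.runes.length); // 1 (code point)
  print(loneSurrogate.codeUnitAt(0).toRadixString(16)); // d834
}
```

Both strings hold the same two code units, which is why the comparison succeeds even though they were written differently.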

lrhn added the area-language and type-enhancement labels Jun 5, 2016
@alan-knight
Contributor

I think this is somewhat inconsistent. There are definitely some tests that do produce this, and it's valid in JavaScript, so I think we had removed some similar checks in Dartium. It was last August, but I can't seem to find the CL right now.

Author

luisvt commented Jun 6, 2016

I also tried using \u{D834} and it didn't work. Maybe I should use \u{0D834}.

Member

lrhn commented Jun 7, 2016

> I also tried using \u{D834} and it didn't work. Maybe I should use \u{0D834}.

That won't work either, it means exactly the same thing.

You can't have a single code point in the range U+D800..U+DFFF in a string literal. It's not a valid UTF-16 sequence, and string literals can only generate valid Unicode strings.
The reason that \u{1d306} works is that it adds an entire surrogate pair as one, not two individual surrogate code points.
The restriction comes from the spec saying:

> It is a compile-time error if the value of the HEX DIGIT SEQUENCE is not a valid unicode scalar value.

because a Unicode scalar value is any code point except the surrogate code points.
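To make the distinction concrete, the following sketch shows the standard UTF-16 arithmetic that turns U+1D306 into the pair D834 DF06, along with a scalar-value check matching the definition quoted above. The function names `isScalarValue` and `toSurrogatePair` are hypothetical helpers, not part of the Dart SDK.

```dart
/// True for any code point that is not a surrogate (U+D800..U+DFFF).
bool isScalarValue(int codePoint) =>
    codePoint >= 0 &&
    codePoint <= 0x10FFFF &&
    !(codePoint >= 0xD800 && codePoint <= 0xDFFF);

/// Splits a supplementary code point (above U+FFFF) into its
/// high and low UTF-16 surrogate code units.
List<int> toSurrogatePair(int codePoint) {
  final offset = codePoint - 0x10000; // 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

void main() {
  print(isScalarValue(0x1D306)); // true
  print(isScalarValue(0xD834)); // false
  final pair = toSurrogatePair(0x1D306);
  print(pair.map((u) => u.toRadixString(16)).join(' ')); // d834 df06
}
```

So \u{1D306} names one scalar value that happens to be stored as two code units, whereas \u{D834} names a surrogate code point, which is not a scalar value at all.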

It's somewhat inconsistent because you can actually create string literals containing unpaired surrogates, as long as you encode them directly in the UTF-8 source file without using escapes.

Mind you, I'm not saying it isn't annoying. I usually only hit it while writing unit tests. If you are using strings to store binary data, you run into problems much more easily (you might want to store only values in the range 0..255 then; both the VM and most JavaScript engines have special treatment of Latin-1 strings).
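One possible way to follow the 0..255 suggestion is the `latin1` codec from `dart:convert`, sketched below. The helper names `packBytes` and `unpackBytes` are hypothetical, and whether this suits a given use case is an assumption rather than something stated in this thread.

```dart
import 'dart:convert';

/// Stores each byte (0..255) as one code unit of a string.
/// Throws a FormatException if a value falls outside that range.
String packBytes(List<int> bytes) => latin1.decode(bytes);

/// Recovers the original bytes from a string produced by [packBytes].
List<int> unpackBytes(String packed) => latin1.encode(packed);

void main() {
  final bytes = [0, 127, 128, 255];
  final packed = packBytes(bytes);
  print(packed.length); // 4
  print(unpackBytes(packed)); // [0, 127, 128, 255]
}
```

Because every code unit stays below 0x100, no surrogate values can appear in the string.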

@alan-knight
Contributor

I note that it works fine in dart2js, although the analyzer claims there's an error.

floitschG mentioned this issue Aug 25, 2016
lrhn added a commit that referenced this issue Sep 26, 2016
Fixes issue #26620
BUG: http://dartbug.com/26620

R=asiva@google.com, brianwilkerson@google.com, floitsch@google.com, hausner@google.com, sigmund@google.com

Review URL: https://codereview.chromium.org/2304923002 .

Committed: 574ae43
lrhn closed this as completed Oct 10, 2016