Add specific unicode string literal type (and make ascii the default) #5167

chriseth · 2018-10-08T18:05:37Z

From the Zeppelin audit:

Strings in Solidity are not only used for displaying information: for example, it is very common to have them be a key of a mapping. Because UTF-8 allows for multiple invisible characters (e.g. ZERO WIDTH SPACE), and for characters that look almost like common characters (e.g. GREEK QUESTION MARK), this usage can be extremely problematic, and lead to underhanded backdoors, exploits, etc. OpenZeppelin’s main access-control contracts are affected by this, as are multiple other string-based implementations.
Consider adding a non-UTF-8 string type to prevent these situations from arising in the first place.
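A minimal sketch of the problem described above, written in Python rather than Solidity so that it can be run directly. The role name "admin" and the dict standing in for a Solidity mapping are assumptions made for illustration; they are not taken from the audit.

```python
import unicodedata

# Hypothetical keys, used only for illustration.
plain = "admin"
with_zwsp = "ad\u200bmin"   # contains ZERO WIDTH SPACE (U+200B)
semicolon = "a;b"
greek_qmark = "a\u037eb"    # contains GREEK QUESTION MARK (U+037E)

# A dict keyed by string stands in for a Solidity mapping.
roles = {plain: True}


def non_ascii_names(s):
    """Return the Unicode names of all non-ASCII characters in s."""
    return [unicodedata.name(ch) for ch in s if ord(ch) > 0x7F]


# The two keys usually render identically, yet the lookup misses.
print(with_zwsp in roles)                # False
print(plain.encode("utf-8").hex())       # 61646d696e
print(with_zwsp.encode("utf-8").hex())   # 6164e2808b6d696e
print(non_ascii_names(with_zwsp))        # ['ZERO WIDTH SPACE']
print(non_ascii_names(greek_qmark))      # ['GREEK QUESTION MARK']
```

The same applies to any key derived from the string bytes: the visually identical literal yields a different key.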

maraoz · 2018-10-24T20:58:33Z

See OpenZeppelin/openzeppelin-contracts#1090 (comment) to learn how OpenZeppelin was affected by this

chriseth · 2018-11-07T14:45:18Z

An obvious way to introduce them is via a prefix (like in hex"0101") - the main question would be whether the non-prefixed string should allow utf8 or only ascii.

axic · 2018-11-07T16:02:04Z

If we do not want a breaking change: ascii"abcd"

If we are happy to do a breaking change, then the current strings would need to be prefixed with unicode or utf8: unicode"this is a string...".

chriseth · 2018-11-07T19:36:27Z

Note that things like "\u1234" should still be allowed in the "non-utf8-strings".

axic · 2018-11-14T14:50:38Z

> Note that things like "\u1234" should still be allowed in the "non-utf8-strings".

Why?

chriseth · 2018-11-29T12:47:18Z

Because the idea is that the source representation does not have any "weird" characters, but the internal representation can be anything.
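The distinction between source representation and internal representation can be sketched as follows. This is a Python illustration, not the Solidity scanner; `decode_u_escape` is a hypothetical helper that handles only a single `\uNNNN` escape.

```python
def is_printable_ascii(text):
    """True if every character is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def decode_u_escape(literal):
    """Decode a literal of the exact form "\\uNNNN" (quotes included).

    Hypothetical helper: a real scanner handles many more cases.
    """
    body = literal[1:-1]                 # strip the surrounding quotes
    if len(body) != 6 or not body.startswith("\\u"):
        raise ValueError("expected exactly one \\uNNNN escape")
    return chr(int(body[2:], 16))        # hex digits -> code point


# What the programmer types: a quote, a backslash, 'u', four hex digits, a quote.
source_text = r'"\u1234"'
value = decode_u_escape(source_text)

print(is_printable_ascii(source_text))   # True: nothing "weird" in the source
print(is_printable_ascii(value))         # False: the decoded value is non-ASCII
print(value.encode("utf-8").hex())       # e188b4
```

A reviewer reading the source sees every character that ends up in the string, which is the property the prefix-free literal is meant to keep.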

@maraoz would you agree?

maraoz · 2018-11-29T19:22:20Z

@chriseth agreed!

leonardoalt · 2019-11-05T15:14:27Z

Should this go to the backlog?

chriseth · 2019-12-11T14:46:26Z

Preliminary vote: make ascii strings the default and require a prefix for unicode strings

axic · 2020-01-15T14:51:10Z

Decision from the meeting:

Change string literal to not allow anything but ASCII printable characters and escape codes.
Introduce unicode prefix for string literals, which also allows Unicode characters.

axic · 2020-07-08T16:01:30Z

> Change string literal to not allow anything but ASCII printable characters and escape codes.

Does an ascii string allow unicode escape?

string a = "\u1234"; // is this valid?
string b = unicode"\u1234"; // is this valid?

chriseth · 2020-07-08T16:06:59Z

Yes, we said unicode escapes in default strings are fine, but not unicode characters.

axic · 2020-07-14T13:16:01Z

While implementing this I realised a few things: allowing escapes in non-unicode literals is quite a large change, because the scanner just turns the escape into code points.

At first I thought the implementation should have no effect on the design, but I think this is a useful consideration:

Hex string literals can contain any kind of data (ascii, unicode, etc.)
Regular string literals (e.g. "hello world") should only contain ASCII characters, and cannot contain unicode escapes (disabling the escape in the scanner)
Unicode string literals (e.g. unicode"⚠️" or unicode"\u00a0") can contain ASCII, Unicode or Unicode escapes

Assigning any literal to a string type should check for UTF-8 encoding (this is something we have now).
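The rules in this comment can be modelled roughly as below. This is a Python sketch under stated assumptions, not the compiler's scanner: `scanner_accepts` and `assignable_to_string` are hypothetical names, the `body` argument is the raw source text between the quotes, and other escape sequences, quote characters and hex-literal separators are ignored.

```python
import re

# Matches a \uNNNN escape in raw source text. Simplification: an escaped
# backslash followed by "u1234" would also match.
UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")


def is_printable_ascii(text):
    """True if every character is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


def scanner_accepts(kind, body):
    """Apply the per-kind rule to the raw text between the quotes."""
    if kind == "hex":
        # Any data is allowed, written as pairs of hex digits.
        return re.fullmatch(r"(?:[0-9a-fA-F]{2})*", body) is not None
    if kind == "regular":
        # ASCII only, and no unicode escapes.
        return is_printable_ascii(body) and UNICODE_ESCAPE.search(body) is None
    if kind == "unicode":
        # ASCII, Unicode characters and Unicode escapes are all allowed.
        return True
    raise ValueError(f"unknown literal kind: {kind}")


def assignable_to_string(data):
    """Model of the separate check: the literal's bytes must be valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(scanner_accepts("regular", "hello world"))    # True
print(scanner_accepts("regular", "\\u1234"))        # False
print(scanner_accepts("unicode", "\\u00a0"))        # True
print(assignable_to_string(bytes.fromhex("ff")))    # False
```

Keeping the UTF-8 check separate from the scanner rule mirrors the comment: a hex literal may hold arbitrary bytes, and only its assignment to a string type is constrained.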

axic · 2020-07-28T10:10:42Z

The rules described in #5167 (comment) were implemented.

chriseth added the "language design" label (any changes to the language, e.g. new features) Nov 7, 2018

axic mentioned this issue Dec 11, 2019

Reserve from and unsafe as keywords #7955

Closed

chriseth changed the title ~~Add a non-utf8 string type~~ Add specific unicode string literal type (and make ascii the default) Jan 15, 2020

axic mentioned this issue Apr 5, 2020

Review escaping and new line support in string literals #4966

Closed

chriseth added the breaking change ⚠️ label Jul 1, 2020

axic self-assigned this Jul 1, 2020

axic mentioned this issue Jul 14, 2020

[BREAKING] Support unicode string literal type #9412

Merged


chriseth closed this as completed in #9412 Jul 28, 2020
