Allow splitting string literals into multiple parts #7524

ghallak · 2019-10-07T20:54:31Z

This should work for both normal and hex string literals, and the splitting can be done either on the same line or on multiple lines.

Closes #7292

erak

Please add a Changelog entry and also adjust the documentation.

erak

I fear that the current implementation might lead to some confusion. Please consider the following example:

function f() public pure returns (bytes32) {
    bytes32 escapeCharacters = hex"0000"
    "deaf"
    "feed"
    "beef"
    "feed"
    "fade";
    return escapeCharacters;
}

Is it apparent that we're dealing with a hexadecimal literal here? I think it could be confused with a regular string literal, and would therefore suggest requiring each new line to start with another hex prefix:

function f() public pure returns (bytes32) {
    bytes32 escapeCharacters = hex"0000"
    hex"deaf"
    hex"feed"
    hex"beef"
    hex"feed"
    hex"fade";
    return escapeCharacters;
}

In addition to that, I think whitespace or newlines inside hex strings should only be allowed after an even number of nibbles. Please also see #7374, which applies these rules to underscores.
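The even-nibble rule can be sketched in C++, assuming each part of a multipart hex literal is validated on its own and the decoded bytes are then concatenated. `hexDigitValue` and `decodeHexParts` are hypothetical helpers for illustration, not the compiler's Scanner API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
int hexDigitValue(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

// Decodes the bodies of consecutive hex literal parts (the text between the quotes).
// Every part must hold an even number of nibbles, so a part boundary can only
// fall between two complete bytes. Returns std::nullopt on any violation.
std::optional<std::string> decodeHexParts(std::vector<std::string> const& parts)
{
	std::string bytes;
	for (std::string const& part: parts)
	{
		if (part.size() % 2 != 0)
			return std::nullopt; // odd number of nibbles in this part
		for (size_t i = 0; i < part.size(); i += 2)
		{
			int high = hexDigitValue(part[i]);
			int low = hexDigitValue(part[i + 1]);
			if (high < 0 || low < 0)
				return std::nullopt; // not a hex digit
			bytes.push_back(static_cast<char>(high * 16 + low));
		}
	}
	return bytes;
}
```

With this model, the parts `00112233` and `44556677` decode to the same bytes as the single part `0011223344556677`, while a split such as `aa` followed by `a` is rejected.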

ghallak · 2019-10-17T23:58:43Z

@erak In commit 9eb1096 I have modified Scanner::scanHexString to accept a sequence of hex string literals in a way that solves both of the problems you mentioned above.

I tried to do the same in Scanner::scanString, but that caused the command line test standard_yul_embedded_object_name to fail because of this line:

solidity/test/cmdlineTests/standard_yul_embedded_object_name/input.json

Line 7 in 5ea1d90

           "content": "object \"NamedObject\" { code { let x := dataoffset(\"DataName\") sstore(add(x, 0), 0) } data \"DataName\" \"abc\" object \"OtherObject\" { code { revert(0, 0) } } }"

\"DataName\" \"abc\" is being parsed as a single string DataNameabc which is not correct.

Now, both the scanner (to allow multipart hex string literals) and the parser (to allow multipart regular string literals) are modified. Can this be done better (probably by modifying either the scanner or the parser, not both)?
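The Yul failure can be reproduced with a toy tokenizer. This is a simplified stand-in, not the real Scanner::scanString: if adjacent string literals are glued together while scanning, Yul's `data "DataName" "abc"` loses its second operand. Because the scanner is shared between Solidity and Yul, this is one argument for doing the concatenation in the Solidity parser, which knows the context.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy tokenizer (not the solc Scanner): splits input into bare words and
// double-quoted strings. When mergeAdjacentStrings is true it mimics
// concatenating consecutive string literals inside the scanner itself.
std::vector<std::string> tokenize(std::string const& input, bool mergeAdjacentStrings)
{
	std::vector<std::string> tokens;
	bool previousWasString = false;
	size_t i = 0;
	while (i < input.size())
	{
		char c = input[i];
		if (c == ' ' || c == '\n' || c == '\t')
		{
			++i;
			continue;
		}
		if (c == '"')
		{
			size_t end = input.find('"', i + 1);
			if (end == std::string::npos)
				end = input.size(); // unterminated string: take the rest
			std::string body = input.substr(i + 1, end - i - 1);
			if (mergeAdjacentStrings && previousWasString)
				tokens.back() += body; // glue onto the previous string token
			else
				tokens.push_back(body);
			previousWasString = true;
			i = end + 1;
			continue;
		}
		size_t end = input.find_first_of(" \n\t\"", i);
		if (end == std::string::npos)
			end = input.size();
		tokens.push_back(input.substr(i, end - i));
		previousWasString = false;
		i = end;
	}
	return tokens;
}
```

Without merging, `data "DataName" "abc"` yields the three tokens `data`, `DataName` and `abc`; with merging it yields only `data` and `DataNameabc`, which is the behaviour described above.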

erak · 2019-10-23T14:54:19Z

@ghallak Great, thanks. I think the implementation is fine. I've tested your changes and found that hex"aa" hex"ZZ" "cc" is still accepted, but it should result in an error.

Could you also please add more syntax tests for hex string literals?

Marenz · 2019-11-11T16:59:19Z

Tests are failing

chriseth · 2019-11-19T15:26:29Z

docs/types/value-types.rst

@@ -498,7 +498,7 @@ terminate the string literal. Newline only terminates the string literal if it i
 Hexadecimal Literals
 --------------------

-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" "44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.


I would actually prefer

Suggested change

Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" "44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.

Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.

@chriseth Do you mean that hex"00112233" hex"44556677" should be allowed syntax? Are you suggesting that each part of a hex string should start with hex?

In the original issue description of #7292, hex"00112233" "44556677" was given as an example. Based on my understanding of that description and of the previous comments on this PR, I have tried to implement the following:

For regular strings, no parts should start with hex.

Right:

"aa" "bb" "cc" "dd"

Wrong:

"aa" "bb" hex"cc" "dd"

For hex strings on a single line, only the first part should start with hex and the rest of the strings on the same line should not.

Right:

hex"aa" "22"

Wrong:

hex"aa" hex"22"

For hex strings on multiple lines, only the first part on each line should start with hex.

Right:

hex"11" "22"
hex"33" "44"

Wrong:

hex"11" "22"
"33" "44"

What do you think of the above rules?

@ghallak Newlines are fragile creatures, and I would like to avoid distinguishing them from other whitespace as much as possible, so I think it would be better to require the hex prefix on every part, even when the parts are separated only by spaces rather than newlines. What do the others think?

I've adjusted the docs in 7a2b6dd.

chriseth · 2019-11-19T15:34:48Z

libsolidity/parsing/Parser.cpp

@@ -1614,9 +1614,18 @@ ASTPointer<Expression> Parser::parsePrimaryExpression()
 		}
 		break;
 	case Token::StringLiteral:
+	{
+		string literal = m_scanner->currentLiteral();
+		while (m_scanner->peekNextToken() == Token::StringLiteral)


I think this whole change could be simplified and shortened by introducing Token::HexStringLiteral and using this loop here to concatenate string literals and hex string literals while checking that their type stays the same. This would also allow us to create a better error message.
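A simplified C++ sketch of the suggested loop, assuming a flat vector of already-scanned tokens instead of the real Scanner interface (`m_scanner->peekNextToken()` and friends). `Lexeme`, `ParsedLiteral` and `parseMultipartLiteral` are hypothetical names, and the error text is made up for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified token kinds; the real ones are declared in liblangutil/Token.h.
enum class Token { StringLiteral, HexStringLiteral, Semicolon, Illegal };

struct Lexeme
{
	Token token;
	std::string literal; // decoded value for string and hex string literals
};

struct ParsedLiteral
{
	Token token;      // kind shared by all parts
	std::string value; // concatenation of all parts
	size_t nextIndex; // index of the first token after the literal
};

// Precondition: tokens[index] is a StringLiteral or HexStringLiteral.
// Keeps appending following literals while they have the same kind as the
// first part; a following literal of the other kind is reported as an error.
std::optional<ParsedLiteral> parseMultipartLiteral(std::vector<Lexeme> const& tokens, size_t index, std::string& error)
{
	Token kind = tokens[index].token;
	std::string value = tokens[index].literal;
	++index;
	while (
		index < tokens.size() &&
		(tokens[index].token == Token::StringLiteral || tokens[index].token == Token::HexStringLiteral)
	)
	{
		if (tokens[index].token != kind)
		{
			error = "Cannot mix string and hex string literal parts."; // hypothetical message
			return std::nullopt;
		}
		value += tokens[index].literal;
		++index;
	}
	return ParsedLiteral{kind, value, index};
}
```

Checking the kind inside the loop is what makes a dedicated error message possible: the mismatch is detected at the exact part where it occurs.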

I've introduced Token::HexStringLiteral in af7d002 and made the relevant changes.

ghallak · 2019-11-25T00:17:52Z

libyul/ObjectParser.cpp

+	if (currentToken() == Token::HexStringLiteral)
+		expectToken(Token::HexStringLiteral, false);
+	else
+		expectToken(Token::StringLiteral, false);


@chriseth Is there a better way to do this? Maybe a function like expectToken that works for multiple tokens instead of a single one?
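One option along those lines is a helper that accepts a set of allowed kinds. The sketch below uses a hypothetical `TokenCursor` with an `expectOneOf` method; neither exists in ParserBase, and unlike the real `expectToken` it throws a plain `std::runtime_error` instead of going through the compiler's error reporter.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <stdexcept>
#include <vector>

enum class Token { StringLiteral, HexStringLiteral, Identifier };

// Hypothetical cursor over already-scanned tokens.
struct TokenCursor
{
	std::vector<Token> tokens;
	size_t position = 0;

	Token current() const { return tokens.at(position); }

	// Multi-token variant of expectToken: checks that the current token is one
	// of `allowed`, advances past it and returns its kind.
	Token expectOneOf(std::initializer_list<Token> allowed)
	{
		Token token = current();
		if (std::find(allowed.begin(), allowed.end(), token) == allowed.end())
			throw std::runtime_error("Unexpected token");
		++position;
		return token;
	}
};
```

The if/else in the snippet above would then collapse into a single `expectOneOf({Token::StringLiteral, Token::HexStringLiteral})` call, with the returned kind still available if the caller needs it.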

ghallak · 2019-11-25T00:34:33Z

libsolidity/parsing/Parser.cpp

+		if (m_scanner->currentToken() == Token::Illegal)
+			fatalParserError(to_string(m_scanner->currentError()));


@chriseth I had to add this in order to get the error message Expected even number of hex-nibbles within double-quotes. instead of Expected ';' but got 'Illegal' for this test. Is it necessary?

I think the solution is to just not call next() above in case you have an illegal token.

Or maybe you can transform the loop above into a do-while loop?

I have tried both of these solutions, but neither of them worked.

I think that for either of them to work, the line that produces the error message Expected even number of hex-nibbles within double-quotes would have to be executed. For that line to be reached, Parser::parsePrimaryExpression would have to be called again after it has parsed the hex string, but that doesn't happen; instead, the line producing the error message Expected ';' but got 'Illegal' is hit.

If I don't call next() when an illegal token is found, I get the following message instead: Expected ';' but got 'HexStringLiteral'.

Ok - now I understand. I think it's fine like that!
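The effect of the added check can be modelled in a few lines of C++, assuming a flat token list where the scanner attaches its error text to Illegal tokens. `Lexeme` and `diagnoseAfterLiteral` are hypothetical: once the literal parts have been consumed, an Illegal token is reported with the scanner's own message instead of the generic Expected ';' error.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Token { StringLiteral, HexStringLiteral, Semicolon, Illegal };

struct Lexeme
{
	Token token;
	std::string scannerError; // set by the (hypothetical) scanner for Illegal tokens
};

// Returns the diagnostic for what follows a multipart literal starting at
// `index`, or an empty string if the statement ends correctly with ';'.
std::string diagnoseAfterLiteral(std::vector<Lexeme> const& tokens, size_t index)
{
	// Consume all literal parts.
	while (
		index < tokens.size() &&
		(tokens[index].token == Token::StringLiteral || tokens[index].token == Token::HexStringLiteral)
	)
		++index;
	// Prefer the scanner's specific message over the generic parser message.
	if (index < tokens.size() && tokens[index].token == Token::Illegal)
		return tokens[index].scannerError;
	if (index >= tokens.size() || tokens[index].token != Token::Semicolon)
		return "Expected ';'";
	return "";
}
```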

chriseth · 2019-11-25T22:42:10Z

Could you squash everything into a single commit, please?

ghallak · 2019-11-26T09:47:10Z

@chriseth Done here fa2541a

erak

Just some minor changes that I'd like to see done. Looks good otherwise. Thanks @ghallak :)

erak · 2019-11-26T11:03:39Z

Changelog.md

@@ -2,6 +2,7 @@

 Language Features:
 * Allow to obtain the selector of public or external library functions via a member ``.selector``.
+ * Parser: Allow splitting string literals into multiple parts.


Suggested change

* Parser: Allow splitting string literals into multiple parts.

* Parser: Allow splitting (hexadecimal) string literals into multiple parts.

I think we should mention both.

Suggested change

* Parser: Allow splitting string literals into multiple parts.

* Parser: Allow splitting string and hexadecimal string literals into multiple parts.

Or perhaps the long version?

erak · 2019-11-26T11:04:38Z

docs/types/value-types.rst

@@ -498,7 +498,7 @@ terminate the string literal. Newline only terminates the string literal if it i
 Hexadecimal Literals
 --------------------

-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.


Suggested change

Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.

Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and they can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.

erak · 2019-11-26T11:05:14Z

liblangutil/Token.h

@@ -221,6 +221,7 @@ namespace langutil
 	K(FalseLiteral, "false", 0)                                        \
 	T(Number, nullptr, 0)                                              \
 	T(StringLiteral, nullptr, 0)                                       \
+	T(HexStringLiteral, nullptr, 0)                                       \


Suggested change

T(HexStringLiteral, nullptr, 0) \

T(HexStringLiteral, nullptr, 0) \

erak · 2019-11-26T11:09:31Z

test/libsolidity/syntaxTests/string/string_multipart_hex_valid_parts.sol

@@ -0,0 +1,8 @@
+contract test {
+    function f() public pure returns (bytes32) {
+        bytes32 escapeCharacters = hex"aa" hex"ax";


Perhaps move the invalid hex"ax" to another invalid hex string test, and replace it here with a "valid" but too short hex string? It took me a second to realize that the x in the second part makes this a hex string with an uneven number of nibbles. But I'm also fine with the current test.

ghallak · 2019-11-26T11:40:03Z

@erak All the requested changes were implemented in f7d33ea.

erak

Looks good! @chriseth Do you want to have another look?

chriseth · 2019-11-26T14:36:06Z

Thanks a lot for your help, @ghallak!


erak requested changes Oct 16, 2019


ghallak force-pushed the multipart-strings branch from 0e052b1 to 0e2b3bc on October 16, 2019 14:03

ghallak requested a review from erak October 16, 2019 14:05

erak requested changes Oct 16, 2019


erak mentioned this pull request Oct 18, 2019

Bumps XCode version for CircleCI builds #7552

Merged

ghallak force-pushed the multipart-strings branch from 9eb1096 to b43761f on October 18, 2019 23:45

ghallak requested a review from erak October 19, 2019 16:53

ghallak force-pushed the multipart-strings branch from 6256a54 to 813c80e on November 18, 2019 11:22

chriseth reviewed Nov 19, 2019


ghallak force-pushed the multipart-strings branch from 4382022 to 7a2b6dd on November 25, 2019 00:07

ghallak commented Nov 25, 2019


ghallak force-pushed the multipart-strings branch from 7a2b6dd to fa2541a on November 26, 2019 00:24

erak requested changes Nov 26, 2019


ghallak force-pushed the multipart-strings branch from f7d33ea to bcc7d0c on November 26, 2019 11:40

Allow splitting string literals into multiple parts

4a1e854

ghallak force-pushed the multipart-strings branch from bcc7d0c to 4a1e854 on November 26, 2019 11:41

erak approved these changes Nov 26, 2019


chriseth approved these changes Nov 26, 2019


chriseth merged commit ba8ff17 into ethereum:develop Nov 26, 2019

ghallak deleted the multipart-strings branch November 26, 2019 17:16

forshtat mentioned this pull request Feb 6, 2020

Multiline string literals seem to be broken protofire/solhint#187

Closed

fvictorio mentioned this pull request Feb 6, 2020

Add support for multiline string literals Consensys/solidity-parser-antlr#4

Closed

mariocao mentioned this pull request Apr 29, 2020

fix: preventing burning user funds witnet/witnet-solidity-bridge#86

Merged

junderw mentioned this pull request Nov 12, 2020

Support multiline hex literals. solidity-parser/parser#32

Closed

cameel mentioned this pull request Apr 10, 2023

bytes/string literal concatenation #13661

Closed

Xanewok mentioned this pull request Feb 13, 2024

prevent parsing multiple literals under StringExpression before 0.5.14 NomicFoundation/slang#799

Merged

github-merge-queue bot pushed a commit to NomicFoundation/slang that referenced this pull request Feb 13, 2024

prevent parsing multiple literals under StringExpression before `0.5.14` (#799)

303dda9

Introduced in ethereum/solidity#7524

