Allow splitting string literals into multiple parts #7524
Please add a Changelog entry and also adjust the documentation.
0e052b1 to 0e2b3bc
I fear that the current implementation might lead to some confusion. Please consider the following example:
    function f() public pure returns (bytes32) {
        bytes32 escapeCharacters = hex"0000"
            "deaf"
            "feed"
            "beef"
            "feed"
            "fade";
        return escapeCharacters;
    }
Is it apparent that we're dealing with a hexadecimal literal here? I think it could be confused with a string literal and would therefore suggest requiring each new line to start with another hex:
    function f() public pure returns (bytes32) {
        bytes32 escapeCharacters = hex"0000"
            hex"deaf"
            hex"feed"
            hex"beef"
            hex"feed"
            hex"fade";
        return escapeCharacters;
    }
In addition, I think whitespace or newlines in hex strings should only be allowed after an even number of nibbles. Please also see #7374, which applies these rules to underscores.
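To make the even-nibble rule concrete, here is a minimal C++ sketch. It is not the scanner code from this PR: the helper name isValidHexPart is hypothetical, and it deliberately ignores the underscore handling discussed in #7374.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of solc: returns true if one quoted part of a
// hex literal consists only of hex digits and has an even number of nibbles.
bool isValidHexPart(std::string const& part)
{
    if (part.size() % 2 != 0)
        return false; // a part must not end in the middle of a byte
    for (char c: part)
        if (!std::isxdigit(static_cast<unsigned char>(c)))
            return false; // e.g. the 'x' in "ax"
    return true;
}
```

Under this rule a split such as hex"dea" hex"f" would be rejected even though the joined content has an even length, because each part is checked on its own.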
@erak I have modified in this commit 9eb1096 I tried to do the same in
\"DataName\" \"abc\" is being parsed as a single string DataNameabc which is not correct.
Now both the scanner (to allow multipart hex string literals) and the parser (to allow multipart regular string literals) are modified. Can this be done better, for example by modifying only the scanner or only the parser?
9eb1096 to b43761f
@ghallak Great, thanks. I think the implementation is fine. I've tested your changes and found out, that Could you also please add more syntax tests for hex string literals?
Tests are failing.
6256a54 to 813c80e
docs/types/value-types.rst
Outdated
@@ -498,7 +498,7 @@ terminate the string literal. Newline only terminates the string literal if it i
 Hexadecimal Literals
 --------------------

-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" "44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
I would actually prefer
-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" "44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
@chriseth Do you mean that hex"00112233" hex"44556677" should be an allowed syntax? Are you suggesting that each part of a hex string should start with hex?
In the original issue description here #7292 (comment), hex"00112233" "44556677" was given as an example. Based on that description and on the previous comments on this PR, I have tried to implement the following:
- For regular strings, no part should start with hex.
  Right:
    "aa" "bb" "cc"
    "dd"
  Wrong:
    "aa" "bb" hex"cc"
    "dd"
- For hex strings on a single line, only the first part should start with hex; the remaining parts on the same line should not.
  Right:
    hex"aa" "22"
  Wrong:
    hex"aa" hex"22"
- For hex strings on multiple lines, only the first part on each line should start with hex.
  Right:
    hex"11" "22"
    hex"33" "44"
  Wrong:
    hex"11" "22"
    "33" "44"
What do you think of the above rules?
@ghallak newlines are fragile creatures, so I would like to avoid having to distinguish newlines from other whitespace as much as possible. I think it would be better to force people to use a hex prefix even for non-newline whitespace. What do the others think?
I've adjusted the docs here 7a2b6dd
libsolidity/parsing/Parser.cpp
Outdated
@@ -1614,9 +1614,18 @@ ASTPointer<Expression> Parser::parsePrimaryExpression()
        }
        break;
    case Token::StringLiteral:
    {
        string literal = m_scanner->currentLiteral();
        while (m_scanner->peekNextToken() == Token::StringLiteral)
I think this whole change could be simplified and shortened by introducing Token::HexStringLiteral and using this loop here to concatenate string literals and hex string literals while checking that their type stays the same. This would also allow us to create a better error message.
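A rough, self-contained sketch of what such a loop could look like. The Token enum, the Lexeme struct and concatenateLiterals are hypothetical stand-ins that operate on a plain vector instead of the real scanner, so this only illustrates the idea of concatenating while checking that the token kind stays the same.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for the scanner's token kinds; not the real solc types.
enum class Token { StringLiteral, HexStringLiteral, Semicolon };

struct Lexeme
{
    Token kind;
    std::string literal; // contents for literal tokens, empty otherwise
};

// Concatenates the run of adjacent literal tokens starting at tokens[pos].
// Throws std::runtime_error if the run mixes string and hex string parts.
// On successful return, pos is the index of the first token after the run.
std::string concatenateLiterals(std::vector<Lexeme> const& tokens, std::size_t& pos)
{
    auto isLiteral = [](Token t) {
        return t == Token::StringLiteral || t == Token::HexStringLiteral;
    };
    if (pos >= tokens.size() || !isLiteral(tokens[pos].kind))
        throw std::invalid_argument("expected a string literal");

    Token const first = tokens[pos].kind;
    std::string result;
    while (pos < tokens.size() && isLiteral(tokens[pos].kind))
    {
        if (tokens[pos].kind != first)
            throw std::runtime_error("cannot mix string and hex string parts");
        result += tokens[pos].literal;
        ++pos;
    }
    return result;
}
```

Because both literal kinds go through the same loop, the mismatch case has a single place where a dedicated error message can be produced.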
I've introduced Token::HexStringLiteral here af7d002 and made the relevant changes.
4382022 to 7a2b6dd
    if (currentToken() == Token::HexStringLiteral)
        expectToken(Token::HexStringLiteral, false);
    else
        expectToken(Token::StringLiteral, false);
@chriseth Is there a better way to do this? Maybe a function like expectToken that works for multiple tokens instead of a single one?
    if (m_scanner->currentToken() == Token::Illegal)
        fatalParserError(to_string(m_scanner->currentError()));
I think the solution is to just not call next() above in case you have an illegal token.
Or maybe you can transform the loop above into a do-while loop?
I have tried both of these solutions, but neither of them worked.
I think that for them to work, this line, which results in the error message "Expected even number of hex-nibbles within double-quotes", would have to be executed. For that to happen, Parser::parsePrimaryExpression would have to be called after the hex string has been parsed, but that can't happen. Instead, this line is hit, which produces the error message "Expected ';' but got 'Illegal'".
If I don't call next() when an illegal token is found, I get the following message instead: "Expected ';' but got 'HexStringLiteral'".
Ok - now I understand. I think it's fine like that!
Could you squash everything into a single commit, please?
7a2b6dd to fa2541a
Just some minor changes that I'd like to see done. Looks good otherwise. Thanks @ghallak :)
Changelog.md
Outdated
@@ -2,6 +2,7 @@

 Language Features:
 * Allow to obtain the selector of public or external library functions via a member ``.selector``.
+* Parser: Allow splitting string literals into multiple parts.
-* Parser: Allow splitting string literals into multiple parts.
+* Parser: Allow splitting (hexadecimal) string literals into multiple parts.
I think we should mention both.
-* Parser: Allow splitting string literals into multiple parts.
+* Parser: Allow splitting string and hexadecimal string literals into multiple parts.
Or perhaps the long version?
docs/types/value-types.rst
Outdated
@@ -498,7 +498,7 @@ terminate the string literal. Newline only terminates the string literal if it i
 Hexadecimal Literals
 --------------------

-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
-Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and the can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
+Hexadecimal literals are prefixed with the keyword ``hex`` and are enclosed in double or single-quotes (``hex"001122FF"``), and they can also be split into multiple consecutive parts (``hex"00112233" hex"44556677"`` is equivalent to ``hex"0011223344556677"``). Their content must be a hexadecimal string and their value will be the binary representation of those values.
liblangutil/Token.h
Outdated
@@ -221,6 +221,7 @@ namespace langutil
     K(FalseLiteral, "false", 0) \
     T(Number, nullptr, 0) \
     T(StringLiteral, nullptr, 0) \
+    T(HexStringLiteral, nullptr, 0) \
-T(HexStringLiteral, nullptr, 0) \
+T(HexStringLiteral, nullptr, 0) \
@@ -0,0 +1,8 @@
+contract test {
+    function f() public pure returns (bytes32) {
+        bytes32 escapeCharacters = hex"aa" hex"ax";
Perhaps move the invalid hex"ax" to another invalid hex string test and replace this by a "valid", but too short hex string? It took me a second to realize that the x in the second part makes this a hex string with uneven nibbles. But I'm also fine with the current test.
f7d33ea to bcc7d0c
bcc7d0c to 4a1e854
Looks good! @chriseth Do you want to have another look?
Thanks a lot for your help, @ghallak!
…5.14` (#799) Introduced in ethereum/solidity#7524
This should work for both normal and hex string literals, and the splitting can be done either on the same line or on multiple lines.
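To illustrate the equivalence from the documentation change (hex"00112233" hex"44556677" versus hex"0011223344556677"), here is a small C++ sketch. It assumes that each part is decoded on its own and that the resulting bytes are appended in order; decodeHexParts and nibble are hypothetical names, not solc functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Converts one hex digit to its numeric value; throws on any other character.
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Decodes each part separately and appends the bytes in order, so that
// splitting a literal at a byte boundary does not change its value.
std::vector<std::uint8_t> decodeHexParts(std::vector<std::string> const& parts)
{
    std::vector<std::uint8_t> bytes;
    for (std::string const& part: parts)
    {
        if (part.size() % 2 != 0)
            throw std::invalid_argument("odd number of nibbles in part");
        for (std::size_t i = 0; i < part.size(); i += 2)
            bytes.push_back(static_cast<std::uint8_t>(nibble(part[i]) * 16 + nibble(part[i + 1])));
    }
    return bytes;
}
```

Calling decodeHexParts with the two parts "00112233" and "44556677" yields the same eight bytes as calling it with the single part "0011223344556677".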
Closes #7292