Fix the spec deviation in Unicode #20714
Codecov Report
@@           Coverage Diff           @@
##           master   #20714   +/-   ##
=======================================
  Coverage   14.59%   14.59%
=======================================
  Files          51       51
  Lines        1398     1398
  Branches      214      214
=======================================
  Hits          204      204
  Misses       1178     1178
  Partials       16       16
Continue to review full report at Codecov.
tests/jballerina-unit-test/src/test/java/org/ballerinalang/test/types/string/UniCodeTest.java
Outdated
@@ -40,4 +40,5 @@ private Constants() {

     public static final int INIT_METHOD_SPLIT_SIZE = 50;

+    public static final String UNICODE_REGEX = "\\\\u[{]([a-fA-F0-9]*)[}]";
Do we need [{] and [}]? We should be able to use \\{ and \\} instead.
Please also check what happens when an empty string matches here, or when a number that is too large matches. I think parsing the int will throw a NumberFormatException. Shall we use something like \\\\u\\{([a-fA-F0-9]{1,6})\\} instead?
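To make the concern concrete, here is a minimal sketch (the class UnicodeRegexCheck and the method describe are hypothetical, not part of the compiler) that runs the regex from the diff, which allows zero or more hex digits, next to the suggested bounded regex. It assumes the captured group is parsed with Integer.parseInt(group, 16), which throws NumberFormatException for an empty string or a value too large for an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeRegexCheck {
    // Regex from the diff: zero or more hex digits between the braces.
    static final Pattern UNBOUNDED = Pattern.compile("\\\\u[{]([a-fA-F0-9]*)[}]");
    // Suggested regex: escaped braces and one to six hex digits.
    static final Pattern BOUNDED = Pattern.compile("\\\\u\\{([a-fA-F0-9]{1,6})\\}");

    // Reports whether the escape matches and, if so, whether the captured
    // digits survive Integer.parseInt with radix 16.
    static String describe(Pattern pattern, String escape) {
        Matcher m = pattern.matcher(escape);
        if (!m.matches()) {
            return "no match";
        }
        try {
            return "code point " + Integer.parseInt(m.group(1), 16);
        } catch (NumberFormatException e) {
            return "NumberFormatException";
        }
    }

    public static void main(String[] args) {
        String[] samples = {"\\u{41}", "\\u{}", "\\u{FFFFFFFFF}"};
        for (String s : samples) {
            System.out.println(s + " -> unbounded: " + describe(UNBOUNDED, s)
                    + ", bounded: " + describe(BOUNDED, s));
        }
    }
}
```

For \u{} and \u{FFFFFFFFF} the unbounded pattern matches and parseInt throws, while the bounded pattern simply does not match. Six digits cap the value at 0xFFFFFF, which always fits in an int but can still exceed 0x10FFFF, so a range check is needed either way.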
> Do we need [{] and [}]? We should be able to use \\{ and \\} instead.

+1
> Please also check what happens when an empty string matches here, or when a number that is too large matches. I think parsing the int will throw a NumberFormatException. Shall we use something like \\\\u\\{([a-fA-F0-9]{1,6})\\} instead?

According to the spec, there must be at least one hex digit, so if the user enters an empty escape, parsing fails with an error. Code points that are out of range (greater than 0x10FFFF) are handled here instead, because we need to report a compile error for them.
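The approach described in this reply can be sketched as follows. This is a minimal illustration, not the code in the PR: UnicodeEscapeResolver and resolve are hypothetical names, IllegalArgumentException stands in for the compiler's diagnostic, and only the upper bound mentioned in the reply (0x10FFFF) is checked. Digit strings too long for an int are treated as out of range rather than letting NumberFormatException escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeEscapeResolver {
    // Backslash, 'u', '{', one or more hex digits, '}' (mirrors HexDigit+ in the lexer rule).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u\\{([a-fA-F0-9]+)\\}");
    private static final int MAX_CODE_POINT = 0x10FFFF;

    // Replaces every escape in the literal body with the character it denotes.
    // Throws IllegalArgumentException (a stand-in for a compile error) when the
    // code point is greater than 0x10FFFF or does not even fit in an int.
    static String resolve(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hex = m.group(1);
            int codePoint;
            try {
                codePoint = Integer.parseInt(hex, 16);
            } catch (NumberFormatException e) {
                codePoint = -1; // too many digits for an int: treat as out of range
            }
            if (codePoint < 0 || codePoint > MAX_CODE_POINT) {
                throw new IllegalArgumentException("invalid unicode code point: " + hex);
            }
            String replacement = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("A\\u{42}C")); // ABC
        System.out.println(resolve("\\u{1F600}").codePointAt(0) == 0x1F600); // true
        try {
            resolve("\\u{110000}");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid unicode code point: 110000
        }
    }
}
```

Character.toChars returns a surrogate pair for code points above 0xFFFF, so supplementary characters such as 0x1F600 come out as two Java chars.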
Force-pushed from f9719f4 to 528c5cc.
...ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/parser/BLangParserListener.java
...ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/parser/BLangParserListener.java
Outdated
compiler/ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/util/Constants.java
Outdated
String text = node.getText();
text = text.substring(1, text.length() - 1);
I think it is better to merge these two lines in this PR as well:
String text = node.getText().substring(1, text.length() - 1);
We cannot do that, because the substring call needs text.length(), and text is not yet initialized inside its own declaration.
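If a single statement is still preferred, one option is to move the two steps into a small helper. The sketch below is hypothetical: stripQuotes and LiteralText are illustrative names, not existing Ballerina utilities, and the helper assumes the token text always begins and ends with a one-character delimiter.

```java
final class LiteralText {
    // Hypothetical helper: drops the first and last character (the delimiters).
    static String stripQuotes(String tokenText) {
        if (tokenText.length() < 2) {
            throw new IllegalArgumentException("token is too short to be quoted: " + tokenText);
        }
        return tokenText.substring(1, tokenText.length() - 1);
    }

    public static void main(String[] args) {
        // In the listener this would read: String text = stripQuotes(node.getText());
        String text = stripQuotes("\"hello\"");
        System.out.println(text); // hello
    }
}
```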
...ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/parser/BLangParserListener.java
Outdated
@@ -341,7 +341,7 @@ EscapeSequence

 fragment
 UnicodeEscape
-    : '\\' 'u' HexDigit HexDigit HexDigit HexDigit
+    : '\\' 'u' LEFT_BRACE HexDigit+ RIGHT_BRACE
Should we change this to HexDigit* to avoid a syntax error?
According to the spec, the Unicode escape is defined as:
StringNumericEscape := \u{ CodePoint }
CodePoint := HexDigit+
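A hand-written recognizer for these two productions shows what HexDigit+ implies: \u{} is rejected outright, and the old four-digit form \u0041 no longer matches. This is only an illustrative sketch; NumericEscapeScanner and scan are hypothetical names, and the compiler itself relies on the ANTLR-generated lexer rather than code like this.

```java
final class NumericEscapeScanner {
    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Tries to read a StringNumericEscape starting at index start.
    // Returns the index just past the closing brace, or -1 if the input does not
    // follow: backslash, 'u', '{', HexDigit+, '}'.
    static int scan(String s, int start) {
        int i = start;
        if (i + 2 >= s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u' || s.charAt(i + 2) != '{') {
            return -1;
        }
        i += 3;
        int digitsStart = i;
        while (i < s.length() && isHexDigit(s.charAt(i))) {
            i++;
        }
        // HexDigit+ requires at least one digit, so an empty pair of braces is rejected.
        if (i == digitsStart || i >= s.length() || s.charAt(i) != '}') {
            return -1;
        }
        return i + 1;
    }

    public static void main(String[] args) {
        String[] samples = {"\\u{41}", "\\u{}", "\\u0041", "\\u{41"};
        for (String s : samples) {
            System.out.println(s + " -> " + scan(s, 0));
        }
    }
}
```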
Force-pushed from 80923f5 to d62c58a.
Force-pushed from d62c58a to fd1c7b8.
@@ -1,4 +1,4 @@
-// Generated from BallerinaLexer.g4 by ANTLR 4.5.3
+// Generated from /home/kavindu/WSO2-GIT/test/ballerina-lang/compiler/ballerina-lang/src/main/resources/grammar/BallerinaLexer.g4 by ANTLR 4.5.3
Shall we generate the parser from the compiler/ballerina-lang/src/main/resources/grammar/ directory? That way the generated header records only the grammar file name instead of a local absolute path. Navigate to the grammar directory and run:
java -jar ~/Downloads/antlr-4.5.3-complete.jar *.g4 -package org.wso2.ballerinalang.compiler.parser.antlr4 -o ../../java/org/wso2/ballerinalang/compiler/parser/antlr4/
This will get fixed once I combine all the PRs, but it is good practice to generate the parser as @rdhananjaya mentioned.
ACK.
Purpose
$subject
Fixes #13180
Approach
Change the lexer rule so that Unicode escapes take the form \u{ HexDigit+ }, with curly braces.
Check List