Fix the spec deviation in Unicode #20714

KavinduZoysa · 2020-01-23T10:52:25Z

Purpose

$subject

Fixes #13180

Approach

Change the lexer so that Unicode escapes are written with curly braces (\u{...}).

Check List

codecov-io · 2020-01-23T13:35:40Z

Codecov Report

Merging #20714 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #20714   +/-   ##
=======================================
  Coverage   14.59%   14.59%           
=======================================
  Files          51       51           
  Lines        1398     1398           
  Branches      214      214           
=======================================
  Hits          204      204           
  Misses       1178     1178           
  Partials       16       16



rdhananjaya · 2020-01-24T05:31:00Z

compiler/ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/util/Constants.java

@@ -40,4 +40,5 @@ private Constants() {

    public static final int INIT_METHOD_SPLIT_SIZE = 50;

+    public static final String UNICODE_REGEX = "\\\\u[{]([a-fA-F0-9]*)[}]";


Do we need [{] and [}]? We should be able to use \\{ and \\} instead.

Also, please check what happens when an empty string or too large a number is matched here.
I think parsing the int will throw a NumberFormatException.
Shall we therefore use something like \\\\u\\{([a-fA-F0-9]{1,6})\\}?

Do we need [{] and [}]? We should be able to use \\{ and \\} instead.

+1

Also, please check what happens when an empty string or too large a number is matched here.
I think parsing the int will throw a NumberFormatException.
Shall we therefore use something like \\\\u\\{([a-fA-F0-9]{1,6})\\}?

According to the spec, there must be at least one hex digit, so if the user enters an empty string, the parser throws an error. Code points that are out of range (greater than 0x10FFFF) are also handled here, because we need to report a compile error.
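To make the concern in this thread concrete, here is a minimal sketch of decoding with the bounded pattern the reviewer suggested. The class and method names are hypothetical and this is not the code in BLangParserListener. With the unbounded `*` variant, an empty group would reach `Integer.parseInt("", 16)` and a very long digit run would overflow, and both cases throw NumberFormatException. Bounding the group to 1 to 6 digits avoids that, but a range check is still needed because six hex digits can exceed 0x10FFFF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the compiler's actual implementation.
final class UnicodeEscapeSketch {
    // Bounded variant from the review: 1 to 6 hex digits between braces.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u\\{([a-fA-F0-9]{1,6})\\}");

    private static final int MAX_CODE_POINT = 0x10FFFF;

    // Replaces every braced escape in the input with the character it denotes.
    static String decode(String text) {
        Matcher matcher = UNICODE_ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            out.append(text, last, matcher.start());
            // At most 6 hex digits, so parseInt cannot overflow here.
            int codePoint = Integer.parseInt(matcher.group(1), 16);
            if (codePoint > MAX_CODE_POINT) {
                // A compiler would log a diagnostic instead of throwing.
                throw new IllegalArgumentException(
                        "code point out of range: " + matcher.group(1));
            }
            out.appendCodePoint(codePoint);
            last = matcher.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Letter: \\u{41}"));
    }
}
```

In this sketch an empty escape simply fails to match and is left untouched, so reporting it as an error would have to happen elsewhere, for example in the lexer.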


Shan1024 · 2020-01-29T03:24:39Z

...ballerina-lang/src/main/java/org/wso2/ballerinalang/compiler/parser/BLangParserListener.java

            String text = node.getText();
            text = text.substring(1, text.length() - 1);


I think it is better to merge these two lines in this PR as well -

String text = node.getText().substring(1, text.length() - 1);

We cannot do that, because the expression needs text.length() and text is not yet assigned at that point.
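The merged line would not compile, since a local variable cannot be read inside its own initializer. If a single line at the call site is still wanted, one option is a small helper. The sketch below assumes a hypothetical helper name and is not part of the PR:

```java
// Hypothetical helper, shown only to illustrate the one-line alternative.
final class TokenTextSketch {
    // Drops the first and last character, e.g. the surrounding quotes of a literal.
    static String stripDelimiters(String tokenText) {
        return tokenText.substring(1, tokenText.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(stripDelimiters("\"hello\""));
    }
}
```

The call site would then read `String text = TokenTextSketch.stripDelimiters(node.getText());`, at the cost of one extra method.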


hasithaa · 2020-01-29T04:32:16Z

compiler/ballerina-lang/src/main/resources/grammar/BallerinaLexer.g4

@@ -341,7 +341,7 @@ EscapeSequence

 fragment
 UnicodeEscape
-    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
+    :   '\\' 'u' LEFT_BRACE HexDigit+ RIGHT_BRACE


Should we change this to HexDigit* here to avoid a syntax error?

According to the spec, Unicode is defined as,

StringNumericEscape := \u{ CodePoint }
CodePoint := HexDigit+
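The practical difference between the two quantifiers can be approximated with Java regular expressions. This is only an analogy for the ANTLR rule, and the class and method names are hypothetical. With `+`, an empty pair of braces is not a valid escape at all, which matches the spec. With `*`, it would be accepted as a token and would have to be rejected later with a dedicated error.

```java
import java.util.regex.Pattern;

// Hypothetical regex analogy for the lexer rule, not ANTLR itself.
final class CodePointQuantifierSketch {
    // Mirrors HexDigit+ : at least one hex digit is required.
    private static final Pattern AT_LEAST_ONE =
            Pattern.compile("\\\\u\\{[0-9a-fA-F]+\\}");

    // Mirrors HexDigit* : an empty pair of braces is also accepted.
    private static final Pattern ZERO_OR_MORE =
            Pattern.compile("\\\\u\\{[0-9a-fA-F]*\\}");

    static boolean matchesSpec(String escape) {
        return AT_LEAST_ONE.matcher(escape).matches();
    }

    static boolean matchesLenient(String escape) {
        return ZERO_OR_MORE.matcher(escape).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSpec("\\u{}") + " " + matchesLenient("\\u{}"));
    }
}
```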

rdhananjaya · 2020-02-07T07:59:47Z

...llerina-lang/src/main/java/org/wso2/ballerinalang/compiler/parser/antlr4/BallerinaLexer.java

@@ -1,4 +1,4 @@
-// Generated from BallerinaLexer.g4 by ANTLR 4.5.3
+// Generated from /home/kavindu/WSO2-GIT/test/ballerina-lang/compiler/ballerina-lang/src/main/resources/grammar/BallerinaLexer.g4 by ANTLR 4.5.3


Shall we generate the parser from the compiler/ballerina-lang/src/main/resources/grammar/ directory?

Navigate to the grammar directory and run: java -jar ~/Downloads/antlr-4.5.3-complete.jar *.g4 -package org.wso2.ballerinalang.compiler.parser.antlr4 -o ../../java/org/wso2/ballerinalang/compiler/parser/antlr4/

This will get fixed once I combine all the PRs. But it is good practice to generate the parser as @rdhananjaya mentioned.

KavinduZoysa requested review from pubudu91, MaryamZi, dulvinw, gimantha, grainier, hasithaa, irshadnilam, KRVPerera and rdhananjaya January 23, 2020 10:53

MaryamZi reviewed Jan 24, 2020

rdhananjaya reviewed Jan 24, 2020

KavinduZoysa force-pushed the issue-13180-master branch from f9719f4 to 528c5cc on January 24, 2020 06:10

KRVPerera reviewed Jan 24, 2020

KRVPerera approved these changes Jan 28, 2020

Shan1024 reviewed Jan 29, 2020

hasithaa added the Team/CompilerFE label Jan 29, 2020

hasithaa added this to the Ballerina 1.2.0 milestone Jan 29, 2020

hasithaa reviewed Jan 29, 2020

hasithaa reviewed Jan 29, 2020

KavinduZoysa force-pushed the issue-13180-master branch from 80923f5 to d62c58a on February 5, 2020 10:11

KavinduZoysa added 9 commits February 5, 2020 17:13

36d7878 Add the curly braces to unicode
3428572 Handle errors
a6081dd Add the test cases
0ac4d17 Add the doc comments
c5bb179 Change the error message
1c723e6 Change the REGEX
75c9e64 Change the way of matching
25198bc Fixed the suggested changes
fd1c7b8 resolve conflicts

KavinduZoysa force-pushed the issue-13180-master branch from d62c58a to fd1c7b8 on February 5, 2020 11:58

rdhananjaya reviewed Feb 7, 2020

hasithaa approved these changes Feb 7, 2020

hasithaa mentioned this pull request Feb 7, 2020

Lang Changes - Combined PR #20920

Merged

hasithaa merged commit fd1c7b8 into ballerina-platform:master Feb 7, 2020
