Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the spec deviation in Unicode #20714

Merged
merged 9 commits into from
Feb 7, 2020

Conversation

KavinduZoysa
Copy link
Contributor

@KavinduZoysa KavinduZoysa commented Jan 23, 2020

Purpose

$subject

Fixes #13180

Approach

Change the lexer to add the curly braces to Unicode.

Check List

  • Read the Contributing Guide
  • Updated Change Log
  • Checked Tooling Support (#)
  • Added necessary tests
    • Unit Tests
    • Spec Conformance Tests
    • Integration Tests
    • Ballerina By Example Tests
  • Increased Test Coverage
  • Added necessary documentation
    • API documentation
    • Module documentation in Module.md files
    • Ballerina By Examples

@codecov-io
Copy link

codecov-io commented Jan 23, 2020

Codecov Report

Merging #20714 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #20714   +/-   ##
=======================================
  Coverage   14.59%   14.59%           
=======================================
  Files          51       51           
  Lines        1398     1398           
  Branches      214      214           
=======================================
  Hits          204      204           
  Misses       1178     1178           
  Partials       16       16

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f82b1e...fd1c7b8. Read the comment docs.

@@ -40,4 +40,5 @@ private Constants() {

public static final int INIT_METHOD_SPLIT_SIZE = 50;

public static final String UNICODE_REGEX = "\\\\u[{]([a-fA-F0-9]*)[}]";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need [{] and [}] instead we should be able to use \\{ and \\}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please check what happens if an empty string is matched here and too large number matches here.
I think it will be a number format exception when parsing the int.
Shall we therefore use something like \\\\u\\{([a-fA-F0-9]{1,6})\\}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need [{] and [}] instead we should be able to use \\{ and \\}

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please check what happens if an empty string is matched here and too large number matches here.
I think it will be a number format exception when parsing the int.
Shall we therefore use something like \\\\u\\{([a-fA-F0-9]{1,6})\\}

According to the spec, there should be at least one hex value. So if the user enters an empty string, parse throws an error. Also, the Unicodes which are out of range(greater than 0x10FFFF) are handled here, because we need to throw a compile error.

Comment on lines 2652 to 2713
String text = node.getText();
text = text.substring(1, text.length() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to merge these two lines in this PR as well -

String text = node.getText().substring(1, text.length() - 1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do that because we need to get text.length()

@hasithaa hasithaa added the Team/CompilerFE All issues related to Language implementation and Compiler, this exclude run times. label Jan 29, 2020
@hasithaa hasithaa added this to the Ballerina 1.2.0 milestone Jan 29, 2020
@@ -341,7 +341,7 @@ EscapeSequence

fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
: '\\' 'u' LEFT_BRACE HexDigit+ RIGHT_BRACE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change HexDigit* here to avoid a syntax error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the spec, Unicode is defined as,

StringNumericEscape := \u{ CodePoint }
CodePoint := HexDigit+

@@ -1,4 +1,4 @@
// Generated from BallerinaLexer.g4 by ANTLR 4.5.3
// Generated from /home/kavindu/WSO2-GIT/test/ballerina-lang/compiler/ballerina-lang/src/main/resources/grammar/BallerinaLexer.g4 by ANTLR 4.5.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we generate the parser from compiler/ballerina-lang/src/main/resources/grammar/ directory

Navigate to grammar directory:
java -jar ~/Downloads/antlr-4.5.3-complete.jar *.g4 -package org.wso2.ballerinalang.compiler.parser.antlr4 -o ../../java/org/wso2/ballerinalang/compiler/parser/antlr4/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get fixed, Once I combine All PRs. But it is a good practice to generate parser as @rdhananjaya mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@hasithaa hasithaa merged commit fd1c7b8 into ballerina-platform:master Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team/CompilerFE All issues related to Language implementation and Compiler, this exclude run times.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spec Deviation] Support StringNumericEscape
7 participants