fix(compiler)!: Apply correct rules for parsing Unicode whitespace #1554

Merged 1 commit on Feb 1, 2023


ospencer
Member

While working on the language reference, I realized that the Unicode category used for whitespace wasn't quite right: it disallowed some common whitespace characters while allowing some uncommon ones.

This PR states explicitly that Grain follows https://unicode.org/reports/tr31/#Pattern_Syntax for Unicode allowed in the syntax of the language.

Whitespace in Grain now properly adheres to Pattern_White_Space, with additional Grain semantics described below.

Whitespace includes:

Spaces, namely

  • Horizontal tab, U+0009
  • Vertical tab, U+000B
  • Space, U+0020
  • Left-to-right mark, U+200E
  • Right-to-left mark, U+200F

Line separators, namely

  • Line feed, U+000A
  • Form feed, U+000C
  • Carriage return, U+000D
  • Next line, U+0085
  • Line separator, U+2028
  • Paragraph separator, U+2029

Line separators act as end-of-statement characters in Grain. Note that this is distinct from file line endings—Grain supports only LF and CRLF (relevant for compiler error messages and tooling).
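
As an illustration only, the two groups above can be written as a small code point classifier. This is a sketch in plain OCaml, not the compiler's lexer (which uses sedlex); the names `whitespace`, `classify`, and `ends_statement` are hypothetical and exist only for this example.

```ocaml
(* Sketch only: the two whitespace groups listed above, keyed by code point.
   The type and function names are hypothetical, not taken from the compiler. *)
type whitespace = Space | Line_separator

let classify (code_point : int) : whitespace option =
  match code_point with
  | 0x0009 (* horizontal tab *)
  | 0x000B (* vertical tab *)
  | 0x0020 (* space *)
  | 0x200E (* left-to-right mark *)
  | 0x200F (* right-to-left mark *) -> Some Space
  | 0x000A (* line feed *)
  | 0x000C (* form feed *)
  | 0x000D (* carriage return *)
  | 0x0085 (* next line *)
  | 0x2028 (* line separator *)
  | 0x2029 (* paragraph separator *) -> Some Line_separator
  | _ -> None

(* Only the line separator group ends a statement. *)
let ends_statement (code_point : int) : bool =
  classify code_point = Some Line_separator
```

With this sketch, `classify 0x00A0` (no-break space) evaluates to `None`, because that character is not part of Pattern_White_Space, while `ends_statement 0x2028` is `true`.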

@@ -113,7 +116,9 @@ let dec_float = [%sedlex.regexp?

 let unsigned_float = [%sedlex.regexp? dec_float];

-let uident = [%sedlex.regexp? (lu, Star(xid_continue))];
+let uident = [%sedlex.regexp?
+  (Intersect(xid_start, lu), Star(xid_continue))
Member Author

This is equivalent to what it was before, but I felt it made sense to make it more explicit since it's not obvious that Lu is a subset of xid_start.
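
The equivalence rests on a plain set identity: if one set is contained in another, intersecting the two gives back the smaller set. A minimal sketch with the standard library's `Set` module, using tiny made-up stand-in sets rather than the real Unicode tables (the set contents here are assumptions for illustration only):

```ocaml
module IntSet = Set.Make (Int)

(* Toy stand-ins, NOT the real Unicode property tables. *)
let xid_start = IntSet.of_list [ 0x41; 0x42; 0x61; 0x62; 0x3B1 ] (* A B a b alpha *)
let lu = IntSet.of_list [ 0x41; 0x42 ] (* A B *)

let () =
  (* Precondition from the comment: every member of lu is in xid_start. *)
  assert (IntSet.subset lu xid_start);
  (* Under that precondition, the intersection is exactly lu. *)
  assert (IntSet.equal (IntSet.inter xid_start lu) lu)
```

The explicit intersection therefore does not change which characters match; it only documents the intent in the pattern itself.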

Member

phated left a comment

LGTM

ospencer force-pushed the oscar/fix-unicode-spaces branch 2 times, most recently from 9b823b2 to 7f0f63f on January 11, 2023 06:37