fix(compiler)!: Apply correct rules for parsing Unicode whitespace #1554
While working on the language reference, I realized that the Unicode category used for whitespace wasn't quite right: it disallowed some common whitespace characters while allowing some uncommon ones.
This PR states explicitly that Grain follows https://unicode.org/reports/tr31/#Pattern_Syntax for Unicode allowed in the syntax of the language.
Whitespace in Grain now properly adheres to Pattern_White_Space, with additional Grain semantics described below.
Whitespace includes:
Spaces, namely
U+0009
U+000B
U+0020
U+200E
U+200F
Line separators, namely
U+000A
U+000C
U+000D
U+0085
U+2028
U+2029
Line separators act as end-of-statement characters in Grain. Note that this is distinct from file line endings: Grain supports only LF and CRLF as file line endings, which is relevant for compiler error messages and tooling.
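To make the two groups concrete, here is a minimal sketch in Python of how a lexer might classify a character under the rules listed above. It is not taken from the Grain compiler; the names `SPACES`, `LINE_SEPARATORS`, and `classify` are hypothetical and exist only for illustration.

```python
# Illustrative sketch only; these names are hypothetical and are not
# part of the Grain compiler.

# Code points this PR lists as spaces.
SPACES = frozenset({0x0009, 0x000B, 0x0020, 0x200E, 0x200F})

# Code points this PR lists as line separators (end-of-statement in Grain).
LINE_SEPARATORS = frozenset({0x000A, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029})


def classify(ch: str) -> str:
    """Return "space", "line_separator", or "other" for a single character."""
    cp = ord(ch)
    if cp in SPACES:
        return "space"
    if cp in LINE_SEPARATORS:
        return "line_separator"
    return "other"
```

With this sketch, `classify("\t")` returns `"space"` and `classify("\u2028")` returns `"line_separator"`. A character such as U+00A0 (no-break space) is in neither group, so `classify("\u00a0")` returns `"other"`.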