xid_start/xid_continue lexer classes and the Unicode subsystem refactoring (#16)

- `$xid_start` and `$xid_continue` Lexer regex classes added to address an issue with Unicode identifier parsing.
- Support for classes with combined Unicode properties introduced. Users can now write combined classes using the `${...}` syntax in the Token macro rules: `${alpha | num}` means alphabetic or numeric character.
- The choice between individual Unicode classes is now forbidden. Programmers can no longer write `$alpha | $num` expressions (but they can write `${alpha | num}`). This syntax was allowed in the previous version, but it didn't work properly because the corresponding classes intersected in their code-point subsets. In future versions, I will consider partially relaxing this restriction.
- The behavior of `$alpha` has been fixed in this pull request. Previously, it was interpreted as `${upper | lower}`, which does not fit the UCD specification.
- The `lexis::Char` and `lexis::CharProperties` types have been introduced in the main crate. These types allow users to test Unicode properties of characters based on UCD data. These changes also make it easier to introduce new lexer classes into the Token macro regex syntax.
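The overlap and the `$alpha` fix described above can be illustrated with a minimal sketch. It uses only the standard library's `char` methods as stand-ins for the Unicode properties; it does not use the crate's `lexis::Char` or `lexis::CharProperties` API, and the function names `alpha_or_num`, `upper_or_lower`, and `in_both` are hypothetical. It also assumes that the standard library's notions of "alphabetic" and "numeric" are close enough to the lexer's `alpha` and `num` classes for the purpose of the illustration.

```rust
/// Stand-in for a combined class such as `${alpha | num}`:
/// a single predicate over the union of both properties.
/// (Hypothetical helper, not part of the crate.)
fn alpha_or_num(ch: char) -> bool {
    ch.is_alphabetic() || ch.is_numeric()
}

/// Stand-in for the previous reading of `$alpha` as `${upper | lower}`.
/// (Hypothetical helper, not part of the crate.)
fn upper_or_lower(ch: char) -> bool {
    ch.is_uppercase() || ch.is_lowercase()
}

/// True when a character falls into both classes at once, which is
/// the kind of overlap that makes a choice between two separate
/// classes ambiguous.
fn in_both(ch: char) -> bool {
    ch.is_alphabetic() && ch.is_numeric()
}

fn main() {
    // U+2167 ROMAN NUMERAL EIGHT is a letter-number: it is alphabetic
    // and numeric at the same time, so the two classes intersect.
    assert!(in_both('Ⅷ'));
    assert!(!in_both('a'));
    assert!(!in_both('7'));

    // The combined predicate accepts characters from either class.
    assert!(alpha_or_num('a'));
    assert!(alpha_or_num('7'));
    assert!(alpha_or_num('Ⅷ'));
    assert!(!alpha_or_num('_'));

    // A caseless letter is alphabetic but neither upper nor lower case,
    // so treating "alpha" as "upper or lower" would miss it.
    assert!('中'.is_alphabetic());
    assert!(!upper_or_lower('中'));
}
```

Expressing the union as one predicate avoids deciding which of two overlapping classes a character such as `Ⅷ` belongs to, which is presumably why the combined `${...}` form is allowed while the plain choice is not.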

1 parent cc36876, commit e909fbb
Showing 15 changed files with 3,733 additions and 266 deletions.