xid_start/xid_continue lexer classes and the Unicode subsystem refactoring #16

Eliah-Lakhin · 2024-07-04T19:15:02Z

Fixes #14

Summary of changes:

- Added the `$xid_start` and `$xid_continue` Lexer regex classes to address an issue with Unicode identifier parsing.
- Introduced support for classes with combined Unicode properties. Users can now write combined classes using the `${...}` syntax in the Token macro rules: `${alpha | num}` matches any character that is alphabetic or numeric.
- The choice between individual Unicode classes is now forbidden. Programmers can no longer write `$alpha | $num` expressions (but they can write `${alpha | num}`). The previous version allowed this syntax, but it didn't work properly because the corresponding classes intersect in their code-point subsets. In future versions, I will consider partially relaxing this restriction.
- Fixed the behavior of `$alpha`. Previously, it was interpreted as `${upper | lower}`, which does not match the UCD specification because it omits alphabetic characters that have no case.
- Introduced the `lexis::Char` and `lexis::CharProperties` types in the main crate. These types allow users to test Unicode properties of characters based on UCD data. They also make it easier to introduce new lexer classes into the Token macro regex syntax.
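The distinctions in the summary above can be sketched with Rust's standard `char` predicates. This is only an illustration, not the crate's implementation: the function names are hypothetical, and treating `char::is_alphabetic` and `char::is_numeric` as stand-ins for `$alpha` and `$num` is an assumption. The real classes are generated from UCD data and may differ in detail.

```rust
// Illustrative sketch only. Standard-library predicates stand in for the
// UCD-backed lexer classes; all function names here are hypothetical.

/// The old `$alpha` interpretation, `${upper | lower}`: cased characters only.
fn old_alpha(ch: char) -> bool {
    ch.is_uppercase() || ch.is_lowercase()
}

/// The fixed `$alpha`: the full Unicode Alphabetic property (assumed mapping).
fn alpha(ch: char) -> bool {
    ch.is_alphabetic()
}

/// A stand-in for `$num` (assumed mapping to general categories Nd, Nl, No).
fn num(ch: char) -> bool {
    ch.is_numeric()
}

/// `${alpha | num}` modeled as one class: the union of both properties.
fn alpha_or_num(ch: char) -> bool {
    alpha(ch) || num(ch)
}

fn main() {
    for ch in ['a', 'Z', 'あ', '7', 'Ⅷ', '_'] {
        println!(
            "{:?}: old_alpha={} alpha={} num={} alpha_or_num={}",
            ch,
            old_alpha(ch),
            alpha(ch),
            num(ch),
            alpha_or_num(ch),
        );
    }
}
```

The Hiragana letter `あ` is alphabetic but has no case, so the old interpretation misses it. The Roman numeral `Ⅷ` is both alphabetic and numeric under these predicates, which shows how two individual classes can overlap in their code-point subsets, while a single combined class has one unambiguous membership test.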

Semantic Analysis Framework:

- #22: Introduced the new `Slot` object. This object is similar to `Attr`, except that its value is edited directly by the API user, and it does not have an associated computable function. This allows users to inject external metadata into the semantic model.
- #22: Implemented the common semantic feature. This feature enables the user to specify Analyzer-wide semantic graph nodes (attributes and slots) that are shared across all compilation units.

These enhancements address the lack of global state within the Analyzer's semantic analysis framework and introduce a conventional method to organize cross-document relationships.

Lexer:

- #16: Introduced new `$xid_start` and `$xid_continue` Lexer Regex classes.
- #16: Added support for classes with combined Unicode properties: `${alpha | num}`.
- #16: Adjusted the `$alpha` class in accordance with UCD specifications.
- #16: Introduced the `lexis::Char` and `lexis::CharProperties` types in the main crate, enabling users to test characters for Unicode properties.
- #20: Added a new Token operator `i("abc")` that expands to case-insensitive matching.
- Fixed an edge-case bug in the Document (MutableUnit). The Mutable Document's lexer sometimes misinterpreted trailing token bounds when the user rewrote the end of the text.
- Fixed a minor bug in the `#[constructor]` attribute of the Token macro.

Syntax Parser:

- #19: Fixed a minor bug in the conflict resolutions of the Node macro's root rule.

Breaking Changes:

- #22: The `analysis::Feature` and `analysis::AbstractFeature` traits have received new members.

Eliah-Lakhin added 11 commits July 3, 2024 06:12

#14 alpha class bug fixed (missing non-case chars); Token macro doc updated

272e923

#14 UCD generator WIP

2cdde98

#14 UCD generator WIP

a422393

#14 UCD generator WIP

3dc8d1a

#14 UCD generator improvements

1837d7d

#14 UCD module added to the lexis module

e891b6a

#14 Display and Ord impls for CharProperties

0e76c3f

#14 Token macro incorporates new UCD features

c1f375d

#14 Token macro fallback cases bug fixed

371ccfb

#14 Token macro documentation updated

fa59aee

#14 Token macro documentation fix

fcbf115

Eliah-Lakhin linked an issue Jul 4, 2024 that may be closed by this pull request

Is it possible support more unicode regex rule? #14

Closed

Eliah-Lakhin merged commit e909fbb into master Jul 4, 2024

Eliah-Lakhin deleted the issue-14-xid-start-and-continue-classes branch July 4, 2024 19:34

Eliah-Lakhin added the enhancement New feature proposal label Jul 9, 2024
