xid_start/xid_continue lexer classes and the Unicode subsystem refactoring #16 (merged)
Eliah-Lakhin added a commit that referenced this pull request on Sep 3, 2024:
Semantic Analysis Framework:

- #22: Introduced the new `Slot` object. This object is similar to `Attr`, except that its value is edited directly by the API user and it has no associated compute function. This allows users to inject external metadata into the semantic model.
- #22: Implemented the common semantic feature. This feature enables the user to specify Analyzer-wide semantic graph nodes (attributes and slots) that are shared across all compilation units.

These enhancements address the lack of global state within the Analyzer's semantic analysis framework and introduce a conventional way to organize cross-document relationships.

Lexer:

- #16: Introduced new `$xid_start` and `$xid_continue` Lexer Regex classes.
- #16: Added support for classes with combined Unicode properties: `${alpha | num}`.
- #16: Adjusted the `$alpha` class in accordance with the UCD specification.
- #16: Introduced the `lexis::Char` and `lexis::CharProperties` types in the main crate, enabling users to test characters for Unicode properties.
- #20: Added a new Token operator `i("abc")` that expands to case-insensitive matching.
- Fixed an edge-case bug in the Document (MutableUnit). The mutable Document's lexer sometimes misinterpreted trailing token bounds when the user rewrote the end of the text.
- Fixed a minor bug in the `#[constructor]` attribute of the Token macro.

Syntax Parser:

- #19: Fixed a minor bug in the conflict resolution of the Node macro's root rule.

Breaking Changes:

- #22: The `analysis::Feature` and `analysis::AbstractFeature` traits have received new members.
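The `Slot` versus `Attr` distinction is similar to the split between input values and derived values in incremental computation. The following toy model is plain Rust and is not the crate's actual API: `ToySlot` and `ToyAttr` are hypothetical names. It also assumes, purely for illustration, that an attribute which reads a slot is recomputed after the user edits that slot.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for a `Slot`: the API user sets the value directly,
/// and there is no compute function. A revision counter records each edit.
struct ToySlot<T> {
    value: RefCell<T>,
    revision: Cell<u64>,
}

impl<T: Clone> ToySlot<T> {
    fn new(value: T) -> Self {
        Self { value: RefCell::new(value), revision: Cell::new(0) }
    }

    /// Replaces the value and bumps the revision so dependents can notice.
    fn set(&self, value: T) {
        *self.value.borrow_mut() = value;
        self.revision.set(self.revision.get() + 1);
    }

    fn get(&self) -> T {
        self.value.borrow().clone()
    }
}

/// Hypothetical stand-in for an `Attr`: a value derived by a compute function,
/// cached together with the slot revision it was computed from.
/// (Assumption: recomputation happens only when that revision changes.)
struct ToyAttr<T> {
    compute: fn(&ToySlot<String>) -> T,
    cache: RefCell<Option<(u64, T)>>,
    computations: Cell<u32>,
}

impl<T: Clone> ToyAttr<T> {
    fn new(compute: fn(&ToySlot<String>) -> T) -> Self {
        Self { compute, cache: RefCell::new(None), computations: Cell::new(0) }
    }

    fn get(&self, slot: &ToySlot<String>) -> T {
        let revision = slot.revision.get();
        // Reuse the cached value if the slot has not been edited since.
        if let Some((cached_revision, value)) = &*self.cache.borrow() {
            if *cached_revision == revision {
                return value.clone();
            }
        }
        let value = (self.compute)(slot);
        self.computations.set(self.computations.get() + 1);
        *self.cache.borrow_mut() = Some((revision, value.clone()));
        value
    }
}

fn main() {
    // External metadata (a hypothetical build-mode string) injected via the slot.
    let slot = ToySlot::new(String::from("debug"));
    let attr = ToyAttr::new(|s: &ToySlot<String>| s.get().len());

    assert_eq!(attr.get(&slot), 5);
    assert_eq!(attr.get(&slot), 5); // served from the cache
    slot.set(String::from("release"));
    assert_eq!(attr.get(&slot), 7); // recomputed after the edit
    println!("computations: {}", attr.computations.get());
}
```

The point of the sketch is the asymmetry: nothing ever computes the slot's value, so it is the natural place for data that comes from outside the analyzed documents.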
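One way to picture what a case-insensitive operator such as `i("abc")` has to produce is a sequence of per-character alternatives, so that `"abc"` becomes a choice of `a`/`A`, then `b`/`B`, then `c`/`C`. The sketch below models that in plain Rust; it is not the Token macro's actual expansion, and `expand_case_insensitive` and `matches_expanded` are hypothetical names. It assumes only single-character case mappings are used, so a character like `ß` (whose uppercase form is `"SS"`) matches only itself here.

```rust
/// Hypothetical helper: expands a literal into one set of alternatives per
/// character, covering the character itself and its single-char case forms.
fn expand_case_insensitive(literal: &str) -> Vec<Vec<char>> {
    literal
        .chars()
        .map(|ch| {
            let mut alternatives = vec![ch];
            let mapped = [single(ch.to_lowercase()), single(ch.to_uppercase())];
            for m in mapped.iter().flatten() {
                if !alternatives.contains(m) {
                    alternatives.push(*m);
                }
            }
            alternatives
        })
        .collect()
}

/// Returns the mapped char only if the case mapping yields exactly one char.
fn single(mut mapping: impl Iterator<Item = char>) -> Option<char> {
    let first = mapping.next()?;
    if mapping.next().is_none() {
        Some(first)
    } else {
        None
    }
}

/// Whole-string match of `input` against the expanded pattern.
fn matches_expanded(pattern: &[Vec<char>], input: &str) -> bool {
    let mut chars = input.chars();
    for alternatives in pattern {
        match chars.next() {
            Some(c) if alternatives.contains(&c) => {}
            _ => return false,
        }
    }
    // Reject trailing input beyond the pattern.
    chars.next().is_none()
}

fn main() {
    let pattern = expand_case_insensitive("abc");
    println!("{:?}", pattern);
    assert!(matches_expanded(&pattern, "aBc"));
    assert!(matches_expanded(&pattern, "ABC"));
    assert!(!matches_expanded(&pattern, "abd"));
    assert!(!matches_expanded(&pattern, "abcd"));
}
```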
Fixes #14

Summary of changes:

- `$xid_start` and `$xid_continue` Lexer regex classes have been added to address an issue with Unicode identifier parsing.
- `${...}` syntax is now supported in the Token macro rules: `${alpha | num}` means an alphabetic or numeric character.
- `$alpha | $num` expressions are no longer allowed (users can write `${alpha | num}` instead). This syntax was allowed in the previous version, but it didn't work properly because the corresponding classes intersected in their code-point subsets. In future versions, I will consider partially relaxing this restriction.
- The `$alpha` class has been fixed in this pull request. Previously, it was interpreted as `${upper | lower}`, which does not fit the UCD specification.
- The `lexis::Char` and `lexis::CharProperties` types have been introduced in the main crate. These types allow users to test Unicode properties of characters based on UCD data. These changes also make it easier to introduce new lexer classes into the Token macro regex syntax.
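`$xid_start` and `$xid_continue` correspond to the Unicode XID_Start and XID_Continue properties from UAX #31, whose default identifier shape is one XID_Start character followed by zero or more XID_Continue characters. The sketch below scans that shape with the two predicates passed in as parameters; `scan_identifier` and the `approx_*` functions are hypothetical. Because the Rust standard library has no XID predicates, the example plugs in rough approximations built from `char::is_alphabetic` and `char::is_alphanumeric`; the `unicode-xid` crate's `UnicodeXID::is_xid_start` and `UnicodeXID::is_xid_continue` could be passed instead for the real properties.

```rust
/// Returns the longest identifier prefix of `input`: one "start" character
/// followed by any number of "continue" characters, or `None` if the first
/// character cannot start an identifier.
fn scan_identifier(
    input: &str,
    is_start: impl Fn(char) -> bool,
    is_continue: impl Fn(char) -> bool,
) -> Option<&str> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_start(first) {
        return None;
    }
    // Byte offset just past the last accepted character.
    let mut end = first.len_utf8();
    for (offset, ch) in chars {
        if !is_continue(ch) {
            break;
        }
        end = offset + ch.len_utf8();
    }
    Some(&input[..end])
}

// Rough std-only approximations, NOT the real XID properties:
// XID_Start is close to (but not identical to) Alphabetic, and XID_Continue
// additionally admits digits, connector punctuation such as '_', and
// combining marks.
fn approx_xid_start(ch: char) -> bool {
    ch.is_alphabetic()
}

fn approx_xid_continue(ch: char) -> bool {
    ch.is_alphanumeric() || ch == '_'
}

fn scan(input: &str) -> Option<&str> {
    scan_identifier(input, approx_xid_start, approx_xid_continue)
}

fn main() {
    assert_eq!(scan("h\u{e9}llo_2 = 1"), Some("h\u{e9}llo_2"));
    assert_eq!(scan("переменная+x"), Some("переменная"));
    assert_eq!(scan("2abc"), None); // digits cannot start an identifier
    println!("{:?}", scan("\u{540d}\u{524d} rest"));
}
```

Passing the predicates as parameters keeps the scanning logic independent of where the property data comes from.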
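The union semantics of `${alpha | num}` and the reason `$alpha` should not be `${upper | lower}` can both be checked with standard-library Unicode predicates. This is a minimal model, not the crate's `CharProperties` type: `Property`, `has_property`, and `in_class` are hypothetical, and it assumes `num` corresponds to Rust's `char::is_numeric` (general categories Nd, Nl, No). It also shows why the old `$alpha | $num` form could overlap: U+2160 ROMAN NUMERAL ONE is both Alphabetic and numeric.

```rust
/// Hypothetical property tags, each backed by a std `char` predicate.
#[derive(Clone, Copy)]
enum Property {
    Alpha,
    Num,
    Upper,
    Lower,
}

fn has_property(ch: char, property: Property) -> bool {
    match property {
        Property::Alpha => ch.is_alphabetic(), // UCD Alphabetic property
        Property::Num => ch.is_numeric(),      // general categories Nd, Nl, No
        Property::Upper => ch.is_uppercase(),  // UCD Uppercase property
        Property::Lower => ch.is_lowercase(),  // UCD Lowercase property
    }
}

/// Union semantics: a combined class matches if ANY listed property holds.
fn in_class(ch: char, class: &[Property]) -> bool {
    class.iter().any(|&p| has_property(ch, p))
}

fn main() {
    let alpha_or_num = [Property::Alpha, Property::Num];
    let old_alpha = [Property::Upper, Property::Lower]; // pre-fix `$alpha`
    let alpha = [Property::Alpha];

    assert!(in_class('x', &alpha_or_num));
    assert!(in_class('7', &alpha_or_num));
    assert!(!in_class('-', &alpha_or_num));

    // U+4E2D (a CJK ideograph, category Lo) is Alphabetic but caseless,
    // so `${upper | lower}` misses it while UCD Alphabetic includes it.
    assert!(in_class('\u{4e2d}', &alpha));
    assert!(!in_class('\u{4e2d}', &old_alpha));

    // U+2160 ROMAN NUMERAL ONE (category Nl) is in both subsets.
    let c = '\u{2160}';
    assert!(has_property(c, Property::Alpha) && has_property(c, Property::Num));
}
```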