
xid_start/xid_continue lexer classes and the Unicode subsystem refactoring #16

Merged
merged 11 commits into master from issue-14-xid-start-and-continue-classes on Jul 4, 2024


@Eliah-Lakhin (Owner) commented Jul 4, 2024

Fixes #14

Summary of changes:

  • Added the `$xid_start` and `$xid_continue` lexer regex classes to address an issue with parsing Unicode identifiers.
  • Introduced support for classes that combine Unicode properties. Users can now write combined classes in Token macro rules using the `${...}` syntax: `${alpha | num}` matches a character that is alphabetic or numeric.
  • A choice between individual Unicode classes is now forbidden: programmers can no longer write `$alpha | $num` (though they can write `${alpha | num}`). This syntax was allowed in the previous version, but it did not work properly because the corresponding classes overlap in the code points they cover. In future versions, I will consider partially relaxing this restriction.
  • Fixed the behavior of `$alpha`. Previously, it was interpreted as `${upper | lower}`, which does not match the UCD specification.
  • Introduced the `lexis::Char` and `lexis::CharProperties` types in the main crate. These types let users test characters for Unicode properties based on UCD data, and they make it easier to add new lexer classes to the Token macro regex syntax.
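
The class changes above can be illustrated with a small Rust sketch. It uses the standard library's `char::is_alphabetic`, `char::is_numeric`, `char::is_uppercase`, and `char::is_lowercase` as stand-ins for the UCD-backed properties; it does not use `lexis::Char` or `lexis::CharProperties`, and the std predicates may not coincide exactly with the Token macro's `$alpha` and `$num`. The helper names (`alpha_or_num`, `upper_or_lower`, `scan_identifier`) are hypothetical, and the identifier predicates in `main` are rough approximations, not the real XID_Start/XID_Continue sets.

```rust
// Sketch only: std `char` predicates stand in for the UCD-backed properties.
// This is not the project's `lexis::Char` API.

/// A combined class in the spirit of `${alpha | num}`: one predicate that
/// accepts a character if it has either property.
fn alpha_or_num(c: char) -> bool {
    c.is_alphabetic() || c.is_numeric()
}

/// The pre-fix interpretation of `$alpha` as `${upper | lower}`.
fn upper_or_lower(c: char) -> bool {
    c.is_uppercase() || c.is_lowercase()
}

/// Returns the longest prefix of `input` matching `start continue*`, the
/// shape of an identifier rule built from `$xid_start` and `$xid_continue`.
/// The predicates are parameters so real XID data can be plugged in.
fn scan_identifier<'a>(
    input: &'a str,
    is_start: impl Fn(char) -> bool,
    is_continue: impl Fn(char) -> bool,
) -> Option<&'a str> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_start(first) {
        return None;
    }
    // Byte offset just past the last accepted character.
    let mut end = first.len_utf8();
    for (offset, c) in chars {
        if !is_continue(c) {
            break;
        }
        end = offset + c.len_utf8();
    }
    Some(&input[..end])
}

fn main() {
    // U+2160 ROMAN NUMERAL ONE (category Nl) is both Alphabetic and numeric
    // under the std definitions, so the two sets overlap on it.
    let roman_one = '\u{2160}';
    assert!(roman_one.is_alphabetic() && roman_one.is_numeric());
    assert!(alpha_or_num(roman_one));

    // A CJK ideograph is Alphabetic but neither uppercase nor lowercase,
    // so `${upper | lower}` misses it.
    let ideograph = '中';
    assert!(ideograph.is_alphabetic() && !upper_or_lower(ideograph));

    // Rough stand-ins for XID_Start / XID_Continue (NOT the real sets).
    let start = |c: char| c.is_alphabetic();
    let cont = |c: char| c.is_alphanumeric() || c == '_';
    assert_eq!(scan_identifier("名前_1 = 5", start, cont), Some("名前_1"));
    assert_eq!(scan_identifier("1abc", start, cont), None);
}
```

Passing the predicates in keeps the scanner independent of where the Unicode data comes from; a real lexer could supply exact XID tables instead, for example via the `unicode-xid` crate's `UnicodeXID::is_xid_start` and `UnicodeXID::is_xid_continue`.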

@Eliah-Lakhin Eliah-Lakhin linked an issue Jul 4, 2024 that may be closed by this pull request
@Eliah-Lakhin Eliah-Lakhin merged commit e909fbb into master Jul 4, 2024
@Eliah-Lakhin Eliah-Lakhin deleted the issue-14-xid-start-and-continue-classes branch July 4, 2024 19:34
@Eliah-Lakhin Eliah-Lakhin added the enhancement New feature proposal label Jul 9, 2024
Eliah-Lakhin added a commit that referenced this pull request Sep 3, 2024
Semantic Analysis Framework:

- #22: Introduced the new `Slot` object. This object is similar to `Attr`, except that its value is edited directly by the API user, and it does not have an associated computable function. This allows users to inject external metadata into the semantic model.
- #22: Implemented the common semantic feature. This feature enables the user to specify Analyzer-wide semantic graph nodes (attributes and slots) that are shared across all compilation units. These enhancements address the lack of global state within the Analyzer's semantic analysis framework and introduce a conventional method to organize cross-document relationships.
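
The difference between a slot and an attribute described above can be modeled in a few lines of Rust. This is a hypothetical sketch, not the project's actual `Slot`/`Attr` API: the types, the `set`/`get` methods, and the version-counter invalidation scheme are assumptions chosen for illustration. The slot has no compute function and is edited directly; the attribute derives its value from the slot and recomputes only after the slot changes.

```rust
use std::cell::RefCell;

/// Hypothetical model of a slot: no compute function; the API user edits
/// the value directly. A version counter records each edit.
struct Slot<T> {
    value: T,
    version: u64,
}

impl<T> Slot<T> {
    fn new(value: T) -> Self {
        Slot { value, version: 0 }
    }

    fn set(&mut self, value: T) {
        self.value = value;
        self.version += 1;
    }
}

/// Hypothetical model of an attribute: its value is derived by a function
/// and cached until the slot version it was computed from changes.
struct Attr<S, T: Clone> {
    compute: fn(&S) -> T,
    cache: RefCell<Option<(u64, T)>>,
}

impl<S, T: Clone> Attr<S, T> {
    fn new(compute: fn(&S) -> T) -> Self {
        Attr { compute, cache: RefCell::new(None) }
    }

    fn get(&self, slot: &Slot<S>) -> T {
        let mut cache = self.cache.borrow_mut();
        // Reuse the cached value if the slot has not been edited since.
        if let Some((version, value)) = cache.as_ref() {
            if *version == slot.version {
                return value.clone();
            }
        }
        let value = (self.compute)(&slot.value);
        *cache = Some((slot.version, value.clone()));
        value
    }
}

fn main() {
    // External metadata injected by the user, e.g. names defined outside
    // any compilation unit.
    let mut globals = Slot::new(vec!["print".to_string()]);
    let count = Attr::new(|names: &Vec<String>| names.len());

    assert_eq!(count.get(&globals), 1);
    globals.set(vec!["print".to_string(), "len".to_string()]);
    assert_eq!(count.get(&globals), 2); // recomputed after the slot edit
}
```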

Lexer:

- #16: Introduced new `$xid_start` and `$xid_continue` Lexer Regex classes.
- #16: Added support for classes with combined Unicode properties: `${alpha | num}`.
- #16: Adjusted the `$alpha` class in accordance with UCD specifications.
- #16: Introduced the `lexis::Char` and `lexis::CharProperties` types in the main crate, enabling users to test characters for Unicode properties.
- #20: Added a new Token operator `i("abc")` that expands to case-insensitive matching.
- Fixed an edge-case bug in the mutable Document (`MutableUnit`): its lexer sometimes misinterpreted trailing token bounds when the user rewrote the end of the text.
- Fixed a minor bug in the `#[constructor]` attribute of the Token macro.
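
The `i("abc")` operator can be read as matching each character of the literal regardless of case. The Rust sketch that follows models that semantics at run time; it is not the Token macro's actual compile-time expansion, and the names `single`, `expand_case_insensitive`, and `match_prefix` are hypothetical. How the macro treats characters whose case mapping spans several characters (such as 'ß') is not stated in the changelog, so the sketch simply keeps only single-character mappings.

```rust
/// Returns the mapped character only if a case mapping yields exactly one
/// `char` (for example, 'ß'.to_uppercase() yields "SS" and is skipped).
fn single(mut mapping: impl Iterator<Item = char>) -> Option<char> {
    let first = mapping.next()?;
    if mapping.next().is_none() {
        Some(first)
    } else {
        None
    }
}

/// Expands a literal into one set of accepted characters per position:
/// the character itself plus its single-character lower- and uppercase forms.
fn expand_case_insensitive(literal: &str) -> Vec<Vec<char>> {
    literal
        .chars()
        .map(|c| {
            let mut variants = vec![c];
            let lower = single(c.to_lowercase());
            let upper = single(c.to_uppercase());
            for mapped in lower.into_iter().chain(upper) {
                if !variants.contains(&mapped) {
                    variants.push(mapped);
                }
            }
            variants
        })
        .collect()
}

/// If `input` starts with a match of `pattern`, returns the matched length
/// in bytes; otherwise returns `None`.
fn match_prefix(pattern: &[Vec<char>], input: &str) -> Option<usize> {
    let mut chars = input.chars();
    let mut len = 0;
    for allowed in pattern {
        let c = chars.next()?;
        if !allowed.contains(&c) {
            return None;
        }
        len += c.len_utf8();
    }
    Some(len)
}

fn main() {
    let pattern = expand_case_insensitive("abc");
    assert_eq!(pattern[0], vec!['a', 'A']);
    assert_eq!(match_prefix(&pattern, "AbC;"), Some(3));
    assert_eq!(match_prefix(&pattern, "abd"), None);
}
```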

Syntax Parser:

- #19: Fixed a minor bug in the conflict resolution of the Node macro's root rule.

Breaking Changes:

- #22: The `analysis::Feature` and `analysis::AbstractFeature` traits have received new members.
Development

Successfully merging this pull request may close these issues.

Is it possible support more unicode regex rule?