feat(developer): regex: support ranges 🙀 #10316

Closed
srl295 opened this issue Jan 4, 2024 · 6 comments · Fixed by #10614

Comments

@srl295
Member

srl295 commented Jan 4, 2024

… If I have a transform from [\u{00E8}-\u{00EB}] (èéêë) to [\u{00EC}-\u{00EF}] (ìíîï), how will that work with decomposition? Do we need to expand ranges (beware 0020-10FFFF)?

Note that actually we don't transform TO a range. That would be a Set which is well-defined here.

But the matching is still an issue to solve in this ticket.
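To make the question concrete, here is a rough sketch of what expanding a range would involve. It uses Python's standard `unicodedata` module, not Keyman's actual code. `expand_range` and its `limit` parameter are hypothetical names for illustration only; the cap is there because a range like 0020-10FFFF is far too large to enumerate sensibly.

```python
import unicodedata


def expand_range(start: int, end: int, limit: int = 256) -> list[str]:
    """Expand an inclusive code point range into the NFD form of each member.

    Hypothetical helper: `limit` is an arbitrary cap chosen for this sketch.
    """
    if end < start:
        raise ValueError("range end is before range start")
    if end - start + 1 > limit:
        raise ValueError("range too large to expand")
    return [unicodedata.normalize("NFD", chr(cp)) for cp in range(start, end + 1)]


# U+00E8..U+00EB (èéêë): each member decomposes to 'e' plus one combining mark.
forms = expand_range(0x00E8, 0x00EB)
for form in forms:
    print(" ".join(f"U+{ord(ch):04X}" for ch in form))
```

After decomposition the four members are two-code-point sequences (`e` followed by U+0300, U+0301, U+0302, or U+0308), so they no longer form a contiguous range that a character class could express directly.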

This is under core because it has to be handled on the core side.

Split from #9468
Supports #9121

@srl295 srl295 added this to the A17S30 milestone Jan 4, 2024
@srl295 srl295 self-assigned this Jan 4, 2024
@srl295
Member Author

srl295 commented Jan 4, 2024

Warnings will be on the Developer side.

@srl295 srl295 changed the title feat(developer): regex: support ranges 🙀 feat(core): regex: support ranges 🙀 Jan 4, 2024
@keymanapp-test-bot keymanapp-test-bot bot added core/ Keyman Core and removed developer/ labels Jan 4, 2024
@srl295
Member Author

srl295 commented Jan 12, 2024

ICU can support \Uhhhhhhhh escapes, so this can be a pass-through if there's no decomposition: [\U00000-\UFFFFFF]
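A minimal sketch of that pass-through idea, assuming the source pattern writes single code points as `\u{...}` (as in the example at the top of this issue) and that the target wants eight-digit `\Uhhhhhhhh` escapes. `to_icu_escape` is a hypothetical name, and this is not the code that landed in the fix.

```python
import re

# Matches a single-code-point escape such as \u{00E8} (1 to 6 hex digits).
_BRACED_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")


def to_icu_escape(pattern: str) -> str:
    """Rewrite each \\u{X} escape as an eight-digit \\UXXXXXXXX escape.

    Everything else in the pattern, including range hyphens and brackets,
    is passed through unchanged.
    """
    return _BRACED_ESCAPE.sub(
        lambda m: "\\U%08X" % int(m.group(1), 16),
        pattern,
    )


converted = to_icu_escape(r"[\u{00E8}-\u{00EB}]")
print(converted)
```

Because only the escape syntax changes, the range itself is never enumerated, which is what keeps very large ranges cheap.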

@mcdurdin
Member

Idea: we decide to not decompose any characters represented within ranges. This means that some characters will never be matched, but it's not going to break things. There will be some benefit, e.g. allowing large ranges, allowing out-of-alphabet matches (negative matches) etc.

So if we decide not to decompose characters within ranges, then we will end up with something like:
/èéêë[^èéêë]/ ==> /e\U00301...[^èéêë]/ or /e\U00301...[^\U00eb...]/
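The idea above can be sketched as follows: literal text outside bracketed classes is normalized to NFD, while anything inside `[...]` is left exactly as written. This is an illustration only, with a deliberately naive assumption that classes are not nested and contain no escaped `]`. `nfd_outside_classes` is a hypothetical name.

```python
import re
import unicodedata

# Naive bracket-class matcher: assumes no nesting and no escaped ']'.
_CLASS = re.compile(r"(\[[^\]]*\])")


def nfd_outside_classes(pattern: str) -> str:
    """Normalize literal text to NFD but leave [...] classes untouched."""
    parts = _CLASS.split(pattern)
    # With one capturing group, re.split puts the classes at odd indices.
    return "".join(
        part if index % 2 else unicodedata.normalize("NFD", part)
        for index, part in enumerate(parts)
    )


# Literal é (U+00E9) followed by a negated class over U+00E8..U+00EB.
result = nfd_outside_classes("\u00e9[^\u00e8-\u00eb]")
print(result.encode("unicode_escape").decode("ascii"))
```

The trade-off is the one described in the comment: a precomposed character inside a class will never match decomposed input, but large and negated ranges stay cheap and nothing breaks.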

@mcdurdin mcdurdin modified the milestones: A17S30, A17S31 Jan 20, 2024
@srl295
Member Author

srl295 commented Jan 30, 2024

  • HINT if the range crosses non-NFD
  • WARN if the range starts or ends with non-NFD
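One possible reading of the two rules above, sketched with Python's `unicodedata` rather than the Developer compiler's real code. It assumes "non-NFD" means a character whose NFD form differs from itself, that "crosses" refers to characters strictly between the endpoints, and that WARN takes priority over HINT. `check_range` and `is_nfd` are hypothetical names.

```python
import unicodedata
from typing import Optional


def is_nfd(cp: int) -> bool:
    """True if the code point is unchanged by NFD normalization."""
    ch = chr(cp)
    return unicodedata.normalize("NFD", ch) == ch


def check_range(start: int, end: int) -> Optional[str]:
    """Classify an inclusive range as 'WARN', 'HINT', or None (no message)."""
    # WARN: an endpoint itself is not in NFD.
    if not is_nfd(start) or not is_nfd(end):
        return "WARN"
    # HINT: the endpoints are fine but some interior character is not in NFD.
    if any(not is_nfd(cp) for cp in range(start + 1, end)):
        return "HINT"
    return None


print(check_range(0x00E8, 0x00EB))  # endpoints è and ë both decompose
print(check_range(0x00BF, 0x00D7))  # ¿ and × are stable; À etc. lie between
print(check_range(0x0061, 0x007A))  # a-z: nothing decomposes
```

The interior scan is linear in the size of the range, so a real implementation would probably want something cheaper for ranges as large as 0020-10FFFF.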

@mcdurdin
Member

This should make the Core changes minimal, correct? Because we no longer need to convert to a Set.

@mcdurdin mcdurdin changed the title feat(core): regex: support ranges 🙀 feat(developer): regex: support ranges 🙀 Jan 31, 2024
@keymanapp-test-bot keymanapp-test-bot bot added developer/ and removed core/ Keyman Core labels Jan 31, 2024
@mcdurdin
Member

Changing this to be a Developer issue. We can push this one into beta if it's only hints and warnings. If there is other work to be done, can you detail that here @srl295?

srl295 added a commit that referenced this issue Feb 2, 2024
- add warning into tran.ts - will be one warning per <transform> element

#10316
@mcdurdin mcdurdin linked a pull request Feb 3, 2024 that will close this issue
@mcdurdin mcdurdin modified the milestones: A17S31, B17S1 Feb 3, 2024
2 participants