Globbing #297

Open
isomarcte opened this issue Jan 4, 2023 · 0 comments

isomarcte (Member) commented Jan 4, 2023

https://github.com/typelevel/case-insensitive/blob/main/core/src/main/scala/org/typelevel/ci/package.scala#L34

The Unicode Standard provides quite a few different ways to do case folding (the operation that yields a caseless string), with different trade-offs in space usage and strictness. In general, we would like the default behavior to be a full case folded string using canonical equivalence between characters. This is modeled as `CanonicalFullCaseFoldedString` in the WIP PR #232.
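For reference, the Unicode Standard (Section 3.13) defines a canonical caseless match as comparing `NFD(toCasefold(NFD(X)))`. Below is a minimal sketch of that shape, assuming the JDK round trip `toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)` as a stand-in for full case folding. The JDK has no direct `toCasefold`, and this round trip does not match `CaseFolding.txt` for every character. `CanonicalCaseless`, `canonicalFullFold` and `canonicalCaselessEquals` are hypothetical names, not the API in #232.

```scala
import java.text.Normalizer
import java.util.Locale

object CanonicalCaseless {
  // Approximation of full case folding via a whole-string upper/lower round
  // trip in the root locale. NOT identical to Unicode CaseFolding.txt
  // (e.g. Java's context-sensitive final sigma handling differs).
  private def approxFullFold(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  private def nfd(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)

  // NFD(fold(NFD(s))), following the shape of the Unicode
  // "canonical caseless match" definition.
  def canonicalFullFold(s: String): String =
    nfd(approxFullFold(nfd(s)))

  def canonicalCaselessEquals(a: String, b: String): Boolean =
    canonicalFullFold(a) == canonicalFullFold(b)

  def main(args: Array[String]): Unit = {
    println(canonicalCaselessEquals("Straße", "STRASSE")) // true
    // Precomposed É vs. e followed by U+0301 COMBINING ACUTE ACCENT
    println(canonicalCaselessEquals("\u00C9", "e\u0301")) // true
  }
}
```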

In 1.x.x of case-insensitive, we have a globbing matcher. The current implementation is based on the 1.x.x default case folded string, which I think (though I am not 100% sure) is the same as a simple canonical case folded string.

The distinction here between "simple" and "full" is that a simple case fold never changes the number of `char` values needed to represent the string, whereas a full case fold may.
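To make this concrete on the JVM, where a `String` is a sequence of UTF-16 `Char` values, here is a sketch that approximates a simple fold with per-`Char` `Character.toUpperCase`/`Character.toLowerCase` and a full fold with whole-string mapping. The object and method names are hypothetical. ß (U+00DF) has no single-`Char` uppercase, so the simple fold leaves it alone, while the full fold expands it to "ss".

```scala
import java.util.Locale

object FoldLengths {
  // Approximate simple fold: each UTF-16 Char is mapped independently, so
  // the output always has the same length as the input. (Surrogate chars
  // pass through unchanged; a real implementation would use code points.)
  def simpleFold(s: String): String =
    s.map(c => Character.toLowerCase(Character.toUpperCase(c)))

  // Approximate full fold: whole-string mapping may expand one Char into
  // several, e.g. "ß" -> "SS" -> "ss".
  def fullFold(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  def main(args: Array[String]): Unit = {
    val s = "Straße"
    println(s"${simpleFold(s)} (${simpleFold(s).length} chars)") // straße (6 chars)
    println(s"${fullFold(s)} (${fullFold(s).length} chars)")     // strasse (7 chars)
  }
}
```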

In 2.x.x we'd like all the default code paths to use full case folded operations, as they are the most correct (incorrectness can introduce security issues and runtime failures for certain RFCs). However, I'm not 100% sure we can implement globbing safely for a full case folded string, because the same glyph may be represented by either N or N+M characters (where M is usually 1, 2, or 3). See combining sequences.
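A small sketch of why `?` is the troublesome wildcard. The matcher below is a hypothetical, naive glob (`?` matches exactly one `Char`, `*` matches any run, no escaping), not the 1.x implementation; it stands in for any matcher that counts `Char` values. Once full folding or canonical decomposition changes how many `Char`s a glyph occupies, a pattern that matched before stops matching, even when the pattern is folded the same way.

```scala
import java.text.Normalizer
import java.util.Locale

object GlobLengthMismatch {
  // Naive recursive glob: '?' = exactly one Char, '*' = any run of Chars.
  // Exponential in the worst case; for illustration only.
  def glob(pattern: String, input: String): Boolean = {
    def go(i: Int, j: Int): Boolean =
      if (i == pattern.length) j == input.length
      else pattern.charAt(i) match {
        case '*' => (j to input.length).exists(k => go(i + 1, k))
        case '?' => j < input.length && go(i + 1, j + 1)
        case c   => j < input.length && input.charAt(j) == c && go(i + 1, j + 1)
      }
    go(0, 0)
  }

  // Approximate full fold (upper/lower round trip), as an assumption.
  def fullFold(s: String): String =
    s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT)

  def nfd(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)

  def main(args: Array[String]): Unit = {
    // Unfolded "straße": ß is one Char, so '?' consumes it.
    println(glob("stra?e", "straße"))                     // true
    // Full fold turns ß into "ss"; the folded pattern is unchanged.
    println(glob(fullFold("stra?e"), fullFold("Straße"))) // false
    // Precomposed é (one Char) vs. e + U+0301 after NFD (two Chars).
    println(glob("caf?", "caf\u00E9"))                    // true
    println(glob("caf?", nfd("caf\u00E9")))               // false
  }
}
```

`*` still matches in both cases because it is length-agnostic; it is the fixed-width `?` (and any position counting that depends on it) that breaks when one glyph can span N or N+M `Char`s.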

I will follow up with some more concrete examples shortly.

If we can't adapt this for a full case folded string, we will need to deprecate it or leave it as a simple case folded implementation if we want to avoid a bincompat break.
