unicode normalization #10

Closed
rurban opened this issue Apr 4, 2021 · 1 comment

rurban commented Apr 4, 2021

Unaccent is a nice feature, but it fails on denormalized sequences (for example, combining marks that are not in canonical order) and on all non-mark sequences, such as those in non-European languages. There really should be a normalization step, such as NFD. Maybe even add a field that caches the NFD string, plus a flag recording whether normalization has already been done (and whether the result equals the input, as it does in 95% of all cases).
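A minimal sketch of the proposed NFD step, written in Python only because its standard `unicodedata` module exposes NFD directly. This is not the project's C implementation, and the `unaccent` function here is hypothetical. It decomposes the input to NFD, which splits precomposed letters into base letter plus combining marks and puts those marks into canonical order. It then drops nonspacing marks (category `Mn`). The `is_normalized` check (Python 3.8+) plays the role of the "already done" flag suggested above.

```python
import unicodedata


def unaccent(text: str) -> str:
    """Hypothetical NFD-based unaccent: decompose, then strip nonspacing marks."""
    # Fast path: skip the normalization pass when the string is already NFD,
    # analogous to caching a "normalized" flag alongside the string.
    if not unicodedata.is_normalized("NFD", text):
        # NFD turns U+00E9 into "e" + U+0301 and reorders combining marks
        # canonically, so non-canonically ordered input is handled too.
        text = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category "Mn").
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(unaccent("Crème Brûlée"))            # Creme Brulee
    print(unaccent("Ti\u1ebfng Vi\u1ec7t"))    # Tieng Viet
```

Note that NFD only helps with letters that actually decompose. Characters such as `ø` or `ł` have no canonical decomposition and would still need an explicit mapping table.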

nalgeon (Owner) commented Apr 4, 2021

Yeah, probably. Unfortunately, I'm not nearly as good with Unicode (or C programming) as necessary to even try adding that.
