unicode normalization #10

rurban · 2021-04-04T10:22:32Z

Unaccent is a nice feature, but it fails on non-normalized (non-canonically ordered) sequences and on all non-mark sequences, such as those in all non-European languages. There really should be a normalization step added, such as NFD. Maybe even add a field to cache the NFD string, plus a flag saying whether normalization was already done (and whether the result equals the input, as in 95% of all cases).
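For illustration only, here is a minimal sketch of the suggested approach in Python. The extension itself is written in C, and `unaccent` below is a hypothetical stand-in, not the extension's function. The sketch NFD-normalizes first, so precomposed characters such as é (U+00E9) split into a base letter plus combining marks in canonical order. It then drops nonspacing marks (general category Mn). The `unicodedata.is_normalized` check (Python 3.8+) plays the role of the "already done" flag proposed above and skips the normalization pass when the input is already in NFD.

```python
import unicodedata


def unaccent(text: str) -> str:
    """Hypothetical unaccent sketch: NFD-normalize, then drop nonspacing marks."""
    # Fast path: skip normalization when the input is already in NFD
    # (per the comment above, the common case). Requires Python 3.8+.
    if not unicodedata.is_normalized("NFD", text):
        # NFD decomposes precomposed letters and puts combining marks
        # into canonical order, so mark ordering no longer matters.
        text = unicodedata.normalize("NFD", text)
    # Remove nonspacing marks (category "Mn"), which holds the accents.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
```

With this sketch, `unaccent("caf\u00e9")` (precomposed) and `unaccent("cafe\u0301")` (decomposed) both give `"cafe"`. Filtering on category Mn rather than on a nonzero combining class is a design choice: some nonspacing marks have combining class 0. Spacing marks (Mc), such as many Indic vowel signs, are kept because removing them would change the word.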

nalgeon · 2021-04-04T12:08:14Z

Yeah, probably. Unfortunately, I'm not nearly as good with Unicode (or C programming) as necessary to even try adding that.

nalgeon closed this as completed Apr 22, 2021

LifeCANvs mentioned this issue Jun 10, 2024

upstream issues LifeCANvs/LCvs-Lib-SQLite-ext#1 (open)
