Extend Unicode range #2618

ldci · 2024-08-17T11:35:20Z

to-char integer! works in R3, but only for code points in the range 0 to 65535. However, most Unicode characters have code points beyond that range, going up to 1114111. Would it be possible to extend this to integers greater than 65535?
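
To illustrate why 65535 is the ceiling for a 16-bit character, here is a minimal Python sketch (not Rebol code, and not how R3 is implemented). It shows that a code point above 65535 does not fit in one 16-bit unit and needs a surrogate pair in UTF-16. The helper name `utf16_units` is hypothetical.

```python
MAX_BMP = 0xFFFF        # 65535, largest value a 16-bit character can hold
MAX_UNICODE = 0x10FFFF  # 1114111, largest valid Unicode code point


def utf16_units(code_point):
    """Return the list of 16-bit code units UTF-16 uses for one code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= MAX_UNICODE:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= MAX_BMP:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Above the 16-bit range: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([hex(u) for u in utf16_units(0x20AC)])   # euro sign, one unit
print([hex(u) for u in utf16_units(0x1F600)])  # an emoji, two units
```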

Oldes · 2024-08-17T20:48:09Z

I would not say most Unicode characters. It is mostly the emoticon poison that requires 32-bit characters.
It is possible to enhance the range, but it is quite a lot of work and not my priority at this moment.

Oldes · 2024-08-17T20:58:07Z

See Carl's comment here: #683

Oldes · 2024-08-17T21:13:07Z

It is also worth reading Brian's comments: #2024

I have not decided which model to use. Currently, UTF-8 everywhere (the path taken in Ren-C) is slightly ahead, but it is also a huge amount of work. On the other hand, making all strings use 32-bit chars just because someone used an emoticon in a text is not good.

Oldes · 2024-08-17T21:23:24Z

Implementing the UCS switching model is easier, but using just UTF-8 internally has many advantages. Of course, it would be best to implement both and compare their real performance.
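
The storage trade-off between the two models can be sketched in Python. This is only a rough illustration, assuming the switching model means storing each string at the narrowest fixed width (1, 2, or 4 bytes per character) that fits its largest code point. The function names `switching_width` and `storage_bytes` are hypothetical, and real performance would also depend on indexing and conversion costs that this does not measure.

```python
def switching_width(text):
    """Bytes per character when the whole string uses one fixed width."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def storage_bytes(text):
    """Compare payload size under the two models (headers ignored)."""
    return {
        "switching": len(text) * switching_width(text),
        "utf8": len(text.encode("utf-8")),
    }


plain = "hello world" * 10             # 110 ASCII characters
with_emoji = plain + "\U0001F600"      # same text plus one emoji

print(storage_bytes(plain))
print(storage_bytes(with_emoji))
```

Under these assumptions, a single emoji widens every character of the fixed-width string to 4 bytes, while the UTF-8 form grows only by the 4 bytes of the emoji itself.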

Oldes added Type.wish Type.Unicode labels Aug 17, 2024
