Extend Unicode range #2618

Open
ldci opened this issue Aug 17, 2024 · 4 comments

Comments


ldci commented Aug 17, 2024

to-char integer! works in R3, but only for code points in the range 0 to 65535. Most Unicode characters have code points beyond that range, going up to 1114111. Would it be possible to extend this to integers greater than 65535?
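To make the two limits concrete, here is a minimal Python sketch (not R3 code) of how a code point maps to UTF-16 code units. The names `BMP_MAX`, `UNICODE_MAX`, and `utf16_units` are hypothetical and chosen for illustration only. Code points up to 65535 fit in one 16-bit unit, while anything above needs a surrogate pair, which is presumably why a 16-bit character type stops at 65535.

```python
BMP_MAX = 0xFFFF        # 65535, the limit mentioned in the request
UNICODE_MAX = 0x10FFFF  # 1114111, the highest valid Unicode code point


def utf16_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= UNICODE_MAX:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point <= BMP_MAX:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Above the BMP: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

For example, `utf16_units(0x41)` yields a single unit, while `utf16_units(0x1F600)` (an emoji) yields the pair `0xD83D, 0xDE00`.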

Owner

Oldes commented Aug 17, 2024

I would not say most Unicode characters. It is mostly the emoticon poison that requires 32-bit characters.
It is possible to extend the range, but it is quite a lot of work and not my priority at the moment.

Owner

Oldes commented Aug 17, 2024

See Carl's comment here: #683

Owner

Oldes commented Aug 17, 2024

And it is good to read Brian's comments as well: #2024

I have not decided which model to use. Currently, UTF-8 everywhere (the path used in Ren-C) is slightly ahead, but it is also a huge amount of work. On the other hand, making all strings use 32-bit chars just because someone used an emoticon in a text is not good.
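The storage concern can be illustrated with a small Python sketch (not R3 code). The helper name `encoded_sizes` and the sample strings are hypothetical. It only compares how many bytes the same text takes in each encoding.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


plain = "hello world"              # 11 ASCII characters
with_emoji = plain + "\U0001F600"  # the same text plus one emoji

plain_sizes = encoded_sizes(plain)
emoji_sizes = encoded_sizes(with_emoji)
```

With these inputs, the ASCII text takes 11 bytes in UTF-8 and 44 bytes in UTF-32. Adding one emoji raises the UTF-8 size to 15 bytes, while a fixed 32-bit representation takes 48 bytes and spends four bytes on every character, including the ASCII ones.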

Owner

Oldes commented Aug 17, 2024

Implementing the UCS switching model is easier, but using just UTF-8 internally has many advantages. Of course, it would be best to implement both and compare their real performance.
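A rough Python sketch of the trade-off, under the assumption that "UCS switching" means storing each string at a fixed width chosen by its widest character (similar in spirit to Python's own flexible string representation from PEP 393). Both function names are hypothetical, and neither reflects actual R3 or Ren-C internals.

```python
def narrowest_width(text):
    """Bytes per character needed to store text at one fixed width."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def utf8_char_at(data, index):
    """Return the index-th code point of UTF-8 bytes by scanning from the start."""
    if index < 0:
        raise IndexError("negative index")
    seen = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a code point starts here
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return data[start:].decode("utf-8")
```

With fixed-width storage, indexing is a direct offset calculation, but a single wide character widens the whole string. With UTF-8, storage stays compact, but finding the n-th character requires a scan like the one in `utf8_char_at` unless extra index information is kept.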
