[Feature request] Switch from UTF-8 to MEWTF-128 #990

evie-calico · 2022-04-01T12:57:57Z

While the new syntax changes in 0.6.0 have finally brought RGBASM into the 22nd century, there is still one glaring issue with the language:

See anything missing? Where's ( •̀A•́)?

As many users are still stuck using the outdated Latin alphabet, along with terribly inefficient character encodings such as the so-called "UTF-8" (only 8? Everyone knows more bits is better), I propose a new character encoding which would finally make RGBASM usable by average programmers.

This encoding is called MEWTF-128, and contains 128 codepoints (for compatibility with ASCII text), each of which is 128 bits long. (This has the advantage of breaking ASCII compatibility, so users are required to migrate to MEWTF-128.)

Here is a short example of some of the codepoints in MEWTF-128:

18446744073709551713 | ( •̀A•́)
18446744073709551714 | =B
18446744073709551715 | ♥(˘⌣˘ C)
18446744073709551716 | ;D
18446744073709551717 | (´ε｀ )♡
18446744073709551718 | 𝓕𝓾𝓬𝓴
18446744073709551719 | g
18446744073709551720 | н
18446744073709551721 | i
18446744073709551722 | j
18446744073709551723 | k
18446744073709551724 | ∠( ᐛ 」∠)＿)
18446744073709551725 | m
18446744073709551726 | n
18446744073709551727 | o
18446744073709551728 | p
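
As a rough sketch of how these numbers line up: every value listed above equals 2^64 plus the ASCII code of the Latin letter it replaces (for example, 2^64 + 97 for `a`). The proposal never states this rule, so treat it as an observation from the table. The names `MEWTF_OFFSET`, `mewtf_codepoint`, and `encode_unit`, and the big-endian byte order, are all assumptions made for illustration.

```python
# Assumption: codepoint = 2**64 + ASCII code, inferred from the table only.
MEWTF_OFFSET = 2**64  # hypothetical constant name


def mewtf_codepoint(latin: str) -> int:
    """Return the MEWTF-128 codepoint for a single ASCII character."""
    code = ord(latin)
    if code >= 128:
        raise ValueError("MEWTF-128 only defines 128 codepoints")
    return MEWTF_OFFSET + code


def encode_unit(codepoint: int) -> bytes:
    """Pack one codepoint into 128 bits (16 bytes).

    Big-endian is an assumption; the proposal does not pick a byte order.
    """
    return codepoint.to_bytes(16, "big")


if __name__ == "__main__":
    for letter in "ap":
        print(letter, mewtf_codepoint(letter))
```

Under this reading, a single letter costs 16 bytes, which is the whole point.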

You may notice that MEWTF-128 codepoints have very large numeric values. This is a security feature as it causes each number to be littered with 𝓕𝓾𝓬𝓴s, a word too vulgar for any malicious peoples to read.

You may also recall that RGBASM uses some of the outdated Latin characters in its instructions; these characters must be available for use in MEWTF-128. For compatibility, MEWTF-128 uses the NYULL (0x00000000000000000000000000000000) character to signal that the next character should be displayed as its Latin equivalent. Here is an example C program (see this issue for an explanation of C) to demonstrate:

#include <sty( •̀A•́)n;D( •̀A•́)r;D👁🅾️>

♥(˘⌣˘ C)н( •̀A•́)r128_t (´ε｀ )♡x( •̀A•́)mpl(´ε｀ )♡[] = {'p', 'u', 's', 0x00000000000000000000000000000000, 'н', ' ', '( •̀A•́)', '𝓕𝓾𝓬𝓴'}

int m( •̀A•́)in(int ( •̀A•́)rg♥(˘⌣˘ C), ♥(˘⌣˘ C)н( •̀A•́)r ** ( •̀A•́)rgv) {
    puts((´ε｀ )♡x( •̀A•́)mpl(´ε｀ )♡);
    r(´ε｀ )turn 0;
}

This program would output the following, just as the user intends:

pus
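
Separately from the C program, the NYULL rule described earlier can be sketched as a small renderer: a NYULL unit makes the next unit display as its Latin equivalent, and every other unit displays as its MEWTF-128 glyph. This is only a sketch. The function name `render`, the `GLYPHS` table (three entries copied from the list above), and the 2^64 offset are assumptions, not part of the proposal.

```python
NYULL = 0
OFFSET = 2**64  # assumption: inferred from the codepoint list, not stated

# A few glyphs copied from the proposal's table; unlisted units fall back to Latin.
GLYPHS = {
    OFFSET + ord("a"): "( •̀A•́)",
    OFFSET + ord("f"): "𝓕𝓾𝓬𝓴",
    OFFSET + ord("h"): "н",
}


def render(units):
    """Render a sequence of MEWTF-128 codepoints as display text."""
    out = []
    force_latin = False
    for unit in units:
        if unit == NYULL:
            # NYULL is not displayed; it only affects the next unit.
            force_latin = True
            continue
        latin = chr(unit - OFFSET)  # raises ValueError for units below OFFSET
        out.append(latin if force_latin else GLYPHS.get(unit, latin))
        force_latin = False
    return "".join(out)


if __name__ == "__main__":
    units = [OFFSET + ord(c) for c in "pus"]
    units += [NYULL, OFFSET + ord("h")]
    units += [OFFSET + ord(c) for c in " af"]
    print(render(units))
```

Here the escaped `h` comes out as a plain Latin letter, while the unescaped `a` and `f` keep their MEWTF-128 glyphs.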

Ultimately I hope this can be implemented quickly so that we can finally stop using useless character encodings like UTF-8.


ISSOtm · 2022-04-02T13:34:49Z

Closing like #989 because I'm allergic to fish.

ISSOtm closed this as completed Apr 2, 2022
