[Feature request] Switch from UTF-8 to MEWTF-128 #990

Closed
evie-calico opened this issue Apr 1, 2022 · 1 comment

Comments

@evie-calico
Contributor

evie-calico commented Apr 1, 2022

While the new syntax changes in 0.6.0 have finally brought RGBASM into the 22nd century, there is still one glaring issue with the language:
image

See anything missing? Where's ( •̀A•́)?

As many users are still stuck using the outdated Latin alphabet, along with terribly inefficient character encodings such as the so-called "UTF-8" (only 8? Everyone knows more bits is better), I propose a new character encoding which would finally make RGBASM usable by average programmers.

This encoding is called MEWTF-128, and contains 128 codepoints (for compatibility with ASCII text), each of which is 128 bits long. (This has the advantage of breaking ASCII compatibility, so users are required to migrate to MEWTF-128.)

Here is a short example of some of the codepoints in MEWTF-128:

18446744073709551713 | ( •̀A•́)
18446744073709551714 | =B
18446744073709551715 | ♥(˘⌣˘ C)
18446744073709551716 | ;D
18446744073709551717 | (´ε` )♡
18446744073709551718 | 𝓕𝓾𝓬𝓴
18446744073709551719 | g
18446744073709551720 | н
18446744073709551721 | i
18446744073709551722 | j
18446744073709551723 | k
18446744073709551724 | ∠( ᐛ 」∠)_)
18446744073709551725 | m
18446744073709551726 | n
18446744073709551727 | o
18446744073709551728 | p

You may notice that MEWTF-128 codepoints have very large numeric values. This is a security feature as it causes each number to be littered with 𝓕𝓾𝓬𝓴s, a word too vulgar for any malicious peoples to read.
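
For anyone who wants to double-check the table above, here is a minimal sketch in C. It assumes, going only by the listed values, that each codepoint is 2^64 plus the corresponding ASCII code; the proposal never states this rule outright. The names `mewtf128_t`, `mewtf128_from_ascii`, and `mewtf128_to_decimal` are made up for illustration, and the four-limb layout is likewise an assumption, since no byte order is specified.

```c
#include <stdint.h>

/* One 128-bit codepoint as four 32-bit limbs, most significant first.
   Hypothetical layout: the proposal does not specify one. */
typedef struct { uint32_t limb[4]; } mewtf128_t;

/* Assumed mapping, inferred from the table: value = 2^64 + ASCII code.
   Bit 64 is the lowest bit of limb[1]. */
mewtf128_t mewtf128_from_ascii(unsigned char c) {
    mewtf128_t cp = {{0, 1, 0, c}};
    return cp;
}

/* Writes the decimal value of cp into out, which must hold at least
   40 bytes (39 digits for the largest 128-bit value, plus the NUL). */
void mewtf128_to_decimal(mewtf128_t cp, char out[40]) {
    char tmp[40];
    int n = 0;
    do {
        /* Long division by 10 across the limbs; the remainder is the
           next least-significant decimal digit. */
        uint64_t rem = 0;
        for (int i = 0; i < 4; i++) {
            uint64_t cur = (rem << 32) | cp.limb[i];
            cp.limb[i] = (uint32_t)(cur / 10);
            rem = cur % 10;
        }
        tmp[n++] = (char)('0' + rem);
    } while (cp.limb[0] | cp.limb[1] | cp.limb[2] | cp.limb[3]);

    /* Digits were produced least significant first, so reverse them. */
    for (int i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i];
    }
    out[n] = '\0';
}
```

Under that assumption, passing `mewtf128_from_ascii('a')` to `mewtf128_to_decimal` yields 18446744073709551713, the first value in the table.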

You may also recall that RGBASM uses some of the outdated Latin characters in its instructions; these characters must be available for use in MEWTF-128. For compatibility, MEWTF-128 uses the NYULL (0x00000000000000000000000000000000) character to signal that the next character should be displayed as its Latin equivalent. Here is an example C program (see this issue for an explanation of C) to demonstrate:

#include <sty( •̀A•́)n;D( •̀A•́)r;D👁🅾️>

♥(˘⌣˘ C)н( •̀A•́)r128_t (´ε` )♡x( •̀A•́)mpl(´ε` )♡[] = {'p', 'u', 's', 0x00000000000000000000000000000000, 'н', ' ', '( •̀A•́)', '𝓕𝓾𝓬𝓴'}

int m( •̀A•́)in(int ( •̀A•́)rg♥(˘⌣˘ C), ♥(˘⌣˘ C)н( •̀A•́)r ** ( •̀A•́)rgv) {
    puts((´ε` )♡x( •̀A•́)mpl(´ε` )♡);
    r(´ε` )turn 0;
}

This program would output the following, just as the user intends:

pus

Ultimately I hope this can be implemented quickly so that we can finally stop using useless character encodings like UTF-8.

@ISSOtm
Member

ISSOtm commented Apr 2, 2022

Closing like #989 because I'm allergic to fish.

@ISSOtm ISSOtm closed this as completed Apr 2, 2022