GB2312's em dash #18

Open
Artoria2e5 opened this issue Dec 1, 2016 · 4 comments

Comments

@Artoria2e5
Collaborator

Artoria2e5 commented Dec 1, 2016

bsdconv's GB2312 table comes from unicode.org (where it went missing after the EASTASIA charts were declared obsolete) and is, to some extent, similar in quality to Unicode's Big5 table. (I will use unicode.org's hex notation to refer to GB code points, so add 0x8080 for EUC-CN.)
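For readers who want to check the notation, here is a minimal sketch of the "add 0x8080" conversion. The function name `gb_to_euc_cn` is made up for illustration and is not part of bsdconv.

```python
def gb_to_euc_cn(code: int) -> bytes:
    """Convert a 7-bit GB 2312 code such as 0x212A to its two EUC-CN bytes.

    Hypothetical helper for illustration only; not bsdconv API.
    """
    hi, lo = code >> 8, code & 0xFF
    # Both bytes of a GB 2312 code lie in the range 0x21..0x7E.
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a GB 2312 code: {code:#06x}")
    # Setting the high bit of each byte is the same as adding 0x8080.
    return bytes([hi | 0x80, lo | 0x80])


print(gb_to_euc_cn(0x212A).hex())  # a1aa
```

So the 212A discussed in this issue is the byte pair A1 AA in an EUC-CN stream.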

In GB2312-1980, 212A is defined as 破折号 (em dash), but the Unicode mapping gives U+2015 (HORIZONTAL BAR) instead of U+2014 (EM DASH), apparently without anyone reading the Chinese text. GB2312's decoder should therefore emit U+2014 so that the punctuation comes out right, and the encoder should accept U+2014 as well.
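A rough sketch of the proposed behaviour, using plain Python dicts rather than bsdconv's real table format. All names here (`UNICODE_ORG_TABLE`, `DECODE_OVERRIDES`, `ENCODE_ALIASES`, `decode_gb`, `encode_gb`) are hypothetical, and the base table holds only the one entry under discussion.

```python
# One-entry stand-in for the unicode.org mapping (assumption: the real
# table is much larger and bsdconv stores it in its own format).
UNICODE_ORG_TABLE = {0x212A: "\u2015"}  # HORIZONTAL BAR

# Proposed decoder change: 212A decodes to U+2014 (EM DASH).
DECODE_OVERRIDES = {0x212A: "\u2014"}

# Proposed encoder change: accept U+2014 in addition to U+2015.
ENCODE_ALIASES = {"\u2014": 0x212A, "\u2015": 0x212A}


def decode_gb(code):
    """Return the character for a GB code, preferring the overrides."""
    return DECODE_OVERRIDES.get(code, UNICODE_ORG_TABLE.get(code))


def encode_gb(char):
    """Return the GB code for a character, or None if it is unmapped."""
    if char in ENCODE_ALIASES:
        return ENCODE_ALIASES[char]
    # Fall back to the inverted base table.
    reverse = {c: code for code, c in UNICODE_ORG_TABLE.items()}
    return reverse.get(char)
```

Keeping U+2015 in the encoder aliases means text decoded with the old table still round-trips to the same bytes.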

By the way, 212A is one of two places where the "Unicode" gb2312-80 table is incompatible with GBK; the other is 2124. You could use the regular, non-fullwidth "middle dot" there, as GBK does and as W3C CLREQ recommends typographically, but for now all I am asking for is that the encoder accept U+00B7.
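The encoder-only request for 2124 could look roughly like this. It assumes, from my recollection of the unicode.org table, that 2124 currently maps to U+30FB (KATAKANA MIDDLE DOT); the names `MIDDLE_DOT_ALIASES` and `encode_middle_dot` are invented for the sketch.

```python
# Characters the encoder would accept for GB code 2124.
# U+30FB: what the unicode.org table uses (assumption, see above).
# U+00B7: the regular MIDDLE DOT that GBK uses.
MIDDLE_DOT_ALIASES = {"\u30fb", "\u00b7"}


def encode_middle_dot(char):
    """Return 0x2124 for either middle dot, or None for anything else.

    The decoder is deliberately left untouched in this sketch; only the
    set of accepted input characters grows.
    """
    return 0x2124 if char in MIDDLE_DOT_ALIASES else None
```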

@buganini
Owner

buganini commented Dec 1, 2016

Please feel free to change anything about Simplified Chinese. I am not a native user of it, and the current state is just enough for my previous use cases.

@Artoria2e5
Collaborator Author

Sure.

@Artoria2e5
Collaborator Author

Wait... With #17 how did it even work...

@buganini
Owner

buganini commented Dec 1, 2016
