Spaces around East Asian punctuations in decorated text should not be required #1076
Comments
👋 Thanks for the report. Please note that the [...]

For this issue specifically, the root cause is in the CommonMark specification, which we adhere to. The section of the specification on emphasis states:

[...]

"punctuation character" is defined as "an ASCII punctuation character or anything in the Unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps", and [...]

The problem here is that this definition of punctuation character makes sense in the context of the specification if we assume "Unicode whitespace" is a part of the text used (as with most Latin alphabet-derived languages); we expect to see [...]

Hence, when we add emphasis (e.g. around [...]

With the English text, the opening [...]

With the Japanese text, however, the opening [...]

In short, this is a deficiency with the CommonMark specification's handling of East Asian text in general, because of the way the specification assumes interaction between punctuation characters and whitespace characters. I'll raise this issue (along with all the above information) in the CommonMark Discussion forum and work toward a solution. Thanks for your patience and for the report!
Thread opened here: https://talk.commonmark.org/t/emphasis-and-east-asian-text/2491
@kivikakk thanks. I'll comment on the new thread.
It's been over a year and we still haven't had movement here; pinging the upstream repo now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Problem Description
In East Asian text in general, word separators (spaces) are never written explicitly. So
> 前の**文字列**の後
should be rendered as
(Image: ex000)
and in practice this works as expected.
However, if the text fragment to be decorated ends and/or starts with punctuation:
they won't render as expected:
(Image: ex001)
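The failure can be traced to the delimiter-run rule in the CommonMark specification that the maintainer's comment refers to. The following is a minimal sketch of that rule, not the code of any actual renderer: it uses the punctuation definition quoted in the comments (ASCII punctuation plus the Unicode classes Pc, Pd, Pe, Pf, Pi, Po, Ps), and the helper names (`is_whitespace`, `is_punctuation`, `flanking`, `report`) as well as the sample strings with `「」` and `"` are assumptions chosen for illustration. A `**` run can open emphasis only if it is left-flanking and close it only if it is right-flanking.

```python
import unicodedata

ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
UNICODE_PUNCT_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_whitespace(ch):
    # The start or end of the line (None here) counts as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    if ch is None:
        return False
    return ch in ASCII_PUNCT or unicodedata.category(ch) in UNICODE_PUNCT_CLASSES


def flanking(text, start, length=2):
    """Return (left_flanking, right_flanking) for the run text[start:start+length]."""
    before = text[start - 1] if start > 0 else None
    end = start + length
    after = text[end] if end < len(text) else None
    left = not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right


def report(text):
    # Flanking status of the first and the last "**" run in the text.
    return flanking(text, text.find("**")), flanking(text, text.rfind("**"))


print(report("前の**文字列**の後"))        # ((True, True), (True, True))
print(report('before **"text"** after'))  # ((True, False), (False, True))
print(report("前の**「文字列」**の後"))    # ((False, True), (True, False))
```

In the last case the first `**` is not left-flanking, because it is followed by punctuation (`「`) but preceded by neither whitespace nor punctuation, so it cannot open emphasis. The second `**` is likewise not right-flanking, so it cannot close it. In the English sample the surrounding spaces make both runs qualify.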
A possible workaround is to insert a space before or after the punctuation ("␣" denotes a space):
but this produces ugly text with an extra space before or after the punctuation:
(Image: ex002)
Suggested modification
East Asian punctuation should be treated the same way as ordinary East Asian characters (Chinese ideographs and so on).
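One way this suggestion could be expressed is as a relaxed punctuation test, sketched below. This is only an illustration under stated assumptions: the function names are hypothetical, and "East Asian" is approximated here by the Unicode East Asian Width property (wide, fullwidth, or halfwidth), which is not necessarily the same set of characters as the list referred to in this issue. Characters with ambiguous width, such as curly quotation marks, are left as punctuation in this sketch.

```python
import unicodedata

ASCII_PUNCT = set("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
UNICODE_PUNCT_CLASSES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_punctuation_spec(ch):
    # Punctuation as defined in the specification text quoted in the comments.
    return ch in ASCII_PUNCT or unicodedata.category(ch) in UNICODE_PUNCT_CLASSES


def is_punctuation_relaxed(ch):
    # Hypothetical variant: wide (W), fullwidth (F) and halfwidth (H)
    # characters are treated like ordinary letters, never as punctuation.
    if unicodedata.east_asian_width(ch) in {"W", "F", "H"}:
        return False
    return is_punctuation_spec(ch)


for ch in "「」。\"":
    print(ch, is_punctuation_spec(ch), is_punctuation_relaxed(ch))
```

With a test like this in place of the specification's definition, `「`, `」` and `。` would no longer block a neighbouring `**` from opening or closing emphasis, while ASCII punctuation such as `"` would keep its current behaviour.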
FYI: Almost all East Asian punctuation characters are listed here: