Spaces around East Asian punctuations in decorated text should not be required #1076

ikedas · 2017-06-19T08:01:56Z

Problem Description

In East Asian text, word separators (spaces) are generally never written explicitly. So

> 前の**文字列**の後

should be rendered as

前の文字列の後

(Image)

and in practice this works as expected.

However, if the text fragment to be decorated ends and/or starts with punctuation:

> 前の**前の「文字列」**の後、前の**「文字列」の後**の後、そのあと。
> 
> 前の**「文字列」**の後。

they won't render as expected:

前の**前の「文字列」の後、前の「文字列」**の後、そのあと。

前の**「文字列」**の後。

(Image)

A possible workaround is to insert a space before or after the punctuation ("␣" means a space):

> 前の**前の「文字列」**␣の後、前の␣**「文字列」の後**の後、そのあと。
> 
> 前の␣**「文字列」**␣の後。

but this produces ugly text with an extra space before or after the punctuation:

前の前の「文字列」 の後、前の 「文字列」の後の後、そのあと。

前の 「文字列」 の後。

(Image)

Suggested modification

East Asian punctuation should be treated in the same way as ordinary East Asian characters (Chinese ideographs and so on).

FYI: Almost all East Asian punctuation marks are listed here:

W3C. Requirements for Japanese Text Layout, Appendix A Character Classes.
- Punctuation marks fall into the classes cl-01 to cl-08.
- Note that ASCII punctuation characters listed there (e.g. U+0028 parenthesis) must be read as their fullwidth counterparts (U+FF08).
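As a rough illustration of this suggestion (a sketch, not a proposed patch), a parser could stop counting a character as punctuation when Unicode marks it as East Asian wide or fullwidth. The helper name `counts_as_punctuation` is hypothetical, and using `unicodedata.east_asian_width` as the test is an assumption made here for illustration; the W3C character classes above would be the more precise source.

```python
import string
import unicodedata

# Unicode general categories treated as punctuation in this sketch.
PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def counts_as_punctuation(ch: str) -> bool:
    """Hypothetical rule: East Asian wide/fullwidth punctuation is
    treated like an ordinary East Asian character, not as punctuation."""
    is_punct = ch in string.punctuation or unicodedata.category(ch) in PUNCT_CATEGORIES
    # "W" (wide) and "F" (fullwidth) cover characters such as 「 」 、 。 and U+FF08.
    is_east_asian = unicodedata.east_asian_width(ch) in ("W", "F")
    return is_punct and not is_east_asian


for ch in ['"', ".", "「", "」", "\uff08"]:
    print(repr(ch), counts_as_punctuation(ch))
```

Under this assumed rule, ASCII quotes and periods still count as punctuation, while 「, 」 and the fullwidth parenthesis do not.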

kivikakk · 2017-06-20T06:52:18Z

👋 Thanks for the report. Please note that the github/markup repository's issues are really just for issues regarding the github-markup gem itself, which doesn't have anything to do with Markdown processing. You'd be better off contacting our support team with these kinds of issues in future, because we have lots of support staff but only a couple busy engineers who monitor this repo.

For this issue specifically, the root cause is in the CommonMark specification, which we adhere to. The section of the specification on emphasis states:

A left-flanking delimiter run is a delimiter run that is (a) not followed by Unicode whitespace, and (b) either not followed by a punctuation character, or preceded by Unicode whitespace or a punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is (a) not preceded by Unicode whitespace, and (b) either not preceded by a punctuation character, or followed by Unicode whitespace or a punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

"punctuation character" is defined as "an ASCII punctuation character or anything in the Unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps", and 「 and 」 are in the Ps and Pe categories respectively.
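The categories can be checked with Python's standard `unicodedata` module. This is only a quick verification sketch; the helper name `general_categories` is made up for this example.

```python
import unicodedata


def general_categories(chars: str) -> dict:
    """Map each character to its Unicode general category."""
    return {ch: unicodedata.category(ch) for ch in chars}


# 「 and 」 are opening/closing punctuation; は and と are ordinary letters.
print(general_categories("「」はと"))
```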

The problem here is that this definition of punctuation character makes sense in the context of the specification if we assume "Unicode whitespace" is a part of the text used (as with most Latin alphabet-derived languages); we expect to see The cat is called "Nodoka". but not 猫は「のどか」という。, where the latter has no space or punctuation character separating the 「」 from the surrounding text.

Hence, when we add emphasis (e.g. around "Nodoka"), the English sentence The cat is called **"Nodoka"**. renders with emphasis, but the Japanese sentence 猫は**「のどか」**という。 does not.

With the English text, the opening ** satisfies the definition of a "left-flanking delimiter run": it is (a) not followed by Unicode whitespace ("), and (b) preceded by Unicode whitespace. The closing ** satisfies the definition of a "right-flanking delimiter run": it is (a) not preceded by Unicode whitespace ("), and (b) followed by a punctuation character (.).

With the Japanese text, however, the opening ** does not satisfy the definition of a "left-flanking delimiter run": it is (a) not followed by Unicode whitespace (「), but (b) it is followed by a punctuation character, and it is not preceded by Unicode whitespace or a punctuation character (は). Likewise, the closing ** does not satisfy the definition of a "right-flanking delimiter run": it is (a) not preceded by Unicode whitespace (」), but (b) it is preceded by a punctuation character, and it is not followed by Unicode whitespace or a punctuation character (と).
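The two definitions quoted above can be written out as a small Python sketch to reproduce this walkthrough. It is an illustration only, not the code of cmark or any other real parser; the function names are invented for the example, and it follows the punctuation definition as quoted in this thread.

```python
import re
import string
import unicodedata

PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_whitespace(ch):
    # None stands for the beginning or end of the line, which counts as whitespace.
    return ch is None or ch in "\t\n\x0c\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch) in PUNCT_CATEGORIES
    )


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right


def asterisk_runs(text):
    """Flanking results for every run of '*' in a single line of text."""
    return [flanking(text, m.start(), m.end()) for m in re.finditer(r"\*+", text)]


print(asterisk_runs('The cat is called **"Nodoka"**.'))  # [(True, False), (True, True)]
print(asterisk_runs("猫は**「のどか」**という。"))  # [(False, True), (True, False)]
print(asterisk_runs("前の**文字列**の後"))  # [(True, True), (True, True)]
```

In the English sentence the first run is left-flanking and the last run is right-flanking, so the pair can open and close emphasis. In the Japanese sentence with 「」 the first run is only right-flanking and the last run is only left-flanking, so no emphasis is formed. The third line shows why the example without punctuation already works.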

In short, this is a deficiency with the CommonMark specification's handling of East Asian text in general, because of the way the specification assumes interaction between punctuation characters and whitespace characters. I'll raise this issue (along with all the above information) in the CommonMark Discussion forum and work toward a solution.

Thanks for your patience and for the report!

kivikakk · 2017-06-20T07:13:29Z

Thread opened here: https://talk.commonmark.org/t/emphasis-and-east-asian-text/2491

ikedas · 2017-06-21T03:16:40Z

@kivikakk thanks. I'll comment on the new thread.

kivikakk · 2018-09-05T02:42:19Z

It's been over a year and we still haven't had movement here; pinging the upstream repo now.

github-actions · 2024-12-11T12:11:13Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ikedas mentioned this issue Jun 25, 2017

Emphasis and East Asian text commonmark/cmark#208

Open

github-actions bot added the Stale label Dec 11, 2024

github-actions bot closed this as not planned Dec 19, 2024
