Plain CommonMark/GFM is incompatible with Japanese/Chinese + bold output #185

@tats-u

Bug Description

Plain CommonMark/GFM has a specification-level bug that makes it incompatible with Chinese/Japanese Markdown output that uses bold text:

[Image: ChatGPT + Japanese response]

Humans can take care with ** around punctuation, but it is much more difficult to get LLMs to pay attention to that. It would be best to modify the Markdown specification itself to eliminate this pitfall for LLMs.

See https://github.com/tats-u/markdown-cjk-friendly for the details.

Please include additional remark plugin(s) to deal with this bug, or at least add a note to the documentation.

Steps to Reproduce

Parse and render the following Markdown content with any GFM-compliant Markdown parser, including streamdown:

**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。

In production, this is likely to occur when an LLM generates output in the following situations:

  • The output is in Japanese or Chinese
  • The LLM tries to emphasize phrases that are surrounded by, or end with, ideographic parentheses or brackets
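
To illustrate why the second ** in the sample above is not accepted as a closing delimiter, here is a minimal sketch of the CommonMark left-flanking / right-flanking rules applied to a single ** run. The helper names (isWhitespace, isPunctuation, classifyDelimiterRun) are hypothetical and exist only for this illustration. The sketch assumes BMP characters only and is not how any particular parser is implemented.

```typescript
// Illustrative sketch only: helper names are hypothetical, not part of any library.
// Regexes are built at runtime so that Unicode property escapes are not
// checked against the compile target.
const WHITESPACE = new RegExp("^[\\p{Zs}\\t\\n\\f\\r]$", "u");
const PUNCTUATION = new RegExp("^[\\p{P}\\p{S}]$", "u");

// The start and end of the line ("" here) count as whitespace in CommonMark.
function isWhitespace(ch: string): boolean {
  return ch === "" || WHITESPACE.test(ch);
}

function isPunctuation(ch: string): boolean {
  return ch !== "" && PUNCTUATION.test(ch);
}

// Classify the delimiter run that starts at `index` and is `length` characters long.
// Assumption: neighbouring characters are in the BMP (no surrogate pairs).
function classifyDelimiterRun(text: string, index: number, length: number = 2) {
  const before = index > 0 ? text.charAt(index - 1) : "";
  const after = text.charAt(index + length); // "" when past the end

  const leftFlanking =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const rightFlanking =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));

  // For `**`, a run can open strong emphasis only if it is left-flanking
  // and can close it only if it is right-flanking.
  return { canOpen: leftFlanking, canClose: rightFlanking };
}

const source =
  "**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。";
const closingIndex = source.indexOf("**", 2); // the second `**`
const result = classifyDelimiterRun(source, closingIndex);

// The second run is preceded by "。" (punctuation) and followed by "こ" (a letter),
// so it is left-flanking but not right-flanking: it can open but cannot close.
console.log(result);
```

In English text the same pattern usually works because a space follows the closing **, as in "**bold.** next". Japanese and Chinese normally have no space there, which is why the rule misfires.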

Expected Behavior

<strong>この文は太字になりません(This sentence will not be bolded)。</strong>この文のせいで(It is due to this sentence)。

↑I used a raw HTML tag <strong> here (see below for the reason)

Actual Behavior

**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。

↑ I pasted the Markdown source as is here. GitHub's Markdown parser fails to parse **, too!

Code Sample

import { Streamdown } from "streamdown";

export default function App() {
  const markdown = "**この文は太字になりません(This code is not bolded)。**この文のせいで(It is due to this sentence)。";

  return <Streamdown>{markdown}</Streamdown>;
}

Streamdown Version

1.4.0

React Version

Not applicable; this does not depend on the React version.

Node.js Version

Not applicable; this does not depend on the Node.js version.

Browser(s)

No response

Operating System

None

Additional Context

These managed to work around this bug thanks to my plugin, remark-cjk-friendly.

Specification Demo (Official): https://tats-u.github.io/markdown-cjk-friendly/
Demo (GitLab Flavored Markdown / Comrak): https://gitlab-org.gitlab.io/ruby/gems/gitlab-glfm-markdown/

Metadata

Labels: bug (Something isn't working)