Description
Bug Description
Plain CommonMark/GFM has a specification bug that breaks bold text in Chinese/Japanese Markdown output:
↑ChatGPT + Japanese response
Humans can take care with ** around punctuation, but it is much harder to get LLMs to pay attention to that. It would be best to modify the Markdown specification itself to eliminate this pitfall for LLMs.
See https://github.com/tats-u/markdown-cjk-friendly for the details.
Please include additional remark plugin(s) to deal with this bug, or at least add a note to the documentation.
Steps to Reproduce
Render the following Markdown content with any GFM-compliant Markdown parser, including Streamdown:
**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。
In production, this is likely to occur when an LLM generates output in the following situations:
- The output is in Japanese or Chinese
- The LLM tries to emphasize phrases that are surrounded by, or end with, ideographic parentheses, brackets, or other punctuation
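The reason the closing ** fails can be traced to the delimiter-run flanking rules in the CommonMark specification. The following is a minimal sketch of those rules for `*` runs only, not Streamdown's or remark's actual implementation. The names `classifyAsteriskRun` and `findAsteriskRuns` are hypothetical, and the scanner ignores backslash escapes, code spans, and `_` delimiters. It assumes "Unicode punctuation" means the P and S general categories, as in recent CommonMark versions.

```typescript
// Sketch of the CommonMark flanking rules for `*` delimiter runs.
// Assumption: punctuation = Unicode categories P and S; start/end of text
// counts as whitespace. Function names are illustrative only.
const WHITESPACE = new RegExp("^[\\p{Zs}\\t\\n\\f\\r]$", "u");
const PUNCTUATION = new RegExp("^[\\p{P}\\p{S}]$", "u");

const isWhitespace = (ch: string | undefined): boolean =>
  ch === undefined || WHITESPACE.test(ch);
const isPunctuation = (ch: string | undefined): boolean =>
  ch !== undefined && PUNCTUATION.test(ch);

interface Flanking {
  canOpen: boolean;  // left-flanking: may start emphasis
  canClose: boolean; // right-flanking: may end emphasis
}

function classifyAsteriskRun(
  before: string | undefined,
  after: string | undefined
): Flanking {
  const leftFlanking =
    !isWhitespace(after) &&
    (!isPunctuation(after) || isWhitespace(before) || isPunctuation(before));
  const rightFlanking =
    !isWhitespace(before) &&
    (!isPunctuation(before) || isWhitespace(after) || isPunctuation(after));
  return { canOpen: leftFlanking, canClose: rightFlanking };
}

// Find every run of `*` and classify it by its neighbouring code points.
function findAsteriskRuns(text: string): Flanking[] {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const runs: Flanking[] = [];
  let i = 0;
  while (i < chars.length) {
    if (chars[i] !== "*") {
      i++;
      continue;
    }
    let j = i;
    while (j < chars.length && chars[j] === "*") j++;
    runs.push(classifyAsteriskRun(chars[i - 1], chars[j]));
    i = j;
  }
  return runs;
}

const sample =
  "**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。";
console.log(findAsteriskRuns(sample));
```

For the sample above, the second ** is preceded by the punctuation character 。 and followed by the letter こ, so it is left-flanking but not right-flanking. It can therefore only open emphasis, never close it, and the first ** is left without a closer. In ASCII text such as `**bold.** next`, the space after the closing ** makes it right-flanking, which is why the problem rarely shows up in space-separated languages.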
Expected Behavior
<strong>この文は太字になりません(This sentence will not be bolded)。</strong>この文のせいで(It is due to this sentence)。
↑I used a raw HTML tag <strong> here (see below for the reason)
Actual Behavior
**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。
↑ I pasted the Markdown source as is here. GitHub's Markdown parser fails to parse **, too!
Code Sample
import { Streamdown } from "streamdown";

export default function App() {
  // The closing ** follows "。" and precedes "こ", so it is not treated as a closer.
  const markdown = "**この文は太字になりません(This sentence will not be bolded)。**この文のせいで(It is due to this sentence)。";
  return <Streamdown>{markdown}</Streamdown>;
}
Streamdown Version
1.4.0
React Version
This issue does not depend on the React version.
Node.js Version
This issue does not depend on the Node.js version.
Browser(s)
No response
Operating System
None
Additional Context
- https://github.com/lobehub/lobe-chat
- https://github.com/lobehub/lobe-ui
- https://github.com/CherryHQ/cherry-studio
These projects worked around this bug by using my plugin remark-cjk-friendly.
Specification Demo (Official): https://tats-u.github.io/markdown-cjk-friendly/
Demo (GitLab Flavored Markdown / Comrak): https://gitlab-org.gitlab.io/ruby/gems/gitlab-glfm-markdown/