It's not parsing LaTex syntax correctly, even with plugins #785

lyzy0906 · 2023-10-20T13:30:08Z

Initial checklist

I read the support docs
I read the contributing guide
I agree to follow the code of conduct
I searched issues and couldn’t find anything (or linked relevant results below)

Affected packages and versions

9.0.0

Link to runnable example

No response

Steps to reproduce

#784

Actually, I'm already using these plugins: remarkGfm, remarkMath, rehypeKatex!
But I found that the plugins CANNOT recognize LaTeX syntax in this format: \[ ... \]
They only recognize LaTeX in the $$ ... $$ format.

So I am trying to parse the \[ ... \] syntax myself, and found that the component modifies the text: the backslashes are gone after the text passes through the component!
Please look at my screenshot. The backslashes in the middle are kept. Only the ones at the start and end are gone. Why is this happening?

\[ \int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]

Expected behavior

The slashes should be kept.

Actual behavior

The slashes are gone.

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

wooorm · 2023-10-20T13:36:41Z

You don’t have to open new issues. Closed ones can still be commented on.
Also, questions go to discussions, these aren’t issues. See the support docs.

That’s not how math works with remark-math. See the examples in the docs. The one I mentioned earlier and https://github.com/remarkjs/remark-math#examples.

lyzy0906 · 2023-10-20T13:43:50Z

> You don’t have to open new issues. Closed ones can still be commented on.
> Also, questions go to discussions, these aren’t issues. See the support docs.
>
> That’s not how math works with remark-math. See the examples in the docs. The one I mentioned earlier and https://github.com/remarkjs/remark-math#examples.

Sorry, I thought I could not comment after the issue was closed.
And I think this is a bug, not a question.

So let's set formulas aside: any regular string that starts with "\[" will be parsed into "[" only. Here my input is: '\[ something ]', and the output is:

What I mean is, the first backslash is dropped by the Markdown component.

wooorm · 2023-10-20T13:45:44Z

You probably need to provide more info. Please read the support guide. Don’t post a screenshot. Post code. Post what versions you are using.

You use \[. That is not supported. See the examples. Use dollars. Read the syntax section.

lyzy0906 · 2023-10-20T13:47:39Z

> You probably need to provide more info. Please read the support guide. Don’t post a screenshot. Post code. Post what versions you are using.
>
> You use \[. That is not supported. See the examples. Use dollars. Read the syntax section.

OK. I am using v9.0.0. My code is something like:
<ReactMarkdown remarkPlugins={[remarkMath, texPlugin]} rehypePlugins={[rehypeKatex]}> {'\\[ something ]'} </ReactMarkdown>

And I expect it to show \[ something ], but it is showing [ something ].

lyzy0906 · 2023-10-20T13:52:57Z

BTW, I cannot use dollars, because the LaTeX input is generated automatically by ChatGPT in the \[ ... \] format. So I am trying to parse it myself. During this process, I found that the backslashes are already gone in the remark tree:

import {visit} from 'unist-util-visit';

function customPlugin() {
  return (tree) => {
    visit(tree, (node, index) => {
      console.log(node, index);
      if (
        node.type === 'paragraph' &&
        node.children &&
        node.children.length === 1 &&
        node.children[0].type === 'text' &&
        node.children[0].value.startsWith('\\[')
      ) {
        // Render this paragraph as a custom <tex> element carrying the text value.
        const data = node.data || (node.data = {});
        data.hName = 'tex';
        data.hProperties = {
          value: node.children[0].value,
        };
      }
    });
  };
}

The condition node.children[0].value.startsWith('\\[') is not working.

wooorm · 2023-10-20T13:59:00Z

Then ask ChatGPT to solve it.

Dollars are what is used here, not \\[ and such.

Escapes working as escapes is how markdown works. If you put &copy; in HTML, you see ©, not those literal characters. It’s the same here.
And it’s the same as you do with JS: \\ turns into \.
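To make the JavaScript half of that comparison concrete, here is a minimal sketch (the variable names are illustrative and not from the thread). It shows that the literal `'\\[ something ]'` used in the JSX above already contains only one backslash at runtime, before Markdown processing starts.

```typescript
// In a JS/TS string literal, "\\" is an escape sequence for ONE backslash.
const source = '\\[ something ]';

// The runtime string is: \[ something ]
const firstChar = source[0];         // a single backslash
const secondChar = source[1];        // "["
const runtimeLength = source.length; // 14 characters, not 15

// String.raw keeps backslashes exactly as typed, giving the same runtime string.
const sameViaRaw = String.raw`\[ something ]` === source;
```

So the Markdown parser receives `\[ something ]`, and what happens to that remaining backslash is decided by Markdown's own escape rules.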

lyzy0906 · 2023-10-20T14:01:21Z

> Then ask ChatGPT to solve it.
>
> Dollars are what is used here, not \\[ and such.
>
> Escapes working as escapes is how markdown works. If you put &copy; in HTML, you see ©, not those literal characters. It’s the same here. And it’s the same as you do with JS: \\ turns into \.

?
But the component only omits the first backslash. Let's say my input is:
<ReactMarkdown remarkPlugins={[remarkMath, texPlugin]} rehypePlugins={[rehypeKatex]}> {'\\[ \\something ]'} </ReactMarkdown>

Now the output is [ \something ]. The middle backslash is still there. Only the first one is gone.

wooorm · 2023-10-20T14:05:31Z

right, that’s how markdown works. Escapes work on punctuation. [ is punctuation, so \[ turns into [. s is not punctuation so \s remains as \s. See https://spec.commonmark.org/0.30/#backslash-escapes
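The rule described here can be sketched in a few lines. This is a simplified model of CommonMark backslash escapes, not the parser react-markdown actually uses; `applyBackslashEscapes` and `ASCII_PUNCTUATION` are hypothetical names, and the sketch ignores the contexts where escapes do not apply (code spans, code blocks, autolinks, raw HTML).

```typescript
// ASCII punctuation characters that CommonMark allows to be backslash-escaped.
const ASCII_PUNCTUATION = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~';

// Simplified model: drop a backslash only when it precedes ASCII punctuation.
function applyBackslashEscapes(markdown: string): string {
  let result = '';
  for (let i = 0; i < markdown.length; i++) {
    const next = markdown[i + 1];
    if (markdown[i] === '\\' && next !== undefined && ASCII_PUNCTUATION.includes(next)) {
      result += next; // escaped punctuation: keep the character, drop the backslash
      i++;            // skip the character we just consumed
    } else {
      result += markdown[i]; // any other character, including "\" before a letter
    }
  }
  return result;
}

// Runtime input: \[ \something ]
const rendered = applyBackslashEscapes('\\[ \\something ]');
// rendered: [ \something ]
```

Applied to the formula from the original report, the same rule removes the backslashes of the outer `\[` and `\]` while leaving `\int`, `\infty`, `\frac`, `\sqrt`, and `\pi` intact, which matches the behavior in the screenshot description.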

lyzy0906 · 2023-10-20T14:12:08Z

> right, that’s how markdown works. Escapes work on punctuation. [ is punctuation, so \[ turns into [. s is not punctuation so \s remains as \s. See https://spec.commonmark.org/0.30/#backslash-escapes

I see. Thank you for your detailed explanation. I was not aware that [ is punctuation.

wooorm · 2023-10-20T14:44:37Z

No problem! Good luck! :)

mandeep511 · 2023-11-09T08:44:10Z

> No problem! Good luck! :)

So how do you recommend we solve it? ChatGPT's web UI somehow parses \[ ... \] properly, so there must be a simple way to handle all the corner cases when rendering LaTeX with React.

wooorm · 2023-11-09T08:51:05Z

Regexp? Build your own plugins?

That they parse these things does not mean it is simple.

prashantbhudwal · 2024-02-27T13:00:03Z

For processing LaTeX output from ChatGPT or the OpenAI API:

export const preprocessLaTeX = (content: string) => {
  // Replace block-level LaTeX delimiters \[ \] with $$ $$
  const blockProcessedContent = content.replace(
    /\\\[(.*?)\\\]/gs,
    (_, equation) => `$$${equation}$$`,
  );
  // Replace inline LaTeX delimiters \( \) with $ $
  const inlineProcessedContent = blockProcessedContent.replace(
    /\\\((.*?)\\\)/gs,
    (_, equation) => `$${equation}$`,
  );
  return inlineProcessedContent;
};
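To see what this conversion does to a typical model response, here is a small runnable sketch. It restates the two replacements under the hypothetical name `convertDelimiters` and uses `[\s\S]` in place of `.` with the `s` flag (the matching is the same, but it also compiles for targets older than ES2018). The sample text is made up for illustration.

```typescript
// Restatement of the two delimiter replacements from the comment above.
const convertDelimiters = (content: string): string =>
  content
    // \[ ... \]  ->  $$ ... $$
    .replace(/\\\[([\s\S]*?)\\\]/g, (_, equation: string) => `$$${equation}$$`)
    // \( ... \)  ->  $ ... $
    .replace(/\\\(([\s\S]*?)\\\)/g, (_, equation: string) => `$${equation}$`);

// String.raw keeps the backslashes exactly as a model would emit them.
const modelOutput = String.raw`The area is \( \pi r^2 \), and: \[ \int_0^1 x \, dx = \frac{1}{2} \]`;
const converted = convertDelimiters(modelOutput);
// converted: The area is $ \pi r^2 $, and: $$ \int_0^1 x \, dx = \frac{1}{2} $$
```

The converted string is what would then be passed as children to the Markdown component. Note that this conversion runs on the whole input, so delimiters that appear inside code blocks or inline code are rewritten as well.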

shubh675 · 2024-04-10T09:25:06Z

@prashantbhudwal it's a good idea

pavloko · 2024-06-05T15:05:08Z

@mandeep511 @shubh675 @prashantbhudwal I'm investigating the same issue and wondering how chatgpt.com does it, because they seem to be using react-markdown and the supplied children contain the \[ delimiter.

I'm sorry this discussion is obviously not related to react-markdown, but I guess many people will come here looking for this answer.

prashantbhudwal · 2024-06-05T15:29:44Z

@pavloko this is just additional preprocessing to handle latex edge cases. Everything else is done with react-markdown with plugins.

You can give the model examples of how to format LaTeX, and most of the time the KaTeX plugin will render it. If it doesn't, this might help.

pavloko · 2024-06-05T15:35:14Z

@prashantbhudwal the formula you supplied has worked - big thank you!

I'm just wondering how it works on their website since from the screenshot, the supplied markdown still contains \[ as delimiters.

prashantbhudwal · 2024-06-05T15:55:38Z

@pavloko They are using this property for the math plugin - [remarkMath, { singleDollarTextMath: false }],

That could change stuff. Please feel free to try.

danny-avila · 2024-08-23T18:05:14Z

Just want to add to the discussion.
[remarkMath, { singleDollarTextMath: false }],

It makes sense for them to do this because enabling it (default behavior, I believe) will cause issues when users don't expect LaTeX at all, i.e.:

I have $50 in my wallet and $100 in the bank.

Their pre-processing may detect single-line LaTeX more robustly, perhaps converting it to multi-line formatting, since this setting drops native inline rendering. Or maybe the setting is switched on when specific LaTeX identifiers are found. I'm not sure.

Before @prashantbhudwal suggested his function, I had implemented my own pre-processing function. Revisiting this problem, and drawing on lessons from this thread, I was inspired to tackle some edge cases that LibreChat users have hit with LaTeX rendering. I'm sharing the result back here, since the solutions above, as well as my previous implementation, exhibit those edge cases.

We can set singleDollarTextMath to true, but then we need to escape several uses of $ as suggested; otherwise users will see weird rendering for the simple "wallet" statement above, to name one edge case.
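As a minimal sketch of that escaping idea, taken on its own: escape a `$` only when a digit follows it, so the "wallet" sentence is no longer read as inline math. The name `escapeCurrencyDollars` is hypothetical, and the heuristic is an assumption that a dollar sign followed by a digit is currency.

```typescript
// Escape "$" only when a digit follows, so "$50" stays literal text.
// The replacement string '\\$' is a backslash followed by a dollar sign.
function escapeCurrencyDollars(text: string): string {
  return text.replace(/\$(?=\d)/g, '\\$');
}

const sentence = 'I have $50 in my wallet and $100 in the bank.';
const escaped = escapeCurrencyDollars(sentence);
// escaped: I have \$50 in my wallet and \$100 in the bank.
```

The heuristic has a cost: inline math that begins with a digit, such as `$2x + 1$`, gets its opening dollar escaped too. That is one reason to shield existing math expressions before applying it.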

I was inspired by the implementation here: lobehub/lobe-ui#168 but it was not complete.

Here's what I came up with after several hours of trial and error:

/**
 * Preprocesses LaTeX content by replacing delimiters and escaping certain characters.
 *
 * @param content The input string containing LaTeX expressions.
 * @returns The processed string with replaced delimiters and escaped characters.
 */
export function preprocessLaTeX(content: string): string {
  // Step 1: Protect code blocks
  const codeBlocks: string[] = [];
  content = content.replace(/(```[\s\S]*?```|`[^`\n]+`)/g, (match, code) => {
    codeBlocks.push(code);
    return `<<CODE_BLOCK_${codeBlocks.length - 1}>>`;
  });

  // Step 2: Protect existing LaTeX expressions
  const latexExpressions: string[] = [];
  content = content.replace(/(\$\$[\s\S]*?\$\$|\\\[[\s\S]*?\\\]|\\\(.*?\\\))/g, (match) => {
    latexExpressions.push(match);
    return `<<LATEX_${latexExpressions.length - 1}>>`;
  });

  // Step 3: Escape dollar signs that are likely currency indicators
  content = content.replace(/\$(?=\d)/g, '\\$');

  // Step 4: Restore LaTeX expressions
  content = content.replace(/<<LATEX_(\d+)>>/g, (_, index) => latexExpressions[parseInt(index)]);

  // Step 5: Restore code blocks
  content = content.replace(/<<CODE_BLOCK_(\d+)>>/g, (_, index) => codeBlocks[parseInt(index)]);

  // Step 6: Apply additional escaping functions
  content = escapeBrackets(content);
  content = escapeMhchem(content);

  return content;
}
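The function above calls `escapeBrackets` and `escapeMhchem`, which are not shown in this thread. The following is a hypothetical stand-in for `escapeBrackets` only, written under the assumption that it converts `\[ ... \]` and `\( ... \)` to dollar delimiters while leaving code untouched. It is a sketch of that assumed behavior, not the LibreChat implementation, and `escapeMhchem` is not covered.

```typescript
// Hypothetical stand-in for the `escapeBrackets` helper called above.
// Assumption: rewrite \[ ... \] to $$ ... $$ and \( ... \) to $ ... $,
// leaving fenced code blocks and inline code as they are.
function escapeBrackets(text: string): string {
  // Group 1: code (fenced or inline). Group 2: \[ ... \]. Group 3: \( ... \).
  const pattern = /(```[\s\S]*?```|`[^`\n]*`)|\\\[([\s\S]*?)\\\]|\\\(([\s\S]*?)\\\)/g;
  return text.replace(
    pattern,
    (match: string, code?: string, block?: string, inline?: string): string => {
      if (code !== undefined) return code;            // code: return unchanged
      if (block !== undefined) return `$$${block}$$`; // display math
      if (inline !== undefined) return `$${inline}$`; // inline math
      return match;
    },
  );
}

// Runtime input: `\[ x \]` and \[ y \]
const mixed = escapeBrackets('`\\[ x \\]` and \\[ y \\]');
// mixed: `\[ x \]` and $$ y $$
```

Matching code in the same regular expression means the code alternative wins whenever a delimiter sits inside backticks, so no separate placeholder pass is needed for this step.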

I also wrote some tests for this:

  preprocessLaTeX
    ✓ returns the same string if no LaTeX patterns are found (1 ms)
    ✓ escapes dollar signs followed by digits
    ✓ does not escape dollar signs not followed by digits
    ✓ preserves existing LaTeX expressions (1 ms)
    ✓ handles mixed LaTeX and currency
    ✓ converts LaTeX delimiters
    ✓ escapes mhchem commands
    ✓ handles complex mixed content
    ✓ handles empty string
    ✓ preserves code blocks
    ✓ handles multiple currency values in a sentence
    ✓ preserves LaTeX expressions with numbers
    ✓ handles currency values with commas
    ✓ preserves LaTeX expressions with special characters

I'm sure it's not perfect, nor the most optimal, but it lets LaTeX render correctly for those expecting it with most AI model providers (OpenAI, Anthropic, Llama 3.1), whose LaTeX formatting largely depends on their training.

I'm happy to share this, collect feedback, and be corrected on the approach. Maybe the way OpenAI does it with ChatGPT is the better route, but I chose this one as it was more apparent to me.

Example rendering:

pavloko · 2024-08-23T18:49:33Z

@danny-avila This is great. Thank you very much! I'll try to use the shared solution and get back to you.

supuwoerc · 2024-09-13T17:04:41Z

@danny-avila Your persistence in digging into this helps everyone. Thank you very much. I had been researching this for two hours and you saved me.

joseph-mccombs · 2024-11-13T17:23:04Z

@danny-avila this algo is amazing. i was having similar issues with some of the other algorithms, but yours took care of every single issue. 🐐

github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Oct 20, 2023

wooorm closed this as completed Oct 20, 2023

wooorm added the 🙋 no/question This does not need any changes label Oct 20, 2023

github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Oct 20, 2023

miurla mentioned this issue May 15, 2024

Support LaTeX format miurla/morphic#154

Merged

rudyhuynh mentioned this issue May 22, 2024

Support latex ttdatt/llmchat#5

Merged

canisminor1990 mentioned this issue Jun 24, 2024

[Request] Add rendering conditions for Markdown math formulas lobehub/lobe-chat#1609

Closed

danny-avila mentioned this issue Aug 23, 2024

🧮 feat: Improve LaTeX rendering consistency danny-avila/LibreChat#3763

Merged

8 tasks

KwokKwok mentioned this issue Sep 28, 2024

Add LaTeX support KwokKwok/Silo#4

Closed
