It's not parsing LaTex syntax correctly, even with plugins #785
Comments
You don’t have to open new issues. Closed ones can still be commented on. That’s not how math works with remark-math. See the examples in the docs. The one I mentioned earlier and https://github.com/remarkjs/remark-math#examples. |
Sorry, I thought I couldn't comment after the issue was closed. So let's set formulas aside: any regular string that starts with "\[" will be parsed into "[" only. Here my input is: '\[ something ]', and the output is: What I mean is, the first slash is omitted by the Markdown component. |
You probably need to provide more info. Please read the support guide. Don’t post a screenshot. Post code. Post what versions you are using. You use |
OK. I am using v9.0.0. My code is something like: And I expect it to show |
BTW, I cannot use dollars. Because the LaTex input is generated by ChatGPT automatically, which is in \[ ... \] format. So I am trying to parse it myself. And during this process, I found the slashes are gone in the remark tree:
The condition |
Then ask ChatGPT to solve it. Dollars are what is used here, not Escapes working as escapes is how markdown works. If you put |
? Now the output is |
right, that’s how markdown works. Escapes work on punctuation. |
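The escaping rule described above can be sketched in a few lines. This is a simplified stand-in for the CommonMark backslash-escape rule (a backslash before an ASCII punctuation character yields the literal character), not the parser react-markdown actually uses, and the name `applyBackslashEscapes` is made up for illustration. It shows why `\[` loses its backslash while `\int` keeps it.

```typescript
// Hypothetical helper: mimics only CommonMark's backslash-escape rule.
// A backslash followed by an ASCII punctuation character becomes that
// character; a backslash before anything else (letters, digits) is kept.
const ASCII_PUNCTUATION = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

function applyBackslashEscapes(source: string): string {
  let result = "";
  for (let i = 0; i < source.length; i++) {
    const next = source.charAt(i + 1); // "" when past the end
    if (source.charAt(i) === "\\" && next !== "" && ASCII_PUNCTUATION.indexOf(next) !== -1) {
      result += next; // drop the backslash, keep the punctuation
      i++;
    } else {
      result += source.charAt(i);
    }
  }
  return result;
}

// Backslashes are doubled here only because this is a JS string literal.
const input = "\\[ \\int_{0}^{\\infty} e^{-x^2} dx \\]";
const output = applyBackslashEscapes(input);
console.log(output); // [ \int_{0}^{\infty} e^{-x^2} dx ]
```

The backslashes before `i` are kept because letters are not punctuation, which matches the report that only the slashes at the start and end disappear.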
I see. Thank you for your detailed explanation. I was not aware that [ is punctuation. |
No problem! Good luck! :) |
So how do you recommend we solve it? In ChatGPT's web-ui they are somehow parsing [ \ ] properly. So there must be a simple way to take care of all corner cases when compiling latex and react |
Regexp? Build your own plugins? The fact that they parse things does not mean it is simple. |
For processing ChatGPT or OpenAI LaTeX:

export const preprocessLaTeX = (content: string) => {
  // Replace block-level LaTeX delimiters \[ \] with $$ $$
  const blockProcessedContent = content.replace(
    /\\\[(.*?)\\\]/gs,
    (_, equation) => `$$${equation}$$`,
  );
  // Replace inline LaTeX delimiters \( \) with $ $
  const inlineProcessedContent = blockProcessedContent.replace(
    /\\\((.*?)\\\)/gs,
    (_, equation) => `$${equation}$`,
  );
  return inlineProcessedContent;
};
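As a quick check of the snippet above, the sketch below applies the same two replacements to the formula from the original report. The function is restated so the example runs on its own; `convertDelimiters` is only an illustrative name, and `[\s\S]` is used in place of the `s` flag purely so it compiles on older targets. The last call shows one limitation, assuming the input may contain inline code: delimiters inside backticks are rewritten too.

```typescript
// Same two replacements as above, restated so this runs on its own.
function convertDelimiters(content: string): string {
  return content
    // \[ ... \]  ->  $$ ... $$
    .replace(/\\\[([\s\S]*?)\\\]/g, (_m: string, eq: string) => `$$${eq}$$`)
    // \( ... \)  ->  $ ... $
    .replace(/\\\(([\s\S]*?)\\\)/g, (_m: string, eq: string) => `$${eq}$`);
}

// Backslashes are doubled only because these are JS string literals.
const blockMath = convertDelimiters(
  "\\[ \\int_{0}^{\\infty} e^{-x^2} dx = \\frac{\\sqrt{\\pi}}{2} \\]",
);
console.log(blockMath); // $$ \int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2} $$

const inlineMath = convertDelimiters("Euler: \\(e^{i\\pi} + 1 = 0\\)");
console.log(inlineMath); // Euler: $e^{i\pi} + 1 = 0$

// Limitation: inline code is not protected, so this is rewritten as well.
const insideCode = convertDelimiters("Type `\\(x\\)` literally");
console.log(insideCode); // Type `$x$` literally
```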
@prashantbhudwal it's good idea |
@mandeep511 @shubh675 @prashantbhudwal I'm investigating the same issue and wondering how I'm sorry this discussion is obviously not related to react-markdown, but I guess many people will come here looking for this answer. |
@pavloko this is just additional preprocessing to handle latex edge cases. Everything else is done with react-markdown with plugins. You can give examples of how to format latex and most of the times katex plugin will render it. If it doesn't, this might help. |
@prashantbhudwal the formula you supplied has worked - big thank you! I'm just wondering how it works on their website since from the screenshot, the supplied markdown still contains |
@pavloko They are using this property for the math plugin - [remarkMath, { singleDollarTextMath: false }], That could change stuff. Please feel free to try. |
Just want to add to the discussion. It makes sense for them to do this because enabling it (default behavior, I believe) will cause issues when users don't expect LaTeX at all, i.e.:
Their pre-processing may involve detecting single-line LaTeX more robustly, perhaps converting it to multi-line formatting (I'm not sure, since the setting drops the native inline rendering), or maybe the setting is switched on by specific LaTeX identifiers. Who knows.

Before @prashantbhudwal suggested his function, I'd implemented my own pre-processing function. Upon revisiting this problem, I was inspired to tackle some edge cases that LibreChat users have experienced with LaTeX rendering, taking some lessons learned from this thread. I'm here to share back, since that solution and others, including my previous implementation, exhibit those edge cases.

We can set I was inspired by the implementation here: lobehub/lobe-ui#168, but it was not complete. Here's what I came up with after several hours of trial and error:

/**
 * Preprocesses LaTeX content by replacing delimiters and escaping certain characters.
 *
 * @param content The input string containing LaTeX expressions.
 * @returns The processed string with replaced delimiters and escaped characters.
 */
export function preprocessLaTeX(content: string): string {
  // Step 1: Protect code blocks
  const codeBlocks: string[] = [];
  content = content.replace(/(```[\s\S]*?```|`[^`\n]+`)/g, (match, code) => {
    codeBlocks.push(code);
    return `<<CODE_BLOCK_${codeBlocks.length - 1}>>`;
  });
  // Step 2: Protect existing LaTeX expressions
  const latexExpressions: string[] = [];
  content = content.replace(/(\$\$[\s\S]*?\$\$|\\\[[\s\S]*?\\\]|\\\(.*?\\\))/g, (match) => {
    latexExpressions.push(match);
    return `<<LATEX_${latexExpressions.length - 1}>>`;
  });
  // Step 3: Escape dollar signs that are likely currency indicators
  content = content.replace(/\$(?=\d)/g, '\\$');
  // Step 4: Restore LaTeX expressions
  content = content.replace(/<<LATEX_(\d+)>>/g, (_, index) => latexExpressions[parseInt(index, 10)]);
  // Step 5: Restore code blocks
  content = content.replace(/<<CODE_BLOCK_(\d+)>>/g, (_, index) => codeBlocks[parseInt(index, 10)]);
  // Step 6: Apply additional escaping functions
  // (escapeBrackets and escapeMhchem are helpers that are not shown in this thread)
  content = escapeBrackets(content);
  content = escapeMhchem(content);
  return content;
}

I also wrote some tests for this:

preprocessLaTeX
✓ returns the same string if no LaTeX patterns are found (1 ms)
✓ escapes dollar signs followed by digits
✓ does not escape dollar signs not followed by digits
✓ preserves existing LaTeX expressions (1 ms)
✓ handles mixed LaTeX and currency
✓ converts LaTeX delimiters
✓ escapes mhchem commands
✓ handles complex mixed content
✓ handles empty string
✓ preserves code blocks
✓ handles multiple currency values in a sentence
✓ preserves LaTeX expressions with numbers
✓ handles currency values with commas
✓ preserves LaTeX expressions with special characters

I'm sure it's not perfect, nor the most optimal, but it handles those expecting LaTeX to render correctly with most AI model providers (OpenAI, Anthropic, Llama3.1, whose formatting of LaTeX largely depends on their training). I'm happy to share and collect feedback, and to be corrected on this approach. Maybe the way OpenAI does it with ChatGPT is the better route, but I chose this one as it was more apparent to me. |
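The function above calls `escapeBrackets` and `escapeMhchem`, which are not shown in this thread. The sketch below is one guess at what such helpers might look like, loosely following the delimiter-conversion idea discussed earlier. The bodies are assumptions for illustration, not the author's actual code, and the exact escaping that mhchem commands need may differ in a real setup.

```typescript
// Hypothetical helper: convert \[...\] to $$...$$ and \(...\) to $...$,
// leaving fenced code blocks and inline code untouched.
function escapeBrackets(text: string): string {
  const pattern = /(```[\s\S]*?```|`[^`\n]*`)|\\\[([\s\S]*?)\\\]|\\\(([\s\S]*?)\\\)/g;
  return text.replace(
    pattern,
    (match: string, code?: string, block?: string, inline?: string): string => {
      if (code !== undefined) return code; // code is returned unchanged
      if (block !== undefined) return `$$${block}$$`;
      if (inline !== undefined) return `$${inline}$`;
      return match;
    },
  );
}

// Hypothetical helper: double the backslash of \ce{ and \pu{ (mhchem
// commands) when they directly follow a dollar sign. Assumed behaviour.
function escapeMhchem(text: string): string {
  return text.replace(/\$\\(ce|pu)\{/g, (_m: string, cmd: string) => `$\\\\${cmd}{`);
}

// Backslashes are doubled only because these are JS string literals.
const converted = escapeBrackets("\\[ x^2 \\] and `\\(y\\)`");
console.log(converted); // $$ x^2 $$ and `\(y\)`

const chem = escapeMhchem("$\\ce{H2O}$");
console.log(chem); // $\\ce{H2O}$
```

Unlike the shorter regex approach earlier in the thread, this variant matches code first, so delimiters inside backticks come through unchanged.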
@danny-avila This is great. Thank you very much! I'll try to use the shared solution and get back to you. |
@danny-avila Your drilling spirit helps everyone, thank you very much, I researched for two hours and you saved me |
@danny-avila this algo is amazing. i was having similar issues with some of the other algorithms, but yours took care of every single issue. 🐐 |
Initial checklist
Affected packages and versions
9.0.0
Link to runnable example
No response
Steps to reproduce
#784
Actually, I'm already using these plugins: remarkGfm, remarkMath, rehypeKatex!
But I found the plugins CANNOT identify LaTeX syntax in this format: \[ ... \]
They only recognize LaTeX in the $$ ... $$ format.
So I am trying to parse the \[ ... \] syntax myself, and found that the text is modified by the component: the slashes are gone after going through it!
Please look at my screenshot. The slashes in the middle are kept. Only the ones at the start and end are gone. Why is this happening?
\[ \int_{0}^{\infty} e^{-x^2} dx = \frac{\sqrt{\pi}}{2} \]
Expected behavior
The slashes should be kept.
Actual behavior
The slashes are gone.
Runtime
No response
Package manager
No response
OS
No response
Build and bundle tools
No response