Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rehype-sanitize breaks math rendering #67

Closed
4 tasks done
maclockard opened this issue Sep 29, 2021 · 6 comments · Fixed by #68
Closed
4 tasks done

rehype-sanitize breaks math rendering #67

maclockard opened this issue Sep 29, 2021 · 6 comments · Fixed by #68
Labels
💪 phase/solved Post is done

Comments

@maclockard
Copy link
Contributor

maclockard commented Sep 29, 2021

Initial checklist

Affected packages and versions

remark-math@5.1.0, rehype-katex@6.0.1, rehype-sanitize@5.0.0

Link to runnable example

https://codesandbox.io/s/react-markdown-debug-forked-3rbkc?file=/src/App.js

Steps to reproduce

The above sandbox link should work! code copied below for posterity.

Code:
import React from "react";
import Markdown from "react-markdown";
import rehypeKatex from "rehype-katex";
import rehypeSanitize from "rehype-sanitize";
import remarkMath from "remark-math";

import "katex/dist/katex.min.css";

const mathMarkdown = `
Lift($L$) can be determined by Lift Coefficient ($C_L$) like the following
equation.

$$
L = \\frac{1}{2} \\rho v^2 S C_L
$$
`;

export default function App() {
  return (
    <>
      <h1>Without sanitize</h1>
      <Markdown rehypePlugins={[rehypeKatex]} remarkPlugins={[remarkMath]}>
        {mathMarkdown}
      </Markdown>
      <h1>With sanitize</h1>
      <Markdown
        rehypePlugins={[rehypeKatex, rehypeSanitize]}
        remarkPlugins={[remarkMath]}
      >
        {mathMarkdown}
      </Markdown>
    </>
  );
}

Result:

Screen Shot 2021-09-29 at 1 36 11 PM

Expected behavior

rehype-katex should still be able to correctly render math after rehype-sanitize runs. According to https://github.com/remarkjs/remark-math#security its advised to add rehype-sanitize if accepting user input. Even if safe without sanitize, I still need rehype-sanitize since I'm also using rehype-raw and rehype-autolink-headings.

Note, this breaks regardless of the order of plugins are added.

Actual behavior

Math does not correctly render.

Runtime

Other (please specify in steps to reproduce)

Package manager

yarn v1

OS

macOS

Build and bundle tools

Webpack

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Sep 29, 2021
@maclockard
Copy link
Contributor Author

After some fiddling, I found a schema for sanitize that seems to allow for katex to work:

const mathSanitizeSchema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    div: [
      ...defaultSchema.attributes.div,
      ["className", "math", "math-display"]
    ],
    math: [["xmlns", "http://www.w3.org/1998/Math/MathML"], "display"],
    annotation: ["encoding"],
    span: ["className", "style"],
    svg: [
      ["xmlns", "http://www.w3.org/2000/svg"],
      "width",
      "height",
      "viewBox",
      "preserveAspectRatio"
    ],
    path: ["d"]
  },
  tagNames: [
    ...defaultSchema.tagNames,
    "math",
    "semantics",
    "annotation",
    "mrow",
    "mi",
    "mo",
    "mfrac",
    "mn",
    "msup",
    "msub",
    "svg",
    "path"
  ]
};

However, I'm unsure of the security implications of allowing some of these tags/attributes. Specifically svg, path, and allowing the attributes className and style on span.

Full sandbox + some "tests": https://codesandbox.io/s/react-markdown-debug-forked-6k0nv?file=/src/App.js:551-1247

Copied code for posterity
import React from "react";
import Markdown from "react-markdown";
import rehypeKatex from "rehype-katex";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import remarkMath from "remark-math";

import "katex/dist/katex.min.css";

const mathMarkdown = `
Lift($L$) can be determined by Lift Coefficient ($C_L$) like the following
equation.

$$
L = \\frac{1}{2} \\rho v^2 S C_L
$$

$$
\\sqrt{a}
$$

$$
f(\\relax{x}) = \\int_{-\\infty}^\\infty
    f(\\hat\\xi)\\,e^{2 \\pi i \\xi x}
    \\,d\\xi
$$
`;

const mathSanitizeSchema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    div: [
      ...defaultSchema.attributes.div,
      ["className", "math", "math-display"]
    ],
    math: [["xmlns", "http://www.w3.org/1998/Math/MathML"], "display"],
    annotation: ["encoding"],
    span: ["className", "style"],
    svg: [
      ["xmlns", "http://www.w3.org/2000/svg"],
      "width",
      "height",
      "viewBox",
      "preserveAspectRatio"
    ],
    path: ["d"]
  },
  tagNames: [
    ...defaultSchema.tagNames,
    "math",
    "semantics",
    "annotation",
    "mrow",
    "mi",
    "mo",
    "mfrac",
    "mn",
    "msup",
    "msub",
    "svg",
    "path"
  ]
};

export default function App() {
  return (
    <>
      <h1>Without sanitize</h1>
      <Markdown rehypePlugins={[rehypeKatex]} remarkPlugins={[remarkMath]}>
        {mathMarkdown}
      </Markdown>
      <h1>With sanitize</h1>
      <Markdown
        rehypePlugins={[
          rehypeRaw,
          rehypeKatex,
          [rehypeSanitize, mathSanitizeSchema]
        ]}
        remarkPlugins={[remarkMath]}
      >
        {mathMarkdown}
      </Markdown>
    </>
  );
}

Are the above changes safe to the sanitize schema to make assuming user input? If so, I might make a PR documenting this under the security section of the README.

I'm also encountering similar needs to tweak the schema for other remark/rehype plugins, so I could also create PRs for those tweaks as well. Although, I wonder if there's something the plugins themselves could do here in terms of modifying the schema automatically

@wooorm
Copy link
Member

wooorm commented Sep 30, 2021

Hey Mac!

This works as intended, as GitHub does not allow SVG/MathML.

However, I'm unsure of the security implications of allowing some of these tags/attributes. Specifically svg, path, and allowing the attributes className and style on span.

Indeed, specifically style and className, are unsafe. I don’t think svg / path are inherently unsafe as we’re allow-listing so everything inside it is stripped.

rehype-sanitize is two things: a) a default schema that matches GitHub, which does not allow SVG/MathML, b) a plugin to sanitize based on any rules, defaulting to GitHub.

I do understand that users want to use plugins, which differ from GitHub, but still be as safe as possible, rather than just turning sanitation off.

We could try and create safe/as-safe-as-possible schemas for plugins such as rehype-katex/rehype-mathjax/rehype-highlight/rehype-autolink-headings. Perhaps share/document them in rehype-sanitize, or maybe even better, in their own § Security sections?

className is the easiest probably, we can allow a list of acceptable classes.
Allowing any class is unsafe: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md#unsanitized-class-attribute.
Same can be done for rehype-highlight, which is class based.

style though, is a tough pickle. It’s hard to parse. What even would be safe and what wouldn’t?

@maclockard
Copy link
Contributor Author

maclockard commented Sep 30, 2021

Wow! I really appreciate the detailed breakdown, this really helped to cement my understanding of safe/unsafe with markdown.

Something I'm wondering though is if I trust both rehype-katex and katex, couldn't I apply the rehype-katex plugin after rehype-sanitize plugin?

So basically just do something like this:

const mathSanitizeSchema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    div: [
      ...defaultSchema.attributes.div,
      ["className", "math", "math-display"],
    ],
    span: [
      ["className", "math", "math-inline"],
    ],
  },
};

...

<Markdown
  rehypePlugins={[
    rehypeRaw,
    [rehypeSanitize, mathSanitizeSchema],
    rehypeKatex,
  ]}
  remarkPlugins={[remarkMath]}
>

Full sandbox: https://codesandbox.io/s/react-markdown-debug-forked-4e4yo?file=/src/App.js

Copied code:
import React from "react";
import Markdown from "react-markdown";
import rehypeKatex from "rehype-katex";
import rehypeRaw from "rehype-raw";
import rehypeSanitize, { defaultSchema } from "rehype-sanitize";
import remarkMath from "remark-math";

import "katex/dist/katex.min.css";

const mathMarkdown = `
Lift($L$) can be determined by Lift Coefficient ($C_L$) like the following
equation.

$$
L = \\frac{1}{2} \\rho v^2 S C_L
$$

$$
\\sqrt{a}
$$

$$
f(\\relax{x}) = \\int_{-\\infty}^\\infty
    f(\\hat\\xi)\\,e^{2 \\pi i \\xi x}
    \\,d\\xi
$$
`;

const mathSanitizeSchema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    div: [
      ...defaultSchema.attributes.div,
      ["className", "math", "math-display"],
    ],
    span: [
      ["className", "math", "math-inline"],
    ],
  },
};

export default function App() {
  return (
    <>
      <h1>Without sanitize</h1>
      <Markdown rehypePlugins={[rehypeRaw, rehypeKatex]} remarkPlugins={[remarkMath]}>
        {mathMarkdown}
      </Markdown>
      <h1>With sanitize</h1>
      <Markdown
        rehypePlugins={[
          rehypeRaw,
          [rehypeSanitize, mathSanitizeSchema],
          rehypeKatex,
        ]}
        remarkPlugins={[remarkMath]}
      >
        {mathMarkdown}
      </Markdown>
    </>
  );
}

This way I'm sanitizing any class names / styles the users themselves are passing and only allowing the explicit class names of math, math-display, and math-inline.

I realize if katex is doing something dangerous such as setting a class name based on user input, this approach would be dangerous since I'm not sanitizing the HTML katex outputs. However, if I can verify that katex is not allowing anything that could result in XSS, this should be safe?

@wooorm
Copy link
Member

wooorm commented Oct 1, 2021

Something I'm wondering though is if I trust both rehype-katex and katex, couldn't I apply the rehype-katex plugin after rehype-sanitize plugin?

Yes, definitely!
Another example is the unified website, where arbitrary readmes are rendered. Those readmes are of course unsafe and have to be sanitized. But the “chrome” around it I do trust, and I don’t have to sanitize that.

So indeed, your example is safe (enough)!

@maclockard
Copy link
Contributor Author

Awesome! thanks for you help. I'm going to make a PR adding a little snippet under security mentioning this solution

@wooorm wooorm closed this as completed in #68 Oct 9, 2021
wooorm pushed a commit that referenced this issue Oct 9, 2021
Reviewed-by: Christian Murphy <christian.murphy.42@gmail.com>
Reviewed-by: Titus Wormer <tituswormer@gmail.com>

Closes GH-67.
Closes GH-68.
@github-actions

This comment has been minimized.

@wooorm wooorm added the 💪 phase/solved Post is done label Oct 9, 2021
@github-actions github-actions bot removed the 🤞 phase/open Post is being triaged manually label Oct 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💪 phase/solved Post is done
Development

Successfully merging a pull request may close this issue.

2 participants