@kevinports kevinports commented Nov 7, 2025

Important

I have a provisional band-aid fix for this issue up at https://github.com/readmeio/readme/pull/16356 to relieve the pressure while we get this figured out.

This PR attempts to address the @readme/markdown portion of https://linear.app/readme-io/issue/CX-2524/legacy-customer-sumsub-has-pages-with-large-html-blocks-not-loading where a few legacy projects with very large html blocks aren't loading.

This is a regression from https://github.com/readmeio/readme/pull/16342
In the readme app, we strip html comments from a document's markdown body. We do this on the server before the markdown is used in the rdmd render. See this diff

The function we use to strip html comments is a plain remarkParse pipeline that handles either markdown or mdx content:

async function stripComments(doc: string, { mdx }: Opts = {}): Promise<string> {
  const processor = unified()
    .use(remarkParse)
    .use(mdx ? remarkMdx : undefined)
    .use(stripCommentsTransformer)
    .use(remarkStringify);
  const file = await processor.process(doc);
  return String(file);
}
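
For readers unfamiliar with this kind of transformer, here is a minimal, dependency-free sketch of what stripping comment nodes from a parsed tree could look like. It is not the actual stripCommentsTransformer, which isn't shown in this PR. The `SketchNode` type and both function names are hypothetical, and it assumes HTML comments surface as `html` nodes whose value is a single `<!-- ... -->` comment, which is how mdast represents them for plain markdown.

```typescript
// Hypothetical, simplified stand-in for an mdast node (assumption, not the real types).
interface SketchNode {
  type: string;
  value?: string;
  children?: SketchNode[];
}

// True when the node is an `html` node consisting of exactly one HTML comment.
function isHtmlCommentNode(node: SketchNode): boolean {
  if (node.type !== "html") return false;
  const value = (node.value || "").trim();
  // The first "-->" after the opener must also be the very end of the value,
  // so mixed content such as "<!-- a --> <div>" is left alone.
  return value.startsWith("<!--") && value.indexOf("-->", 4) === value.length - 3;
}

// Returns a copy of the tree with comment-only html nodes removed at every depth.
function stripCommentNodes(node: SketchNode): SketchNode {
  if (!node.children) return node;
  return {
    ...node,
    children: node.children
      .filter((child) => !isHtmlCommentNode(child))
      .map(stripCommentNodes),
  };
}
```

A real unified plugin would wrap logic like this in a function that returns a tree transformer. The sketch only shows the tree walk.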

I think there's something going on with the pipeline modifying the magic block syntax, but we've only seen it on html blocks with tons of code within them, so it's hard to pinpoint. I think the best solution is to just skip magic blocks altogether in the parsing.

I made an attempt at doing this in the legacy package but couldn't find a clear path. Here's the PR with my attempt at that. We don't want to actually transform the magic blocks into HTML; we want to alter the surrounding markdown and return it with the magic blocks untouched. I'm not sure how to do that with an AST without a lot of complexity! But maybe I'm missing something obvious.

So the best I could do is update the stripComments function in @readme/markdown to skip magic block syntax. It was much easier to just use a regex to extract the blocks before the remark pipeline runs and then restore them afterwards.

🧰 Changes

  • Modify stripComments to use a new extractMagicBlocks util that swaps magic blocks for placeholder tokens before the markdown is parsed. After the remark pipeline has run, each placeholder is replaced with the original, untouched magic block text.
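
The extract-and-restore round trip described above can be sketched roughly as follows. This is an illustration, not the code in this PR. The `Sketch`-suffixed names are hypothetical, and the regex assumes magic blocks take the form `[block:TYPE] ... [/block]`. The real MAGIC_BLOCK_REGEX may differ.

```typescript
// Hypothetical sketch; names and the regex below are assumptions, not this PR's code.
interface ExtractedBlockSketch {
  token: string; // placeholder left in the markdown
  raw: string;   // original magic block text, restored verbatim later
}

// Assumed magic block shape. Illustration only: a lazy pattern like this can
// backtrack heavily on input with many unterminated "[block:" openers.
const MAGIC_BLOCK_SKETCH_REGEX = /\[block:[a-z-]+\][\s\S]*?\[\/block\]/g;

function extractMagicBlocksSketch(markdown: string): {
  replaced: string;
  blocks: ExtractedBlockSketch[];
} {
  const blocks: ExtractedBlockSketch[] = [];
  const replaced = markdown.replace(MAGIC_BLOCK_SKETCH_REGEX, (match) => {
    // Backticks turn the token into a code span so the parser leaves it alone.
    const token = `\`__MAGIC_BLOCK_${blocks.length}__\``;
    blocks.push({ token, raw: match });
    return token;
  });
  return { replaced, blocks };
}

function restoreMagicBlocksSketch(
  processed: string,
  blocks: ExtractedBlockSketch[],
): string {
  // split/join instead of String.replace so "$" sequences inside the raw
  // block are not interpreted as replacement patterns.
  return blocks.reduce(
    (text, { token, raw }) => text.split(token).join(raw),
    processed,
  );
}
```

In use, the remark pipeline would run on `replaced`, and `restoreMagicBlocksSketch` would be applied to its string output. This relies on the stringifier emitting the code-span tokens unchanged.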

🧬 QA & Testing

Do the tests pass? I did link a markdown build from this branch to my readme repo locally and it fixed the issue on clones of the projects in question.

Comment on lines +13 to +21
const replaced = markdown.replace(MAGIC_BLOCK_REGEX, (match) => {
  // Use backticks so it becomes a code span, preventing remarkParse from
  // parsing special characters in the token as markdown syntax
  const token = `\`__MAGIC_BLOCK_${index}__\``;

  blocks.push({ token, raw: match });
  index += 1;
  return token;
});

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This regular expression that depends on library input may run slow on strings starting with '[block:' and with many repetitions of '[block:\'.
Contributor Author


Tried to fix this here: 81c5885
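
For context on this class of alert, one general way to avoid polynomial backtracking is to locate blocks with plain `indexOf` scans instead of a regex. The sketch below is not the change made in 81c5885, whose contents aren't shown here. `findMagicBlockSpans` and `BlockSpan` are hypothetical names, and the `[block:` and `[/block]` delimiters are assumed from the alert text.

```typescript
// Hypothetical sketch of a linear-time scan; not the fix in the referenced commit.
interface BlockSpan {
  start: number; // index of the "[block:" opener
  end: number;   // index just past the matching "[/block]" (exclusive)
}

const OPEN_DELIMITER = "[block:";
const CLOSE_DELIMITER = "[/block]";

function findMagicBlockSpans(markdown: string): BlockSpan[] {
  const spans: BlockSpan[] = [];
  let cursor = 0;

  while (cursor < markdown.length) {
    const start = markdown.indexOf(OPEN_DELIMITER, cursor);
    if (start === -1) break;

    const close = markdown.indexOf(CLOSE_DELIMITER, start + OPEN_DELIMITER.length);
    // No closer after this opener means no later opener can be closed either,
    // so stop rather than rescanning from every remaining "[block:".
    if (close === -1) break;

    const end = close + CLOSE_DELIMITER.length;
    spans.push({ start, end });
    cursor = end; // never revisit characters already consumed
  }

  return spans;
}
```

Each character is visited a bounded number of times, so many unterminated `[block:` openers cost one pass rather than one pass per opener. Unlike a stricter pattern, this sketch does not validate the block type that follows `[block:`.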

@kevinports kevinports marked this pull request as ready for review November 7, 2025 05:26