Description
When a Markdown file contains an HTML comment with a malformed closer (`-- >` instead of `-->`), the Python-Markdown parser treats the rest of the document as an unterminated comment. This can result in unbounded memory usage and eventual OOM termination. No exception is raised; the process is simply killed by the OS.
Related downstream issue: mkdocs/mkdocs#4030
Environment
- Python-Markdown: 3.9
- Python: 3.11.2
- OS: Ubuntu 22.04 (also reproduces in GitHub Actions `ubuntu-latest`)
Minimal Reproducer
```python
import markdown

bad = """# Hello
<!-- This comment is malformed and never closes -- >
Some content after the bad comment.
"""

html = markdown.markdown(bad)
print(html)
```
Expected:
- Either emit valid HTML with the comment treated as literal text, or raise a parse error.
Actual:
- The parser goes into runaway behavior: memory usage grows rapidly until the process is killed by the OOM killer. No Python traceback is shown.
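Because the failure shows up as an OOM kill rather than a traceback, it may be easier to reproduce safely in a child interpreter whose address space is capped. The sketch below assumes a Linux/Unix host (it relies on the `resource` module and `preexec_fn`); `run_capped` and its default limits are hypothetical choices for illustration. Under the cap, a runaway allocation should surface as a `MemoryError` in the child's output instead of taking down the machine or CI runner.

```python
import resource
import subprocess
import sys
from typing import Optional, Tuple

# Hypothetical helper (Linux/Unix only): run a Python snippet in a child
# interpreter whose virtual address space is capped with RLIMIT_AS, so a
# runaway allocation raises MemoryError in the child instead of invoking
# the system OOM killer.
def run_capped(code: str,
               max_bytes: int = 512 * 1024 * 1024,
               timeout: float = 60.0) -> Tuple[Optional[int], str]:
    def limit_memory() -> None:
        # Runs in the child after fork() and before exec().
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            preexec_fn=limit_memory,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return None, "timed out"
    # Combine stdout and stderr so a MemoryError traceback is visible.
    return proc.returncode, proc.stdout + proc.stderr
```

For example, passing the reproducer script as a string (say, a hypothetical `REPRO` variable) to `run_capped(REPRO)` returns `(returncode, output)`; a non-zero code with `MemoryError` in the output would confirm the runaway allocation without risking the host.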
Why this happens (likely)
The HTML block parser looks for a `-->` terminator. If it encounters `-- >` (with a space before `>`), it fails to close the comment and consumes the rest of the document as one giant comment block. This likely produces extremely large buffers and regex backtracking.
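One way to probe the buffering/backtracking hypothesis is to parse the reproducer padded with more and more lines after the bad closer and watch how time and peak Python-heap usage grow. `make_doc` and `scaling_probe` are hypothetical helpers; `tracemalloc` only sees allocations made through Python's allocator and slows execution, so treat the numbers as a rough signal. Keep sizes small (or run inside a memory-capped child), since the report says the parse can exhaust memory.

```python
import time
import tracemalloc
from typing import Callable, Iterable, List, Tuple

def make_doc(n_lines: int) -> str:
    # Same shape as the reproducer, padded with n_lines after the bad closer.
    return "# Hello\n<!-- malformed -- >\n" + "Some content.\n" * n_lines

def scaling_probe(parse: Callable[[str], str],
                  sizes: Iterable[int]) -> List[Tuple[int, float, int]]:
    """Return (n_lines, seconds, peak_traced_bytes) for each document size."""
    results = []
    for n in sizes:
        doc = make_doc(n)  # built before tracing starts, so not counted
        tracemalloc.start()
        t0 = time.perf_counter()
        parse(doc)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append((n, elapsed, peak))
    return results
```

With Python-Markdown installed, something like `scaling_probe(markdown.markdown, [100, 200, 400, 800])` would show the trend: roughly doubling per step suggests linear buffering, while quadrupling or worse points toward backtracking.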
Impact
- A single malformed `-- >` can take down CI pipelines or documentation builds.
- From a security perspective, it's effectively a denial-of-service vector.
Workarounds
- Find malformed closers in Markdown sources:

  ```shell
  grep -RInF -e '-- >' docs && echo "Bad comment found"
  ```

- Replace them with a valid `-->`.
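A plain search for `-- >` also flags legitimate text such as `while (i-- > 0)` in code samples. As a narrower alternative, the Python sketch below rewrites only a `--`-whitespace-`>` sequence that follows an opening `<!--` with no valid `-->` in between. `fix_comment_closers` and `fix_tree` are hypothetical helpers, and the regex knows nothing about Markdown code spans or fenced blocks, so it is worth reviewing the report (`write=False`) before rewriting files.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Group 1: an opening "<!--" plus the shortest run of text that does not
# contain a valid "-->". Then a malformed closer: "--", whitespace, ">".
MALFORMED_CLOSER = re.compile(r"(<!--(?:(?!-->).)*?)--\s+>", re.DOTALL)

def fix_comment_closers(text: str) -> Tuple[str, int]:
    """Rewrite '-- >' style comment closers to '-->'; return (text, count)."""
    return MALFORMED_CLOSER.subn(r"\1-->", text)

def fix_tree(root: str, write: bool = False) -> Dict[str, int]:
    """Report (and optionally fix in place) malformed closers in *.md files."""
    report = {}
    for path in Path(root).rglob("*.md"):
        original = path.read_text(encoding="utf-8")
        fixed, count = fix_comment_closers(original)
        if count:
            report[str(path)] = count
            if write:
                path.write_text(fixed, encoding="utf-8")
    return report
```

For example, `fix_tree("docs")` returns a mapping of file paths to the number of closers that would be rewritten, without touching the files.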
Suggested fix ideas
- Fail closed: if no proper `-->` terminator is found, treat the sequence as literal text instead of an open comment.
- Optionally normalize `--\s*>` as a valid closing delimiter (tolerant parser).
- Impose a maximum comment block length; if it is exceeded, abort parsing as a safety guard.
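The three ideas could be combined in a single bounded terminator search. This is a hypothetical helper for illustration, not Python-Markdown's actual code: `find_comment_end` returns the index just past the closer, or `None` to signal that the `<!--` should be treated as literal text (fail closed). `tolerant=True` also accepts `--\s*>`, and `max_len` (an arbitrary 100,000-character default) bounds how far past the opener the search may scan.

```python
import re
from typing import Optional

STRICT_END = re.compile(r"-->")
TOLERANT_END = re.compile(r"--\s*>")  # also matches "-->" (zero whitespace)

def find_comment_end(text: str, start: int, tolerant: bool = False,
                     max_len: Optional[int] = 100_000) -> Optional[int]:
    """Return the index just past the closer of the comment opened at
    `start`, or None if it should be treated as literal text."""
    body_start = start + len("<!--")
    # Bound the scan window so a missing closer cannot force a full-document search.
    limit = len(text) if max_len is None else min(len(text), body_start + max_len)
    pattern = TOLERANT_END if tolerant else STRICT_END
    match = pattern.search(text, body_start, limit)
    if match is None:
        return None  # fail closed: no terminator within the window
    return match.end()
```

A caller that receives `None` could emit the `<!--` as escaped text and resume parsing right after it, so a malformed comment costs at most `max_len` characters of scanning.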