@codeflash-ai codeflash-ai bot commented Nov 11, 2025

📄 50% (0.50x) speedup for maybe_wrap_in_iframe in marimo/_output/formatters/iframe.py

⏱️ Runtime : 16.4 milliseconds → 10.9 milliseconds (best of 158 runs)

📝 Explanation and details

The optimization introduces a fast regex pre-filter to avoid expensive HTML parsing in the common case where HTML doesn't contain inline scripts.

What was optimized:

  • Added _SCRIPT_INLINE_RE regex pattern that matches <script> tags without src attributes
  • Inserted regex check as an early exit before instantiating ScriptTagParser

Key performance improvement:
The original code always instantiated ScriptTagParser() and parsed the entire HTML whenever "<script" was found, even if all script tags had src attributes. The new regex (r"<script\b(?![^>]*\bsrc\s*=)") quickly identifies if any <script> tag lacks a src attribute without full HTML parsing.
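
To make the pattern's behavior concrete, here is a minimal sketch that applies it to a few inputs. The pattern string is the one quoted above; the `re.IGNORECASE` flag, the name `SCRIPT_INLINE_RE`, the helper `looks_inline`, and the sample strings are assumptions for illustration and may not match what marimo actually uses.

```python
import re

# Pattern quoted in the description; compiling with IGNORECASE is an assumption.
SCRIPT_INLINE_RE = re.compile(r"<script\b(?![^>]*\bsrc\s*=)", re.IGNORECASE)


def looks_inline(html_content: str) -> bool:
    """True if some '<script' opening is not followed by 'src=' before its first '>'."""
    return SCRIPT_INLINE_RE.search(html_content) is not None


# Hypothetical sample inputs.
samples = [
    '<script src="foo.js"></script>',                         # external script
    '<script type="text/javascript" src="foo.js"></script>',  # src is not the first attribute
    '<script src = "foo.js" ></script>',                      # whitespace around '='
    "<script>alert('hi')</script>",                           # inline script
    "<script src></script>",                                  # src without '=' still matches
    '<script data-src="x">alert(1)</script>',                 # 'data-src=' also satisfies \bsrc\s*=
]

if __name__ == "__main__":
    for sample in samples:
        print(f"{looks_inline(sample)!s:5} {sample}")
```

Two inputs are worth noting. A bare `src` with no `=` makes the regex match, so the parser still runs. An attribute such as `data-src=` satisfies `\bsrc\s*=`, so the regex alone treats that tag as having `src`; whether this matters depends on the HTML being processed.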

Why this leads to speedup:

  • Fast path for common cases: When HTML contains only external scripts (with src), the regex search finds no match and the function returns early, avoiding parser instantiation and HTML parsing entirely
  • Regex vs Parser: A single regex scan is ~10x faster than HTMLParser instantiation + feed() for typical HTML sizes
  • Preserved accuracy: The regex is a conservative filter - if it matches, the original parser still validates the result
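
The two-stage check described in this list can be sketched as follows. This is only an illustration under stated assumptions: `InlineScriptParser` is a hypothetical stand-in for `ScriptTagParser` (whose implementation is not shown here), `has_inline_script` is an invented name, and the `re.IGNORECASE` flag is assumed.

```python
import re
from html.parser import HTMLParser

# Pattern from the description; the flag is an assumption.
_SCRIPT_INLINE_RE = re.compile(r"<script\b(?![^>]*\bsrc\s*=)", re.IGNORECASE)


class InlineScriptParser(HTMLParser):
    """Hypothetical stand-in for ScriptTagParser: records a <script> start tag without src."""

    def __init__(self) -> None:
        super().__init__()
        self.found_inline_script = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag == "script" and "src" not in dict(attrs):
            self.found_inline_script = True


def has_inline_script(html_content: str) -> bool:
    # Stage 0: cheap substring test, as in the original code path.
    if "<script" not in html_content:
        return False
    # Stage 1: regex pre-filter; no match means the parser is skipped entirely.
    if _SCRIPT_INLINE_RE.search(html_content) is None:
        return False
    # Stage 2: the regex matched, so the parser makes the final decision.
    parser = InlineScriptParser()
    parser.feed(html_content)
    return parser.found_inline_script
```

In this sketch a commented-out script such as `<!-- <script>alert(1)</script> -->` passes the regex stage but is rejected by the parser, which is the sense in which the regex acts as a filter rather than the final decision.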

Performance characteristics from tests:

  • Massive gains (9-19x faster) for HTML with many script tags that have src attributes
  • Modest overhead (roughly 0-25% slower in the tests below) when the regex matches and the parser still has to run, for example HTML that actually needs iframe wrapping
  • Best for: Applications processing HTML with external scripts, web scraping, or HTML sanitization pipelines
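
One way to check these characteristics on your own inputs is a small `timeit` comparison. The sketch below is not the harness Codeflash used; the function names, the stand-in parser, and the sample input (modeled on the "many script tags, all with src" case) are assumptions, and absolute timings will vary by machine.

```python
import re
import timeit
from html.parser import HTMLParser

# Pattern from the description; the flag is an assumption.
SCRIPT_INLINE_RE = re.compile(r"<script\b(?![^>]*\bsrc\s*=)", re.IGNORECASE)


class _SrcCheckingParser(HTMLParser):
    """Hypothetical stand-in for the real parser."""

    def __init__(self) -> None:
        super().__init__()
        self.found_inline_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and "src" not in dict(attrs):
            self.found_inline_script = True


def regex_check(html_content: str) -> bool:
    return SCRIPT_INLINE_RE.search(html_content) is not None


def parser_check(html_content: str) -> bool:
    parser = _SrcCheckingParser()
    parser.feed(html_content)
    return parser.found_inline_script


def compare(html_content: str, number: int = 200) -> tuple[float, float]:
    """Return total seconds spent in (regex_check, parser_check) over `number` calls."""
    regex_s = timeit.timeit(lambda: regex_check(html_content), number=number)
    parser_s = timeit.timeit(lambda: parser_check(html_content), number=number)
    return regex_s, parser_s


if __name__ == "__main__":
    sample = "".join(f'<script src="file{i}.js"></script>' for i in range(500))
    regex_s, parser_s = compare(sample)
    print(f"regex: {regex_s:.4f}s  parser: {parser_s:.4f}s")
```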

The optimization maintains identical behavior while dramatically improving performance for the common case of externally-sourced scripts.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 8 Passed |
| 🌀 Generated Regression Tests | 76 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 2 Passed |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|:---|---:|---:|---:|
| `_output/formatters/test_iframe.py::test_maybe_wrap_in_iframe_no_script` | 519ns | 506ns | 2.57%✅ |
| `_output/formatters/test_iframe.py::test_maybe_wrap_in_iframe_with_inline_script` | 30.5μs | 32.5μs | -6.36%⚠️ |
| `_output/formatters/test_iframe.py::test_maybe_wrap_in_iframe_with_script_src` | 27.5μs | 2.57μs | 970%✅ |
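
The speedup figures in this report appear consistent with the ratio `original / optimized - 1`, expressed as a percentage. That formula is inferred from the numbers shown here, not taken from Codeflash documentation, and `speedup_percent` is a hypothetical helper name.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent; negative values mean the optimized run was slower.

    Both arguments must use the same time unit.
    """
    return (original / optimized - 1.0) * 100.0


if __name__ == "__main__":
    # Values copied from the report (same unit within each pair).
    print(round(speedup_percent(16.4, 10.9)))     # overall runtime, in milliseconds
    print(round(speedup_percent(27.5, 2.57)))     # script-with-src unit test, in microseconds
    print(round(speedup_percent(519, 506), 2))    # no-script unit test, in nanoseconds
```

Rows whose displayed times are rounded may not reproduce exactly: 30.5μs and 32.5μs give about -6.15% rather than the reported -6.36%, presumably because the report is computed from unrounded timings.
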
🌀 Generated Regression Tests and Runtime

```python
import html

# Simulate the ScriptTagParser used by maybe_wrap_in_iframe
from html.parser import HTMLParser

# imports
import pytest
from marimo._output.formatters.iframe import maybe_wrap_in_iframe


# Minimal flatten_string
def flatten_string(text: str) -> str:
    return "".join([line.strip() for line in text.split("\n")])


# Minimal src_or_src_doc
def src_or_src_doc(html_content: str) -> dict[str, str]:
    # Always use srcdoc for test
    return {"srcdoc": html.escape(html_content)}


# Minimal h.iframe builder
class h:
    @staticmethod
    def iframe(
        *,
        src: str = None,
        srcdoc: str = None,
        width: str = None,
        height: str = None,
        style: str = None,
        onload: str = None,
        frameborder: str = "0",
        **kwargs
    ):
        params = []
        if src is not None:
            params.append(f'src="{src}"')
        if srcdoc is not None:
            params.append(f'srcdoc="{srcdoc}"')
        if width is not None:
            params.append(f'width="{width}"')
        if height is not None:
            params.append(f'height="{height}"')
        if style is not None:
            params.append(f'style="{style}"')
        if onload is not None:
            params.append(f'onload="{onload}"')
        if frameborder is not None:
            params.append(f'frameborder="{frameborder}"')
        for k, v in kwargs.items():
            params.append(f'{k}="{v}"')
        return f"<iframe {' '.join(params)}></iframe>"


# --- Begin test suite ---

# Helper to get expected iframe output for a given html_content
def expected_iframe(html_content: str) -> str:
    # This must match the iframe() function above
    return flatten_string(
        h.iframe(
            **src_or_src_doc(html_content),
            onload="__resizeIframe(this)",
            width="100%",
            height="400px",
        )
    )
```

```python
# ---------------------- BASIC TEST CASES ----------------------

def test_returns_input_without_script():
    # No script tag: should return input as-is
    html_content = "Hello"
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 680ns -> 711ns (4.36% slower)

def test_returns_input_with_script_with_src():
    # Script tag with src: should return input as-is
    html_content = '<script src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 27.7μs -> 2.75μs (909% faster)

def test_wraps_script_without_src():
    # Script tag without src: should wrap in iframe
    html_content = "<script>alert('hi')</script>"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 30.8μs -> 35.3μs (12.6% slower)

def test_wraps_script_without_src_with_attributes():
    # Script tag with other attributes, but not src
    html_content = '<script type="text/javascript">console.log(1)</script>'
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 31.3μs -> 34.8μs (10.1% slower)

def test_wraps_multiple_script_tags_one_without_src():
    # Multiple script tags, at least one without src
    html_content = (
        '<script src="foo.js"></script>'
        '<script>console.log("x")</script>'
        'test'
    )
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 45.5μs -> 50.2μs (9.41% slower)

def test_does_not_wrap_multiple_scripts_all_with_src():
    # Multiple script tags, all with src
    html_content = (
        '<script src="foo.js"></script>'
        '<script src="bar.js"></script>'
        'test'
    )
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 41.4μs -> 2.85μs (1351% faster)

def test_wraps_script_tag_with_uppercase():
    # Script tag in uppercase
    html_content = "<SCRIPT>alert('hi')</SCRIPT>"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 573ns -> 547ns (4.75% faster)

def test_wraps_script_tag_with_mixed_case():
    # Script tag in mixed case
    html_content = "<ScRiPt>alert('hi')</ScRiPt>"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 629ns -> 564ns (11.5% faster)

def test_ignores_script_tag_in_comment():
    # Script tag inside HTML comment should not trigger wrapping
    html_content = ""
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 11.0μs -> 13.0μs (15.4% slower)

def test_ignores_script_tag_in_string():
    # Script tag inside a string (not actual tag)
    html_content = "Here is '<script>' in text"
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 42.4μs -> 44.8μs (5.44% slower)
```

```python
# ---------------------- EDGE TEST CASES ----------------------

def test_empty_string():
    # Empty string input
    html_content = ""
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 481ns -> 496ns (3.02% slower)

def test_script_tag_with_src_and_other_attrs():
    # Script tag with src and other attributes
    html_content = '<script src="foo.js" type="text/javascript"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 30.4μs -> 2.87μs (959% faster)

def test_script_tag_with_src_and_whitespace():
    # Script tag with src and whitespace
    html_content = '<script src="foo.js" ></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 25.4μs -> 2.49μs (919% faster)

def test_script_tag_with_src_and_empty_src():
    # Script tag with empty src: should be considered as having src
    html_content = '<script src=""></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 23.6μs -> 2.10μs (1028% faster)

def test_script_tag_with_src_in_different_order():
    # Script tag with src not first attribute
    html_content = '<script type="text/javascript" src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 27.0μs -> 2.33μs (1060% faster)

def test_script_tag_with_src_in_single_quotes():
    # Script tag with src in single quotes
    html_content = "<script src='foo.js'></script>"
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 23.2μs -> 2.26μs (927% faster)

def test_script_tag_with_false_src_value():
    # Script tag with src="false" is still considered as having src
    html_content = '<script src="false"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 23.6μs -> 2.19μs (981% faster)

def test_script_tag_with_src_in_uppercase():
    # Script tag with SRC attribute in uppercase
    html_content = '<script SRC="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 23.7μs -> 2.23μs (965% faster)

def test_script_tag_with_src_and_script_without_src():
    # One script with src, one without
    html_content = (
        '<script src="foo.js"></script>'
        '<script>console.log("no src")</script>'
    )
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 47.8μs -> 57.6μs (17.0% slower)

def test_script_tag_with_src_incomplete_tag():
    # Malformed script tag (missing closing >)
    html_content = '<script src="foo.js"'
    # Should not raise, should return input as-is
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 8.52μs -> 2.18μs (292% faster)

def test_script_tag_with_broken_html():
    # Broken HTML with script tag
    html_content = "<script>alert('x')"
    # Should not raise, should wrap due to <script> without src
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 33.4μs -> 36.8μs (9.10% slower)

def test_script_tag_with_nested_script_tag():
    # Nested script tags (invalid HTML but possible)
    html_content = "<script>var a = '<script>';</script>"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 26.8μs -> 28.5μs (6.18% slower)

def test_script_tag_with_src_and_nested_script_without_src():
    # Outer script with src, inner script without src in string
    html_content = '<script src="foo.js">var a = "<script>bad</script>";</script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 28.9μs -> 32.3μs (10.3% slower)

def test_script_tag_with_src_attribute_but_value_is_src():
    # Attribute named 'src' but value is 'src'
    html_content = '<script src="src"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 24.2μs -> 2.35μs (929% faster)

def test_script_tag_with_nonstandard_attribute():
    # Script tag with nonstandard attribute, no src
    html_content = '<script custom="value">alert(1)</script>'
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 32.8μs -> 37.2μs (11.7% slower)

def test_script_tag_with_src_and_custom_attribute():
    # Script tag with src and custom attribute
    html_content = '<script src="foo.js" custom="value"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 27.1μs -> 2.60μs (941% faster)

def test_script_tag_with_src_and_no_content():
    # Script tag with src, no content
    html_content = '<script src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 24.1μs -> 2.30μs (946% faster)

def test_script_tag_with_src_and_content():
    # Script tag with src and content
    html_content = '<script src="foo.js">console.log("should ignore content")</script>'
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 24.8μs -> 2.29μs (981% faster)

def test_script_tag_with_src_and_malformed_attribute():
    # Script tag with malformed src attribute (no value)
    html_content = '<script src></script>'
    # Should be considered as having src
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 21.8μs -> 28.9μs (24.7% slower)

def test_html_with_multiple_script_variants():
    # HTML with a mix of script tags, some with src, some without
    html_content = (
        '<script src="foo.js"></script>'
        '<script>console.log("no src")</script>'
        '<script src="bar.js"></script>'
        '<script type="text/javascript">alert(1)</script>'
    )
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 48.6μs -> 52.0μs (6.56% slower)
```

```python
# ---------------------- LARGE SCALE TEST CASES ----------------------

def test_large_html_without_script():
    # Large HTML with no script tags
    html_content = "Hello" * 200
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 695ns -> 616ns (12.8% faster)

def test_large_html_with_many_script_tags_all_with_src():
    # Large HTML with many script tags, all with src
    html_content = "".join(
        f'<script src="file{i}.js"></script>' for i in range(500)
    )
    html_content += "End"
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 2.82ms -> 147μs (1807% faster)

def test_large_html_with_many_script_tags_some_without_src():
    # Large HTML with many script tags, some without src
    html_content = "".join(
        f'<script src="file{i}.js"></script>' if i % 2 == 0
        else '<script>console.log("no src")</script>'
        for i in range(500)
    )
    html_content += "End"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 120μs -> 127μs (5.20% slower)

def test_large_html_with_script_tag_at_end():
    # Large HTML, script tag without src at the end
    html_content = ("Hello" * 200) + "<script>alert(1)</script>"
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 42.1μs -> 44.4μs (5.13% slower)

def test_large_html_with_script_tag_at_start():
    # Large HTML, script tag without src at the start
    html_content = "<script>alert(1)</script>" + ("Hello" * 200)
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 28.9μs -> 30.6μs (5.67% slower)

def test_large_html_with_script_tag_in_middle():
    # Large HTML, script tag without src in the middle
    html_content = ("Hello" * 100) + "<script>alert(1)</script>" + ("World" * 100)
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 39.9μs -> 42.1μs (5.37% slower)

def test_large_html_with_many_non_script_tags():
    # Large HTML with many non-script tags
    html_content = "".join(f"{i}" for i in range(1000))
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 3.07μs -> 3.07μs (0.098% slower)

def test_large_html_with_malformed_script_tags():
    # Large HTML with many malformed script tags
    html_content = "".join(
        "<script>" if i % 2 == 0 else "</script>"
        for i in range(1000)
    )
    # Should wrap because at least one <script> without src
    expected = expected_iframe(html_content)
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 71.9μs -> 74.4μs (3.27% slower)

def test_large_html_with_script_tags_in_comments():
    # Large HTML with script tags inside comments
    html_content = "".join(
        ""
        for _ in range(1000)
    )
    # Should not wrap, as script tags are inside comments
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 1.20ms -> 1.18ms (1.72% faster)

def test_large_html_with_script_tags_in_strings():
    # Large HTML with script tags as text, not tags
    html_content = "".join(
        '"<script>alert(1)</script>"'
        for _ in range(1000)
    )
    # Should not wrap, as script tags are not real tags
    codeflash_output = maybe_wrap_in_iframe(html_content)  # 240μs -> 244μs (1.45% slower)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

#------------------------------------------------
```python
import re
from html.parser import HTMLParser

# imports
import pytest
from marimo._output.formatters.iframe import maybe_wrap_in_iframe


# --- Helper function for tests ---

def is_iframe_wrapped(html: str, expected_content: str = None) -> bool:
    """
    Returns True if html is an iframe containing expected_content (if provided).
    """
    # Remove whitespace for comparison
    html = re.sub(r"\s+", "", html)
    if not html.startswith("<iframe"):
        return False
    if expected_content is not None:
        # Escape expected_content as srcdoc does
        import html as html_mod
        expected_escaped = html_mod.escape(expected_content)
        return expected_escaped in html
    return True
```

```python
# --- Unit Tests ---

# 1. Basic Test Cases

def test_basic_no_script_tag_returns_input():
    # No script tag: should return input unchanged
    html = "Hello world"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 524ns -> 544ns (3.68% slower)

def test_basic_script_tag_with_src_does_not_wrap():
    # Script tag with src: should not wrap
    html = 'Hi<script src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 36.8μs -> 2.82μs (1203% faster)

def test_basic_script_tag_without_src_wraps():
    # Script tag without src: should wrap
    html = "Hi<script>alert('hi');</script>"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 40.5μs -> 46.7μs (13.2% slower)

def test_basic_multiple_script_tags_only_one_without_src():
    # Only one script tag without src: should wrap
    html = 'Hi<script src="a.js"></script><script>foo()</script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 54.2μs -> 57.0μs (4.86% slower)

def test_basic_multiple_script_tags_all_with_src():
    # All script tags with src: should not wrap
    html = '<script src="a.js"></script><script src="b.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 34.3μs -> 2.75μs (1145% faster)

def test_basic_script_tag_with_src_and_other_attrs():
    # Script tag with src and other attributes: should not wrap
    html = '<script src="foo.js" type="text/javascript"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 25.7μs -> 2.72μs (846% faster)

def test_basic_script_tag_with_uppercase_tag():
    # Script tag in uppercase: should wrap
    html = "<SCRIPT>alert('hi');</SCRIPT>"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 518ns -> 524ns (1.15% slower)

def test_basic_script_tag_with_spaces_in_tag():
    # Script tag with spaces: should wrap
    html = "<script >alert(1)</script>"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 32.7μs -> 35.9μs (8.88% slower)
```

```python
# 2. Edge Test Cases

def test_edge_empty_string():
    # Empty string: should return unchanged
    html = ""
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 504ns -> 492ns (2.44% faster)

def test_edge_script_tag_broken_html():
    # Malformed HTML: should not raise, should not wrap
    html = "<script>alert('hi')"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 36.7μs -> 38.5μs (4.51% slower)

def test_edge_script_tag_with_src_in_caps():
    # Script tag with SRC in uppercase: should not wrap
    html = '<script SRC="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 25.8μs -> 2.35μs (999% faster)

def test_edge_script_tag_with_src_and_other_attrs_unordered():
    # Script tag with src not first: should not wrap
    html = '<script type="text/javascript" src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 27.6μs -> 2.39μs (1056% faster)

def test_edge_script_tag_with_src_and_other_attrs_mixed_case():
    # Script tag with Src in mixed case: should not wrap
    html = '<script sRc="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 23.6μs -> 2.32μs (920% faster)

def test_edge_script_tag_with_src_and_other_attrs_weird_spacing():
    # Script tag with weird spacing: should not wrap
    html = '<script src = "foo.js" ></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 24.1μs -> 2.50μs (862% faster)

def test_edge_script_tag_with_comment_inside():
    # Script tag with comment inside: should wrap
    html = "<script></script>"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 31.9μs -> 37.2μs (14.2% slower)

def test_edge_script_tag_with_attributes_but_no_src():
    # Script tag with attributes but no src: should wrap
    html = '<script type="text/javascript"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 33.7μs -> 37.5μs (10.2% slower)

def test_edge_script_tag_with_src_and_another_script_without_src():
    # Mixed: one with src, one without
    html = '<script src="a.js"></script><script>alert(1)</script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 46.8μs -> 51.0μs (8.28% slower)

def test_edge_script_tag_self_closing():
    # Self-closing script tag (should not wrap, as it's not valid but should be ignored)
    html = '<script src="foo.js"/>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 17.7μs -> 2.22μs (697% faster)

def test_edge_script_tag_with_src_and_empty_src():
    # Script tag with empty src attribute: should treat as with src and not wrap
    html = '<script src=""></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 23.9μs -> 2.06μs (1062% faster)

def test_edge_script_tag_with_src_and_junk_attributes():
    # Script tag with junk attributes and src: should not wrap
    html = '<script foo="bar" src="baz.js" bar="baz"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 28.3μs -> 2.53μs (1018% faster)

def test_edge_script_tag_with_src_and_srcdoc_in_content():
    # Script tag with srcdoc in content (should not affect logic)
    html = '<script src="foo.js"></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 32.9μs -> 2.38μs (1284% faster)

def test_edge_script_tag_with_nested_script_tag():
    # Nested script tag (not valid HTML, but parser should find the first)
    html = '<script><script>alert(1)</script></script>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 31.6μs -> 36.5μs (13.6% slower)

def test_edge_script_tag_with_no_closing_tag():
    # Script tag without closing tag (should not wrap, parser won't find end)
    html = '<script>alert(1)'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 33.2μs -> 36.1μs (8.06% slower)

def test_edge_script_tag_with_script_in_attribute_value():
    # "script" in an attribute value, not a tag: should not wrap
    html = ''
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 530ns -> 589ns (10.0% slower)

def test_edge_script_tag_with_script_in_text():
    # "script" in text, not a tag: should not wrap
    html = 'this is a script test'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 556ns -> 590ns (5.76% slower)

def test_edge_script_tag_with_script_tag_in_comment():
    # Script tag in a comment: should not wrap
    html = ''
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 10.7μs -> 12.7μs (16.1% slower)

def test_edge_script_tag_with_script_tag_in_cdata():
    # Script tag in CDATA section: should not wrap
    html = 'alert(1)]]>'
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 13.5μs -> 14.8μs (8.45% slower)
```

```python
# 3. Large Scale Test Cases

def test_large_html_no_script_tags():
    # Large HTML with no script tags: should not wrap
    html = "hello" * 1000
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 976ns -> 973ns (0.308% faster)

def test_large_html_many_script_tags_with_src():
    # Large HTML with many script tags, all with src: should not wrap
    html = "".join(f'<script src="f{i}.js"></script>' for i in range(500))
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 2.81ms -> 136μs (1960% faster)

def test_large_html_many_script_tags_one_without_src():
    # Large HTML with many script tags, one without src: should wrap
    html = (
        "".join(f'<script src="f{i}.js"></script>' for i in range(499))
        + "<script>console.log('hi')</script>"
    )
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 2.89ms -> 3.07ms (5.97% slower)

def test_large_html_script_tag_at_start():
    # Large HTML with script tag without src at the start: should wrap
    html = "<script>alert(1)</script>" + ("abc" * 800)
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 34.9μs -> 36.7μs (4.92% slower)

def test_large_html_script_tag_at_end():
    # Large HTML with script tag without src at the end: should wrap
    html = ("abc" * 800) + "<script>alert(1)</script>"
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 36.6μs -> 38.9μs (5.71% slower)

def test_large_html_script_tag_in_middle():
    # Large HTML with script tag without src in the middle: should wrap
    html = ("abc" * 400) + "<script>alert(1)</script>" + ("def" * 400)
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 35.2μs -> 37.2μs (5.43% slower)

def test_large_html_script_tags_with_and_without_src():
    # Large HTML with many script tags, some with src, some without
    html = (
        "".join(f'<script src="f{i}.js"></script>' for i in range(250))
        + "<script>alert(1)</script>"
        + "".join(f'<script src="g{i}.js"></script>' for i in range(250))
    )
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 1.52ms -> 1.60ms (4.89% slower)

def test_large_html_many_script_tags_all_without_src():
    # Large HTML with many script tags, none with src: should wrap
    html = "".join(f'<script>console.log({i})</script>' for i in range(500))
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 84.3μs -> 86.2μs (2.18% slower)

def test_large_html_with_script_tag_with_src_and_empty_script_tag():
    # Large HTML with script tag with src and an empty script tag without src: should wrap
    html = (
        "".join(f'<script src="f{i}.js"></script>' for i in range(499))
        + "<script></script>"
    )
    codeflash_output = maybe_wrap_in_iframe(html); result = codeflash_output  # 2.88ms -> 3.02ms (4.76% slower)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

#------------------------------------------------
```python
from marimo._output.formatters.iframe import maybe_wrap_in_iframe

def test_maybe_wrap_in_iframe():
    maybe_wrap_in_iframe('<script>')

def test_maybe_wrap_in_iframe_2():
    maybe_wrap_in_iframe('')
```

🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|:---|---:|---:|---:|
| `codeflash_concolic_bps3n5s8/tmpmm5jpnrr/test_concolic_coverage.py::test_maybe_wrap_in_iframe` | 29.5μs | 33.7μs | -12.5%⚠️ |
| `codeflash_concolic_bps3n5s8/tmpmm5jpnrr/test_concolic_coverage.py::test_maybe_wrap_in_iframe_2` | 500ns | 500ns | 0.000%✅ |

To edit these changes git checkout codeflash/optimize-maybe_wrap_in_iframe-mhv7vjl6 and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 11, 2025 23:43
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 11, 2025