@codeflash-ai codeflash-ai bot commented Nov 12, 2025

📄 24% (0.24x) speedup for build_stub_fn in marimo/_ast/pytest.py

⏱️ Runtime : 19.9 milliseconds → 16.0 milliseconds (best of 112 runs)

📝 Explanation and details

The optimization introduces caching for the AST parsing operation that was being repeated on every function call. The key change is replacing the direct call to ast_parse(inspect.getsource(_pytest_scaffold)) with a cached version using @functools.lru_cache(maxsize=1).

What was optimized:

  • Added _cached_pytest_base() function with LRU cache to store the parsed AST of the scaffold function
  • The expensive ast_parse(inspect.getsource(_pytest_scaffold)) operation now only runs once and is cached for subsequent calls

Why this leads to speedup:
The line profiler shows the most significant improvement in build_stub_fn where PYTEST_BASE = ast_parse(inspect.getsource(_pytest_scaffold)) dropped from 31.4% of total time (19.9ms) to 16.5% of total time (8.5ms). This represents the core optimization - avoiding redundant parsing of the same source code on every invocation.

The _pytest_scaffold function is static and never changes, making it an ideal candidate for caching. After the first call, subsequent invocations skip the expensive source code parsing and AST construction entirely.

Impact on workloads:
The 24% speedup is most beneficial for scenarios that call build_stub_fn repeatedly, as evidenced by the test results showing consistent ~50-100% improvements across individual calls. The optimization is particularly effective for:

  • Basic cases with few arguments (87-114% faster)
  • Large-scale cases with many arguments still see meaningful gains (6-32% faster)
  • The performance benefit scales with call frequency since caching eliminates redundant work

The remaining time is still dominated by copy.deepcopy(PYTEST_BASE) (30.2% of time) and eval(compile(...)) (27% of time), which are necessary for the dynamic AST manipulation and cannot be easily optimized without changing the fundamental approach.
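The residual deepcopy-then-compile cost can be illustrated with a simplified stand-in. `build_fn` and its one-line template are hypothetical, not marimo's actual `build_stub_fn` (which rewrites a copy of the scaffold AST and uses `eval(compile(...))`); the sketch only shows why a per-call deep copy remains necessary even after the parse is cached.

```python
import ast
import copy

# Parsed once, analogous to the cached PYTEST_BASE.
TEMPLATE = ast.parse("def stub():\n    return None\n")


def build_fn(name: str):
    # The cached AST must never be mutated in place, so each call
    # still pays for a deep copy before renaming the function...
    tree = copy.deepcopy(TEMPLATE)
    tree.body[0].name = name
    ast.fix_missing_locations(tree)
    # ...and for compiling and executing the rewritten module.
    ns: dict = {}
    exec(compile(tree, filename="<stub>", mode="exec"), ns)
    return ns[name]


fn = build_fn("my_stub")
assert fn.__name__ == "my_stub"
assert fn() is None
```

Note that after any number of calls the shared `TEMPLATE` is untouched; that invariant is exactly what forces the per-call `copy.deepcopy`.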

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 36 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime

import ast
import inspect
from typing import Any, Callable

# imports

import pytest  # used for our unit tests
from marimo._ast.pytest import build_stub_fn

# unit tests

# Helper to create ast.FunctionDef nodes for testing

def make_funcdef(name, args, lineno=1, col_offset=0):
    return ast.FunctionDef(
        name=name,
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg, lineno=lineno, col_offset=col_offset) for arg in args],
            vararg=None,
            kwonlyargs=[],
            kw_defaults=[],
            kwarg=None,
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=lineno,
        col_offset=col_offset,
    )

# Basic Test Cases

def test_single_argument_basic():
    """Test stub with a single argument."""
    def cell(stub): return stub + 1
    funcdef = make_funcdef("myfunc", ["stub"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 215μs -> 115μs (87.4% faster)

def test_multiple_arguments_basic():
    """Test stub with multiple arguments."""
    def cell(a, b): return a * b
    funcdef = make_funcdef("myfunc", ["a", "b"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 210μs -> 106μs (98.1% faster)

def test_no_arguments_basic():
    """Test stub with no arguments."""
    def cell(): return 42
    funcdef = make_funcdef("myfunc", [])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 199μs -> 93.0μs (114% faster)

def test_allowed_subset_basic():
    """Test stub with allowed subset of arguments."""
    def cell(a, b, c): return a + b
    funcdef = make_funcdef("myfunc", ["a", "b", "c"])
    codeflash_output = build_stub_fn(funcdef, basis=cell, allowed=["a", "b"]); stub_fn = codeflash_output  # 205μs -> 102μs (100% faster)

def test_argument_order_basic():
    """Test stub preserves argument order."""
    def cell(x, y): return x - y
    funcdef = make_funcdef("myfunc", ["x", "y"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 207μs -> 101μs (104% faster)

# Edge Test Cases

def test_empty_allowed_edge():
    """Test stub with empty allowed list (no arguments)."""
    def cell(): return "empty"
    funcdef = make_funcdef("myfunc", ["a", "b"])
    codeflash_output = build_stub_fn(funcdef, basis=cell, allowed=[]); stub_fn = codeflash_output  # 196μs -> 91.7μs (114% faster)

def test_nonexistent_allowed_edge():
    """Test stub with allowed containing nonexistent argument."""
    def cell(a): return a
    funcdef = make_funcdef("myfunc", ["a"])
    # Allowed contains an argument not in funcdef; should error
    with pytest.raises(KeyError):
        build_stub_fn(funcdef, basis=cell, allowed=["b"])  # 171μs -> 68.0μs (152% faster)

def test_asyncfunctiondef_edge():
    """Test stub with AsyncFunctionDef."""
    def cell(a): return a * 2
    funcdef = ast.AsyncFunctionDef(
        name="myfunc",
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg("a", lineno=1, col_offset=0)],
            vararg=None,
            kwonlyargs=[],
            kw_defaults=[],
            kwarg=None,
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=1,
        col_offset=0,
    )
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 229μs -> 116μs (97.2% faster)

def test_argument_linenos_coloffsets_edge():
    """Test stub preserves argument lineno and col_offset."""
    def cell(x): return x
    funcdef = make_funcdef("myfunc", ["x"], lineno=42, col_offset=13)
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 209μs -> 103μs (102% faster)

def test_argument_name_collision_edge():
    """Test stub with argument name collision with built-in."""
    def cell(list): return list + [1]
    funcdef = make_funcdef("myfunc", ["list"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 206μs -> 100μs (106% faster)

def test_argument_type_annotations_ignored_edge():
    """Test stub ignores type annotations."""
    arg = ast.arg("x", annotation=ast.Name("int", ast.Load()), lineno=1, col_offset=0)
    funcdef = ast.FunctionDef(
        name="myfunc",
        args=ast.arguments(
            posonlyargs=[],
            args=[arg],
            vararg=None,
            kwonlyargs=[],
            kw_defaults=[],
            kwarg=None,
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=1,
        col_offset=0,
    )
    def cell(x): return x + 1
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 204μs -> 99.2μs (106% faster)

# Large Scale Test Cases

def test_many_arguments_large_scale():
    """Test stub with a large number of arguments."""
    def cell(**kwargs): return sum(kwargs.values())
    arg_names = [f"arg{i}" for i in range(1000)]
    funcdef = make_funcdef("myfunc", arg_names)
    codeflash_output = build_stub_fn(funcdef, basis=lambda **kwargs: sum(kwargs.values())); stub_fn = codeflash_output  # 4.55ms -> 4.51ms (1.07% faster)

def test_many_allowed_large_scale():
    """Test stub with a large allowed list subset."""
    def cell(**kwargs): return len(kwargs)
    arg_names = [f"x{i}" for i in range(1000)]
    allowed = [f"x{i}" for i in range(500, 1000)]
    funcdef = make_funcdef("myfunc", arg_names)
    codeflash_output = build_stub_fn(funcdef, basis=lambda **kwargs: len(kwargs), allowed=allowed); stub_fn = codeflash_output  # 1.91ms -> 1.76ms (8.27% faster)

def test_large_argument_names_large_scale():
    """Test stub with very long argument names."""
    long_names = ["x" * 100 for _ in range(10)]
    def cell(**kwargs): return sum(kwargs.values())
    funcdef = make_funcdef("myfunc", long_names)
    codeflash_output = build_stub_fn(funcdef, basis=lambda **kwargs: sum(kwargs.values())); stub_fn = codeflash_output  # 225μs -> 112μs (101% faster)

def test_wrong_return_type_mutation_resistance():
    """Test fails if stub does not call basis correctly."""
    def cell(x): return x + 1
    funcdef = make_funcdef("myfunc", ["x"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 232μs -> 117μs (97.2% faster)

def test_argument_mapping_mutation_resistance():
    """Test fails if argument mapping is incorrect."""
    def cell(a, b): return a * b
    funcdef = make_funcdef("myfunc", ["a", "b"])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 212μs -> 106μs (99.4% faster)

def test_allowed_argument_subset_mutation_resistance():
    """Test fails if allowed argument subset is not respected."""
    def cell(a, b, c): return b
    funcdef = make_funcdef("myfunc", ["a", "b", "c"])
    codeflash_output = build_stub_fn(funcdef, basis=cell, allowed=["b"]); stub_fn = codeflash_output  # 204μs -> 99.0μs (107% faster)

def test_callable_returned_mutation_resistance():
    """Test fails if returned object is not callable."""
    def cell(): return 123
    funcdef = make_funcdef("myfunc", [])
    codeflash_output = build_stub_fn(funcdef, basis=cell); stub_fn = codeflash_output  # 197μs -> 94.3μs (109% faster)

`codeflash_output` is used to check that the output of the original code is the same as that of the optimized code.

#------------------------------------------------
import ast
import inspect
from typing import Any, Callable

# imports

import pytest  # used for our unit tests
from marimo._ast.pytest import build_stub_fn

# Helper to create ast.FunctionDef objects for testing

def make_funcdef(name: str, args: list[str]) -> ast.FunctionDef:
    return ast.FunctionDef(
        name=name,
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg=a, lineno=1, col_offset=i) for i, a in enumerate(args)],
            kwonlyargs=[],
            kw_defaults=[],
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=1,
        col_offset=0,
    )

# Helper stub function for basis

def stub_fn(**kwargs):
    return kwargs

# -------------------- BASIC TEST CASES --------------------

def test_single_argument_basic():
    """Test with a single argument function."""
    funcdef = make_funcdef("myfunc", ["x"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 202μs -> 100μs (101% faster)
    result = stub(x=42)

def test_multiple_arguments_basic():
    """Test with multiple arguments."""
    funcdef = make_funcdef("myfunc", ["a", "b", "c"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 208μs -> 106μs (95.7% faster)
    result = stub(a=1, b=2, c=3)

def test_no_arguments_basic():
    """Test with no arguments."""
    funcdef = make_funcdef("empty", [])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 197μs -> 92.0μs (115% faster)
    result = stub()

def test_allowed_subset_basic():
    """Test allowed subset of arguments."""
    funcdef = make_funcdef("myfunc", ["x", "y", "z"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn, allowed=["y"]); stub = codeflash_output  # 203μs -> 97.6μs (109% faster)
    result = stub(y=100)

def test_file_argument_basic():
    """Test passing a custom file name."""
    funcdef = make_funcdef("myfunc", ["x"])
    codeflash_output = build_stub_fn(funcdef, file="myfile.py", basis=stub_fn); stub = codeflash_output  # 203μs -> 97.0μs (109% faster)
    result = stub(x="abc")

# -------------------- EDGE TEST CASES --------------------

def test_empty_allowed_edge():
    """Test with allowed as empty list (should take no arguments)."""
    funcdef = make_funcdef("myfunc", ["x", "y"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn, allowed=[]); stub = codeflash_output  # 194μs -> 91.1μs (114% faster)
    result = stub()

def test_argument_name_collision_edge():
    """Test with argument names that collide with Python keywords."""
    funcdef = make_funcdef("myfunc", ["class", "def"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 205μs -> 101μs (101% faster)
    result = stub(**{"class": 1, "def": 2})

def test_long_argument_names_edge():
    """Test with long argument names."""
    long_name = "a" * 100
    funcdef = make_funcdef("myfunc", [long_name])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 201μs -> 98.3μs (105% faster)
    result = stub(**{long_name: 999})

def test_non_ascii_argument_names_edge():
    """Test with non-ASCII argument names."""
    funcdef = make_funcdef("myfunc", ["α", "β"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 204μs -> 101μs (101% faster)
    result = stub(**{"α": 1, "β": 2})

def test_async_functiondef_edge():
    """Test with ast.AsyncFunctionDef."""
    funcdef = ast.AsyncFunctionDef(
        name="async_func",
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg="x", lineno=1, col_offset=0)],
            kwonlyargs=[],
            kw_defaults=[],
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=1,
        col_offset=0,
    )
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 200μs -> 97.6μs (105% faster)
    result = stub(x=123)

def test_basis_none_edge():
    """Test with basis=None, should raise TypeError when called."""
    funcdef = make_funcdef("myfunc", ["x"])
    codeflash_output = build_stub_fn(funcdef, basis=None); stub = codeflash_output  # 201μs -> 99.4μs (103% faster)
    with pytest.raises(TypeError):
        stub(x=1)

def test_missing_required_argument_edge():
    """Test calling stub without required argument, should raise TypeError."""
    funcdef = make_funcdef("myfunc", ["x", "y"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 231μs -> 120μs (92.9% faster)
    with pytest.raises(TypeError):
        stub(x=1)

def test_extra_argument_passed_edge():
    """Test passing extra argument not in allowed, should raise TypeError."""
    funcdef = make_funcdef("myfunc", ["x"])
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 209μs -> 105μs (98.7% faster)
    with pytest.raises(TypeError):
        stub(x=1, y=2)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_many_arguments_large_scale():
    """Test with a large number of arguments."""
    arg_names = [f"arg{i}" for i in range(500)]
    funcdef = make_funcdef("bigfunc", arg_names)
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 1.84ms -> 1.72ms (6.77% faster)
    values = {name: i for i, name in enumerate(arg_names)}
    result = stub(**values)

def test_large_allowed_subset_large_scale():
    """Test with a large allowed subset."""
    arg_names = [f"arg{i}" for i in range(100)]
    allowed_names = arg_names[:50]
    funcdef = make_funcdef("bigfunc", arg_names)
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn, allowed=allowed_names); stub = codeflash_output  # 353μs -> 234μs (50.5% faster)
    values = {name: i for i, name in enumerate(allowed_names)}
    result = stub(**values)

def test_large_argument_names_large_scale():
    """Test with many long argument names."""
    arg_names = ["x" * (10 + i) for i in range(100)]
    funcdef = make_funcdef("bigfunc", arg_names)
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 469μs -> 353μs (32.6% faster)
    values = {name: i for i, name in enumerate(arg_names)}
    result = stub(**values)

def test_large_async_functiondef_large_scale():
    """Test with large ast.AsyncFunctionDef."""
    arg_names = [f"arg{i}" for i in range(200)]
    funcdef = ast.AsyncFunctionDef(
        name="bigasync",
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg=a, lineno=1, col_offset=i) for i, a in enumerate(arg_names)],
            kwonlyargs=[],
            kw_defaults=[],
            defaults=[],
        ),
        body=[ast.Pass()],
        decorator_list=[],
        lineno=1,
        col_offset=0,
    )
    codeflash_output = build_stub_fn(funcdef, basis=stub_fn); stub = codeflash_output  # 734μs -> 636μs (15.5% faster)
    values = {name: i for i, name in enumerate(arg_names)}
    result = stub(**values)

#------------------------------------------------
from ast import FunctionDef

import pytest

from marimo._ast.pytest import build_stub_fn

def test_build_stub_fn():
    # Raw string keeps the regex's escaped spaces without triggering an
    # invalid-escape warning in the string literal.
    with pytest.raises(AttributeError, match=r"'FunctionDef'\ object\ has\ no\ attribute\ 'args'"):
        build_stub_fn(FunctionDef(), file='', basis=None, allowed=[])

🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_bps3n5s8/tmp6xuwm3lt/test_concolic_coverage.py::test_build_stub_fn | 112μs | 2.38μs | 4638% ✅ |

To edit these changes, `git checkout codeflash/optimize-build_stub_fn-mhvjydtz` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 12, 2025 05:21
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 12, 2025