@codeflash-ai codeflash-ai bot commented Nov 7, 2025

📄 12% (0.12x) speedup for extract_sql_commands in litellm/proxy/db/check_migration.py

⏱️ Runtime : 1.13 milliseconds → 1.01 milliseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 12% speedup by replacing inefficient string concatenation with list-based buffering and reducing redundant string operations.

Key optimizations:

  1. Eliminated expensive string concatenation: The original code used current_command += line + " " repeatedly, which creates new string objects each time. The optimized version uses a buffer list and only joins strings when commands are complete via ' '.join(buffer).

  2. Reduced redundant strip() operations: The original code built its line list with [line.strip() for line in diff_output.split("\n") if line.strip()], which calls strip() twice on every non-empty line (once in the filter and once for the stored value). The optimized version strips each line once during processing, eliminating the duplicate work.

  3. Improved empty line handling: Instead of filtering empty lines upfront, the optimized code uses continue to skip them during iteration, avoiding the overhead of creating a filtered list.

  4. More efficient buffer management: Using buffer.clear() is more efficient than string reassignment, and the list-based approach scales better with command length.
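The list-buffer approach described in the points above can be sketched roughly as follows. This is a hypothetical reconstruction based only on this description and the test comments, not the code from the PR; the name extract_sql_commands_sketch and the exact control flow are assumptions.

```python
from typing import List


def extract_sql_commands_sketch(diff_output: str) -> List[str]:
    """Illustrative only: a guess at the optimized shape, not the PR's code."""
    sql_commands: List[str] = []
    buffer: List[str] = []  # pieces of the command currently being assembled
    in_sql_block = False

    for raw_line in diff_output.split("\n"):
        line = raw_line.strip()  # strip each line exactly once
        if not line:
            continue  # skip blank lines without building a filtered list
        if line.startswith("-- "):
            # A comment line opens a block; flush any unterminated command first.
            if buffer:
                sql_commands.append(" ".join(buffer))
                buffer.clear()
            in_sql_block = True
        elif in_sql_block:
            buffer.append(line)
            if line.endswith(";"):
                # A trailing semicolon completes the command and closes the block.
                sql_commands.append(" ".join(buffer))
                buffer.clear()
                in_sql_block = False

    # Flush whatever is left when the input ends without a semicolon.
    if buffer:
        sql_commands.append(" ".join(buffer))
    return sql_commands
```

Joining the buffered pieces once per command avoids building a new intermediate string for every line, which is the effect point 1 attributes the speedup to.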

The optimizations are particularly effective for:

  • Multiline commands (15-27% faster): Less string concatenation overhead
  • Large inputs with no extractable commands, such as only comments or unrelated lines (20-27% faster): Better empty line handling
  • Commands without semicolons (24-28% faster): More efficient end-of-input flushing

These improvements make the function more efficient for parsing Prisma migration diffs, which often contain multiline SQL commands and significant whitespace/noise.
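To check the string-concatenation claim locally, a small micro-benchmark along these lines may help. It is a sketch only: the helper names (concat_with_plus, concat_with_join, best_of) and the sample lines are made up for illustration, and the numbers it prints depend on the machine and Python version, so they will not reproduce the figures reported here.

```python
import timeit

# Sample input: the lines of one multiline CREATE TABLE command, repeated.
LINES = ['CREATE TABLE "User" (', '"id" INT PRIMARY KEY,', '"name" TEXT', ');'] * 50


def concat_with_plus(lines):
    """Build the command with repeated string concatenation."""
    current = ""
    for line in lines:
        current += line + " "
    return current.strip()


def concat_with_join(lines):
    """Collect the pieces in a list and join them once at the end."""
    buffer = []
    for line in lines:
        buffer.append(line)
    return " ".join(buffer)


def best_of(func, repeat=5, number=1000):
    """Return the best per-call time in seconds over `repeat` timing runs."""
    timings = timeit.repeat(lambda: func(LINES), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for func in (concat_with_plus, concat_with_join):
        print(f"{func.__name__}: {best_of(func) * 1e6:.2f} microseconds per call")
```

Taking the minimum over several runs mirrors the "best of N runs" convention used in the runtime line, since the fastest run is the one least disturbed by other activity on the machine.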

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 58 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from typing import List

# imports
import pytest  # used for our unit tests
from litellm.proxy.db.check_migration import extract_sql_commands

# unit tests

# 1. Basic Test Cases

def test_single_simple_sql_command():
    # Test extraction of a single SQL command following a comment line
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 3.33μs -> 2.83μs (17.3% faster)

def test_multiple_sql_commands():
    # Test extraction of multiple SQL commands, each with its own comment
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    -- CreateTable
    CREATE TABLE "Post" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 3.85μs -> 3.39μs (13.5% faster)

def test_command_with_multiple_lines():
    # Test SQL command that spans multiple lines
    diff = """
    -- CreateTable
    CREATE TABLE "User" (
      "id" INT PRIMARY KEY,
      "name" TEXT
    );
    """
    expected = ['CREATE TABLE "User" ( "id" INT PRIMARY KEY, "name" TEXT );']
    codeflash_output = extract_sql_commands(diff) # 4.76μs -> 3.87μs (22.9% faster)

def test_commands_with_interleaved_empty_lines():
    # Test input with empty lines between commands
    diff = """
    -- CreateTable

    CREATE TABLE "User" ("id" INT PRIMARY KEY);

    -- CreateTable

    CREATE TABLE "Post" ("id" INT PRIMARY KEY);

    """
    codeflash_output = extract_sql_commands(diff) # 3.84μs -> 3.50μs (9.71% faster)

# 2. Edge Test Cases

def test_no_sql_commands():
    # Test input with no comment and no SQL commands
    diff = ""
    codeflash_output = extract_sql_commands(diff) # 1.35μs -> 1.01μs (33.8% faster)

def test_only_comments_no_sql():
    # Test input with only comment lines, no SQL
    diff = """
    -- CreateTable
    -- DropTable
    """
    codeflash_output = extract_sql_commands(diff) # 2.42μs -> 2.06μs (17.7% faster)

def test_sql_without_semicolon():
    # Test SQL command missing final semicolon (still collected when the input ends)
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY)
    """
    # Since the command never ends with ';', it should be collected at the end
    codeflash_output = extract_sql_commands(diff) # 3.09μs -> 2.41μs (28.3% faster)

def test_command_with_internal_semicolons():
    # Test SQL command with semicolons inside quotes, but ends with a real semicolon
    diff = """
    -- InsertData
    INSERT INTO "User" ("bio") VALUES ('This is a test; with semicolon.');
    """
    codeflash_output = extract_sql_commands(diff) # 2.87μs -> 2.67μs (7.69% faster)

def test_command_with_multiple_comment_lines_before_sql():
    # Test multiple comment lines before a SQL command
    diff = """
    -- CreateTable
    -- This is a table for users
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    # Consecutive comment lines do not produce a command; only the CREATE TABLE statement is expected
    codeflash_output = extract_sql_commands(diff) # 3.17μs -> 2.87μs (10.5% faster)

def test_command_with_leading_and_trailing_whitespace():
    # Test SQL command with extra spaces/tabs
    diff = """
    -- CreateTable
        CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 2.87μs -> 2.59μs (10.6% faster)

def test_command_with_inline_comment():
    # Test SQL command with a trailing inline SQL comment (the comment stays part of the command, as in test_command_with_multiline_and_trailing_comment)
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY); -- Inline comment
    """
    codeflash_output = extract_sql_commands(diff) # 2.85μs -> 2.41μs (18.5% faster)

def test_command_with_multiple_sql_per_comment():
    # Test input where multiple SQL commands follow a single comment
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    CREATE TABLE "Post" ("id" INT PRIMARY KEY);
    """
    # Only the first command is extracted, as the block ends after the first semicolon
    codeflash_output = extract_sql_commands(diff) # 3.13μs -> 2.78μs (12.6% faster)

def test_command_with_no_comment():
    # Test SQL command without a preceding comment (should not be extracted)
    diff = """
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 2.06μs -> 1.64μs (25.7% faster)

def test_command_with_nonstandard_comment_prefix():
    # Test input with comments not starting with '-- ' (should not trigger extraction)
    diff = """
    --CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 2.25μs -> 1.82μs (23.5% faster)

def test_command_with_mixed_case_comment_prefix():
    # Test input with mixed case comment
    diff = """
    -- createTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 2.87μs -> 2.59μs (11.0% faster)

def test_command_with_special_characters():
    # Test SQL command with special characters
    diff = """
    -- CreateTable
    CREATE TABLE "User$" ("id@" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 2.74μs -> 2.59μs (5.92% faster)

def test_command_with_multiline_and_trailing_comment():
    # Test multiline SQL with trailing comment after semicolon
    diff = """
    -- CreateTable
    CREATE TABLE "User" (
      "id" INT PRIMARY KEY
    ); -- Table created
    """
    expected = ['CREATE TABLE "User" ( "id" INT PRIMARY KEY ); -- Table created']
    codeflash_output = extract_sql_commands(diff) # 3.96μs -> 3.28μs (20.8% faster)

def test_command_with_extra_semicolons():
    # Test SQL command ending with multiple semicolons
    diff = """
    -- CreateTable
    CREATE TABLE "User" ("id" INT PRIMARY KEY);;
    """
    codeflash_output = extract_sql_commands(diff) # 2.77μs -> 2.56μs (7.80% faster)

def test_command_with_only_whitespace_lines():
    # Test input with only whitespace lines
    diff = "\n   \n\t\n"
    codeflash_output = extract_sql_commands(diff) # 1.77μs -> 1.52μs (16.4% faster)

def test_command_with_unicode_characters():
    # Test SQL command with unicode characters
    diff = """
    -- CreateTable
    CREATE TABLE "用户" ("标识" INT PRIMARY KEY);
    """
    codeflash_output = extract_sql_commands(diff) # 4.04μs -> 3.61μs (11.9% faster)

# 3. Large Scale Test Cases

def test_large_number_of_commands():
    # Test extraction from a large diff with many commands
    diff = ""
    expected = []
    for i in range(100):
        diff += f"-- CreateTable\nCREATE TABLE \"User{i}\" (\"id\" INT PRIMARY KEY);\n"
        expected.append(f'CREATE TABLE "User{i}" ("id" INT PRIMARY KEY);')
    codeflash_output = extract_sql_commands(diff) # 37.2μs -> 36.6μs (1.67% faster)

def test_large_multiline_commands():
    # Test large number of multiline commands
    diff = ""
    expected = []
    for i in range(100):
        diff += f"-- CreateTable\nCREATE TABLE \"User{i}\" (\n  \"id\" INT PRIMARY KEY,\n  \"name\" TEXT\n);\n"
        expected.append(f'CREATE TABLE "User{i}" ( "id" INT PRIMARY KEY, "name" TEXT );')
    codeflash_output = extract_sql_commands(diff) # 104μs -> 90.1μs (15.7% faster)

def test_large_input_with_noise():
    # Test large input with unrelated lines and noise
    diff = ""
    expected = []
    for i in range(50):
        diff += f"-- CreateTable\nCREATE TABLE \"User{i}\" (\"id\" INT PRIMARY KEY);\n"
        diff += "\n" * 2
        diff += "Some unrelated log message\n"
        diff += "-- DropTable\nDROP TABLE \"OldUser\";\n"
        expected.append(f'CREATE TABLE "User{i}" ("id" INT PRIMARY KEY);')
        expected.append('DROP TABLE "OldUser";')
    codeflash_output = extract_sql_commands(diff) # 47.3μs -> 45.5μs (4.05% faster)

def test_large_input_no_sql():
    # Test large input with many comments but no SQL
    diff = ""
    for i in range(100):
        diff += f"-- CreateTable\n"
    codeflash_output = extract_sql_commands(diff) # 15.3μs -> 12.0μs (27.0% faster)

def test_large_input_all_noise():
    # Test large input with only unrelated lines
    diff = ""
    for i in range(200):
        diff += f"Unrelated log line {i}\n"
    codeflash_output = extract_sql_commands(diff) # 26.0μs -> 20.6μs (25.9% faster)

def test_large_input_mixed_semicolons():
    # Test large input where every command is missing its semicolon
    diff = ""
    expected = []
    for i in range(50):
        diff += f"-- CreateTable\nCREATE TABLE \"User{i}\" (\"id\" INT PRIMARY KEY)\n"
        expected.append(f'CREATE TABLE "User{i}" ("id" INT PRIMARY KEY)')
    codeflash_output = extract_sql_commands(diff) # 22.4μs -> 19.9μs (12.8% faster)

def test_large_multiline_command_with_internal_semicolons():
    # Test many single-line SQL commands with semicolons inside string values
    diff = ""
    expected = []
    for i in range(20):
        diff += f"-- InsertData\nINSERT INTO \"User{i}\" (\"bio\") VALUES ('semicolon;here');\n"
        expected.append(f'INSERT INTO "User{i}" ("bio") VALUES (\'semicolon;here\');')
    codeflash_output = extract_sql_commands(diff) # 9.71μs -> 9.30μs (4.48% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import List

# imports
import pytest
from litellm.proxy.db.check_migration import extract_sql_commands

# unit tests

# ---------------------
# 1. BASIC TEST CASES
# ---------------------

def test_single_simple_command():
    # Single SQL command after a comment
    diff = "-- CreateTable\nCREATE TABLE foo (id INT);"
    codeflash_output = extract_sql_commands(diff) # 2.36μs -> 2.09μs (12.8% faster)

def test_multiple_commands():
    # Multiple SQL commands, each after a comment
    diff = """-- CreateTable
CREATE TABLE foo (id INT);
-- CreateTable
CREATE TABLE bar (id INT);"""
    codeflash_output = extract_sql_commands(diff) # 2.94μs -> 2.65μs (10.9% faster)

def test_multiline_sql_command():
    # SQL command that spans multiple lines
    diff = """-- CreateTable
CREATE TABLE foo (
    id INT,
    name TEXT
);"""
    expected = ["CREATE TABLE foo ( id INT, name TEXT );"]
    codeflash_output = extract_sql_commands(diff) # 4.28μs -> 3.36μs (27.2% faster)

def test_commands_with_extra_whitespace():
    # SQL commands with extra blank lines and leading/trailing whitespace
    diff = """
-- CreateTable

CREATE TABLE foo (id INT);

-- CreateTable

CREATE TABLE bar (id INT);
"""
    codeflash_output = extract_sql_commands(diff) # 3.32μs -> 3.21μs (3.62% faster)

def test_no_sql_commands():
    # No SQL commands, only comments
    diff = "-- CreateTable\n-- DropTable"
    codeflash_output = extract_sql_commands(diff) # 1.97μs -> 1.58μs (24.3% faster)

def test_command_without_comment():
    # SQL command not preceded by a comment (should not be extracted)
    diff = "CREATE TABLE foo (id INT);"
    codeflash_output = extract_sql_commands(diff) # 1.62μs -> 1.20μs (35.1% faster)

def test_command_with_semicolon_in_middle():
    # Semicolon in the middle of a line (should not terminate command)
    diff = """-- CreateTable
CREATE TABLE foo (id INT DEFAULT (SELECT 1;));"""
    # The function will treat this as a complete command since the line ends with ';'
    codeflash_output = extract_sql_commands(diff) # 2.33μs -> 2.09μs (11.8% faster)

# ---------------------
# 2. EDGE TEST CASES
# ---------------------

def test_empty_string():
    # Empty input string
    codeflash_output = extract_sql_commands("") # 1.27μs -> 924ns (37.4% faster)

def test_only_whitespace():
    # Input string with only whitespace
    codeflash_output = extract_sql_commands("   \n   ") # 1.59μs -> 1.29μs (23.3% faster)

def test_comment_without_sql():
    # Comment line but no SQL command after
    diff = "-- CreateTable"
    codeflash_output = extract_sql_commands(diff) # 1.59μs -> 1.20μs (31.9% faster)

def test_sql_without_semicolon():
    # SQL command missing a semicolon (still returned, because leftovers are flushed at end of input)
    diff = "-- CreateTable\nCREATE TABLE foo (id INT)"
    codeflash_output = extract_sql_commands(diff) # 2.55μs -> 2.05μs (24.3% faster)
    # This exposes a subtlety: the implementation flushes at the end

def test_multiple_comments_before_sql():
    # Multiple comment lines before a SQL command
    diff = """-- CreateTable
-- AnotherComment
CREATE TABLE foo (id INT);"""
    # Only the last comment triggers in_sql_block, so only one command is extracted
    codeflash_output = extract_sql_commands(diff) # 2.78μs -> 2.54μs (9.69% faster)

def test_sql_with_internal_comments():
    # SQL command contains inline SQL comments (should be included as part of command)
    diff = """-- CreateTable
CREATE TABLE foo (
    id INT, -- primary key
    name TEXT
);"""
    expected = ["CREATE TABLE foo ( id INT, -- primary key name TEXT );"]
    codeflash_output = extract_sql_commands(diff) # 4.38μs -> 3.55μs (23.4% faster)

def test_comment_like_line_in_sql():
    # SQL command contains '--' partway through a line, not at the start of the line
    diff = """-- CreateTable
CREATE TABLE foo (
    id INT,
    name TEXT -- user name
);"""
    expected = ["CREATE TABLE foo ( id INT, name TEXT -- user name );"]
    codeflash_output = extract_sql_commands(diff) # 4.08μs -> 3.29μs (24.2% faster)

def test_multiple_commands_with_irregular_spacing():
    # Multiple SQL commands with irregular spacing and blank lines
    diff = """
-- CreateTable

CREATE TABLE foo (id INT);

-- DropTable

DROP TABLE bar;

-- AlterTable
ALTER TABLE baz ADD COLUMN age INT;
"""
    codeflash_output = extract_sql_commands(diff) # 4.03μs -> 3.81μs (5.58% faster)

def test_sql_command_with_semicolon_in_string_literal():
    # Semicolon inside a string literal should not terminate the command early
    diff = """-- Insert
INSERT INTO foo (text) VALUES ('hello; world');"""
    codeflash_output = extract_sql_commands(diff) # 2.34μs -> 2.11μs (10.8% faster)

def test_sql_command_with_leading_and_trailing_newlines():
    # Command surrounded by newlines
    diff = "\n\n-- CreateTable\n\nCREATE TABLE foo (id INT);\n\n"
    codeflash_output = extract_sql_commands(diff) # 2.70μs -> 2.59μs (4.48% faster)

def test_sql_command_with_non_ascii_characters():
    # SQL command contains non-ASCII characters
    diff = "-- CreateTable\nCREATE TABLE foo (name TEXT, emoji TEXT DEFAULT '😀');"
    codeflash_output = extract_sql_commands(diff) # 3.50μs -> 3.22μs (8.73% faster)

def test_sql_command_with_escaped_quotes():
    # SQL command contains escaped quotes
    diff = "-- Insert\nINSERT INTO foo (text) VALUES ('It\\'s a test;');"
    codeflash_output = extract_sql_commands(diff) # 2.39μs -> 2.08μs (14.7% faster)

def test_command_with_multiple_semicolons():
    # Command contains multiple semicolons, but only ends at line end
    diff = """-- Insert
INSERT INTO foo VALUES (1); INSERT INTO foo VALUES (2);"""
    # As per implementation, both are on one line, so treated as one command
    codeflash_output = extract_sql_commands(diff) # 2.33μs -> 2.06μs (12.9% faster)

def test_sql_command_with_tabs_and_spaces():
    # SQL command with tabs and spaces for indentation
    diff = "-- CreateTable\n\tCREATE TABLE foo (\n\t\tid INT\n\t);"
    expected = ["CREATE TABLE foo ( id INT );"]
    codeflash_output = extract_sql_commands(diff) # 4.04μs -> 3.29μs (22.6% faster)

# ---------------------
# 3. LARGE SCALE TEST CASES
# ---------------------

def test_large_number_of_commands():
    # Test with 500 SQL commands to check performance and correctness
    num_commands = 500
    diff = ""
    for i in range(num_commands):
        diff += f"-- CreateTable\nCREATE TABLE table_{i} (id INT);\n"
    expected = [f"CREATE TABLE table_{i} (id INT);" for i in range(num_commands)]
    codeflash_output = extract_sql_commands(diff) # 184μs -> 186μs (0.785% slower)

def test_large_multiline_commands():
    # Test with 100 multiline SQL commands
    num_commands = 100
    diff = ""
    for i in range(num_commands):
        diff += f"-- CreateTable\nCREATE TABLE table_{i} (\n    id INT,\n    name TEXT\n);\n"
    expected = [f"CREATE TABLE table_{i} ( id INT, name TEXT );" for i in range(num_commands)]
    codeflash_output = extract_sql_commands(diff) # 103μs -> 87.8μs (18.2% faster)

def test_large_input_with_irrelevant_lines():
    # Large input with many irrelevant lines (comments, whitespace, non-SQL text)
    num_commands = 200
    lines = []
    for i in range(num_commands):
        lines.append("-- SomeOtherComment")
        lines.append("This is not SQL")
        lines.append("-- CreateTable")
        lines.append(f"CREATE TABLE foo_{i} (id INT);")
        lines.append("")
    diff = "\n".join(lines)
    expected = [f"CREATE TABLE foo_{i} (id INT);" for i in range(num_commands)]
    codeflash_output = extract_sql_commands(diff) # 145μs -> 137μs (5.84% faster)

def test_large_command_with_long_lines():
    # One very long SQL command (over 1000 characters)
    long_column_def = ", ".join([f"col{i} INT" for i in range(50)])
    diff = f"-- CreateTable\nCREATE TABLE big_table ({long_column_def});"
    expected = [f"CREATE TABLE big_table ({long_column_def});"]
    codeflash_output = extract_sql_commands(diff) # 2.58μs -> 2.36μs (9.26% faster)

def test_large_input_with_no_commands():
    # Large input with only comments and non-SQL lines
    diff = "\n".join([f"-- Comment {i}\nThis is not a command" for i in range(800)])
    codeflash_output = extract_sql_commands(diff) # 306μs -> 253μs (20.9% faster)

# ---------------------
# 4. FUNCTIONALITY MUTATION TESTS
# ---------------------

def test_mutation_missing_semicolon():
    # If a semicolon is missing, the command is still flushed at the end
    diff = "-- CreateTable\nCREATE TABLE foo (id INT)"
    # Should flush at the end
    codeflash_output = extract_sql_commands(diff) # 2.45μs -> 1.97μs (24.5% faster)

def test_mutation_multiple_commands_without_comments():
    # Multiple SQL commands, none preceded by a comment (should not be extracted)
    diff = "CREATE TABLE foo (id INT);\nCREATE TABLE bar (id INT);"
    codeflash_output = extract_sql_commands(diff) # 1.90μs -> 1.48μs (29.0% faster)

def test_mutation_command_with_comment_but_no_sql():
    # Comment followed by no SQL
    diff = "-- CreateTable"
    codeflash_output = extract_sql_commands(diff) # 1.58μs -> 1.20μs (32.0% faster)

def test_mutation_command_with_comment_and_whitespace_only():
    # Comment followed by whitespace only
    diff = "-- CreateTable\n   \n   "
    codeflash_output = extract_sql_commands(diff) # 2.12μs -> 1.73μs (22.9% faster)

def test_mutation_incomplete_sql_block():
    # Comment, then an incomplete SQL line with no closing parenthesis and no semicolon
    diff = "-- CreateTable\nCREATE TABLE foo (id INT"
    codeflash_output = extract_sql_commands(diff) # 2.46μs -> 2.10μs (17.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
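The equivalence check mentioned in the closing comment of each test file can be pictured with a small harness like the one below. It is a minimal sketch, not Codeflash's actual mechanism: check_equivalence and the two line-filtering helpers are hypothetical names, and the helpers only mirror the double-strip and single-strip variants described in the explanation.

```python
from typing import Callable, Iterable, List


def nonempty_lines_double_strip(text: str) -> List[str]:
    """Original style: strip() runs twice for every non-empty line."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def nonempty_lines_single_strip(text: str) -> List[str]:
    """Optimized style: strip once, skip blanks with continue."""
    result = []
    for raw_line in text.split("\n"):
        line = raw_line.strip()
        if not line:
            continue
        result.append(line)
    return result


def check_equivalence(original: Callable, optimized: Callable, inputs: Iterable) -> list:
    """Return the inputs on which the two implementations disagree."""
    mismatches = []
    for value in inputs:
        if original(value) != optimized(value):
            mismatches.append(value)
    return mismatches


SAMPLE_INPUTS = [
    "",
    " \n\t\n",
    "-- CreateTable\n  CREATE TABLE foo (id INT);\n",
]

if __name__ == "__main__":
    bad = check_equivalence(
        nonempty_lines_double_strip, nonempty_lines_single_strip, SAMPLE_INPUTS
    )
    print("mismatching inputs:", bad)
```

An empty result means the two variants agree on every sampled input; it does not prove equivalence on inputs outside the sample.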

To edit these changes, run git checkout codeflash/optimize-extract_sql_commands-mho927zm and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 7, 2025 02:42
@codeflash-ai codeflash-ai bot added the labels "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) Nov 7, 2025