
⚡️ Speed up method Parser._generate_expression by 20% in parse.py [codeflash] #194

Closed
wants to merge 2 commits into from


KRRT7 commented Sep 11, 2024

📄 Parser._generate_expression() in parse.py

📈 Performance improved by 20% (about 1.2x as fast as the original)

⏱️ Runtime went down from 5.39 milliseconds to 4.51 milliseconds
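A quick check of the reported figures. This is a sketch, and the convention is inferred from the two runtimes rather than taken from codeflash's documentation: the 20% matches the time saved divided by the new runtime, while the saving relative to the original runtime is about 16%.

```python
old_ms = 5.39  # original runtime reported above
new_ms = 4.51  # optimized runtime reported above

# Speedup expressed relative to the new runtime: (old - new) / new
speedup = (old_ms - new_ms) / new_ms
# Fraction of the original runtime that was saved: (old - new) / old
reduction = (old_ms - new_ms) / old_ms

print(f"speedup: {speedup:.0%}")       # speedup: 20%
print(f"time saved: {reduction:.1%}")  # time saved: 16.3%
```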

Explanation and details

The optimization focuses on reducing redundant calculations, avoiding unnecessary data structures, and streamlining control flow where possible.

Optimization highlights:

  1. Early return with ternary conditional operators: Simplified the conditions for creating the regular expression pattern.
  2. Single-pass regex split: the format string is split with a regular expression, and the pattern list is built in a single loop, which reduces the number of iterations.
  3. Direct dictionary lookup for type patterns: Avoids multiple if-elif checks.
  4. Use of partition instead of split: Provides a faster mechanism to separate the name and format in the _handle_field method.
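Highlights 3 and 4 can be illustrated with a small sketch. It is not the PR's code or parse's real `_handle_field`; the `TYPE_PATTERNS` table, its regexes, and the `handle_field` helper are hypothetical, chosen only to show a dict lookup replacing an `if-elif` chain and `str.partition` replacing `split`.

```python
import re

# Hypothetical type-code -> regex table (illustrative, not parse.py's patterns).
TYPE_PATTERNS = {
    "d": r"\d+",
    "x": r"[0-9a-fA-F]+",
    "f": r"\d*\.\d+",
}
DEFAULT_PATTERN = r".+?"  # lazy "match anything" for other or missing types


def handle_field(field: str) -> str:
    """Translate a field such as '{age:d}' into a regex group (simplified)."""
    body = field[1:-1]  # drop the surrounding braces
    # partition always returns a 3-tuple; fmt is '' when there is no ':'.
    name, _, fmt = body.partition(":")
    # One dict lookup on the last character of the spec replaces an
    # if/elif chain over type codes ('' falls through to the default).
    pattern = TYPE_PATTERNS.get(fmt[-1:], DEFAULT_PATTERN)
    # Conditional expression: named group for identifiers, unnamed otherwise
    # (covers positional fields like '{}' and '{0}').
    return f"(?P<{name}>{pattern})" if name.isidentifier() else f"({pattern})"


match = re.fullmatch(handle_field("{age:d}"), "42")
print(match.group("age"))  # 42
```

Because `str.partition` always returns three strings, a field without a format spec (`'{name}'`) needs no length check, and `dict.get` supplies the fallback pattern in the same lookup.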

This refactoring improves readability and performance by reducing redundant checks and optimizing string operations.
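Highlight 2 describes building the pattern in a single pass over a regex split. The sketch below reuses the `PARSE_RE` and `REGEX_SAFETY` expressions that appear in the generated tests; the `generate_expression` function itself is a hypothetical simplification (it ignores format specs and does not handle repeated field names), not parse's implementation.

```python
import re

# Split and escaping regexes, as they appear in the generated tests.
PARSE_RE = re.compile(r"({{|}}|{[\w-]*(?:\.[\w-]+|\[[^]]+])*(?::[^}]+)?})")
REGEX_SAFETY = re.compile(r"([?\\.[\]()*+^$!|])")


def generate_expression(fmt: str) -> str:
    """Hypothetical single-pass translation of a format string to a regex."""
    parts = []
    # The capturing group in PARSE_RE makes re.split keep the delimiters,
    # so literal text and fields arrive in order in one list.
    for part in PARSE_RE.split(fmt):
        if not part:
            continue  # split yields '' at the ends and between adjacent matches
        if part == "{{":
            parts.append(r"\{")
        elif part == "}}":
            parts.append(r"\}")
        elif part[0] == "{" and part[-1] == "}":
            name = part[1:-1].partition(":")[0]
            # Simplification: every field becomes a lazy group.
            parts.append(f"(?P<{name}>.+?)" if name.isidentifier() else "(.+?)")
        else:
            # Backslash-escape regex metacharacters in literal text.
            parts.append(REGEX_SAFETY.sub(r"\\\1", part))
    return "".join(parts)


regex = generate_expression("{name} is {age:d}")
print(regex)  # (?P<name>.+?) is (?P<age>.+?)
print(re.fullmatch(regex, "Bob is 42").groupdict())  # {'name': 'Bob', 'age': '42'}
```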

Correctness verification

The new optimized code was tested for correctness. The results are listed below.

🔘 (none found) − ⚙️ Existing Unit Tests

✅ 60 Passed − 🌀 Generated Regression Tests

# imports
from __future__ import absolute_import

import re

import pytest  # used for our unit tests
from parse import Parser  # class under test

PARSE_RE = re.compile(r"({{|}}|{[\w-]*(?:\.[\w-]+|\[[^]]+])*(?::[^}]+)?})")

REGEX_SAFETY = re.compile(r"([?\\.[\]()*+^$!|])")

# unit tests
@pytest.mark.parametrize("format_string, expected", [
    # Basic Format Strings
    ("Hello World", r"Hello World"),
    ("Sample text with no fields", r"Sample text with no fields"),
    ("{{}}", r"\{\}"),
    ("{{Hello}}", r"\{Hello\}"),
    ("{name}", r"(?P<name>.+?)"),
    ("{0}", r"(.+?)"),
    ("{name} {age}", r"(?P<name>.+?) (?P<age>.+?)"),
    ("{0} {1}", r"(.+?) (.+?)"),

    # Fields with Format Specifiers
    ("{value:d}", r"(?P<value>\d+)"),
    ("{value:.2f}", r"(?P<value>\d*\.\d+)"),
    ("{value:x}", r"(?P<value>(0[xX])?[0-9a-fA-F]+)"),
    ("{name:s}", r"(?P<name>.+?)"),
    ("{name:20s}", r"(?P<name>.+?)"),
    ("{date:%Y-%m-%d}", r"(?P<date>.+?)"),
    ("{time:%H:%M:%S}", r"(?P<time>.+?)"),

    # Edge Cases
    ("", r""),
    ("{name", r"\{name"),
    ("name}", r"name\}"),
    ("{outer{inner}}", r"\{outer(?P<inner>.+?)\}"),

    # Complex Format Strings
    ("Name: {name}, Age: {age}, Score: {score:.2f}", r"Name: (?P<name>.+?), Age: (?P<age>.+?), Score: (?P<score>\d*\.\d+)"),
    ("Date: {date:%Y-%m-%d}, Time: {time:%H:%M:%S}", r"Date: (?P<date>.+?), Time: (?P<time>.+?)"),
    ("{name} {name}", r"(?P<name>.+?) (?P=name)"),
    ("{0} {0}", r"(.+?) \1"),
    ("{user-name}", r"(?P<user_name>.+?)"),
    ("{user.name}", r"(?P<user__name>.+?)"),

    # Performance and Scalability
    (" ".join(f"{{field{i}}}" for i in range(1000)), r" ".join(f"(?P<field{i}>.+?)" for i in range(1000))),

    # Case Sensitivity
    ("{Name}", r"(?P<Name>.+?)"),
    ("{name}", r"(?P<name>.+?)"),

    # Extra Types
    ("{custom_field:custom_type}", r"(?P<custom_field>.+?)"),

    # Special Characters in Text
    ("Text with special characters: .+*?^$()[]{}", r"Text with special characters: \.\+\*\?\^\$\(\)\[\]\{\}"),
    ("Escaped characters: \\ \\.", r"Escaped characters: \\\\ \\\\."),

    # Alignment and Padding
    ("{name:<10}", r"(?P<name>.+?) *"),
    ("{name:>10}", r" *(.+?)"),
    ("{name:^10}", r" *(.+?) *"),
    ("{name:*>10}", r"\**(.+?)"),
    ("{name:_<10}", r"(?P<name>.+?)_*"),

    # Numeric Specifics
    ("{value:+d}", r"[-+ ]?\d+"),
    ("{value: d}", r"[-+ ]?\d+"),
    ("{value:0=10d}", r"0*\d+"),

    # Rare or Unexpected Edge Cases
    ("{name!me}", r"\{name!me\}"),
    ("{na@me}", r"\{na@me\}"),
    ("{name with spaces}", r"\{name with spaces\}"),
    ("{value:10z}", r"(?P<value>\%z+)"),
    ("{value:10q}", r"(?P<value>\%q+)"),
    ("{outer{inner}}", r"\{outer(?P<inner>.+?)\}"),
    ("{outer{{inner}}}", r"\{outer\{inner\}\}"),
    ("{user-name}", r"(?P<user_name>.+?)"),
    ("{user.name}", r"(?P<user__name>.+?)"),
    ("{}", r"(.+?)"),
    ("{ }", r"(?P< >.+?)"),
    ("{{{name}}", r"\{\{(?P<name>.+?)\}"),
    ("{name}}}", r"(?P<name>.+?)\}"),
    ("{value:!@#}", r"(?P<value>\%#)"),
    ("{value:10.2!}", r"(?P<value>\%!)"),
    ("{value:*>10}", r"\**(.+?)"),
    ("{value:_<10}", r"(?P<value>.+?)_*"),
    ("{value:-10d}", r"(?P<value>\d+)"),
    ("{value:0d}", r"(?P<value>\d+)"),
    ("{date:%Q-%W-%E}", r"(?P<date>.+?)"),
    ("{date:%Y-%m}", r"(?P<date>.+?)"),
    ("{outer{inner{deep}}}", r"\{outer(?P<inner>.+?)\{deep\}\}"),
    ("{json_field:{'key':'value'}}", r"\{json_field:\{'key':'value'\}\}"),
    ("{名前}", r"(?P<名前>.+?)"),
    ("{用户}", r"(?P<用户>.+?)"),
    ("{name\\n}", r"\{name\\n\}"),
    ("{name\\t}", r"\{name\\t\}")
])
def test_generate_expression(format_string, expected):
    parser = Parser(format_string)
    codeflash_output = parser._generate_expression()
    # `expected` is not asserted here; codeflash instead checked that
    # codeflash_output equals the original implementation's output.
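The generated tests record the output without asserting `expected`; the equivalence check against the original implementation happens on codeflash's side. A minimal differential check of the same kind, using two hypothetical toy lookups (`lookup_if_chain` as the "before" and `lookup_dict` as the "after", neither taken from the PR), could be written as:

```python
# Hypothetical "before": sequential if/elif comparisons.
def lookup_if_chain(code: str) -> str:
    if code == "d":
        return r"\d+"
    elif code == "x":
        return r"[0-9a-fA-F]+"
    else:
        return r".+?"


# Hypothetical "after": a single dict lookup with a default.
_PATTERNS = {"d": r"\d+", "x": r"[0-9a-fA-F]+"}


def lookup_dict(code: str) -> str:
    return _PATTERNS.get(code, r".+?")


def check_equivalent(inputs) -> None:
    # Differential check: the rewrite must match the original on every input.
    for code in inputs:
        assert lookup_dict(code) == lookup_if_chain(code), code


check_equivalent(["d", "x", "s", ".2f", ""])
```

With pytest, the loop would typically become a `@pytest.mark.parametrize` test, as in the generated suite above.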

🔘 (none found) − ⏪ Replay Tests

codeflash-ai bot and others added 2 commits September 10, 2024 07:17
…expression-2024-09-10T07.17.25

⚡️ Speed up method `Parser._generate_expression` by 20% in `parse.py`
wimglenn (Collaborator)

No thanks.

wimglenn closed this Sep 11, 2024
2 participants