Skip to content

Format only code that was optimized (and helpers) #211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 32 commits into from

Conversation

zomglings
Copy link
Contributor

@zomglings zomglings commented May 16, 2025

PR Type

Enhancement, Tests


Description

  • Implement CST-based targeted formatting for functions

  • Track function code ranges with OptimFunctionCollector

  • Enable disabling telemetry via env var

  • Add bubble sort samples for formatting tests


Changes walkthrough 📝

Relevant files
Tests
bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py
Add bubble sort class with bad formatting                               

code_to_optimize/bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py

  • Add poorly formatted top-level lol function
  • Introduce BubbleSorter class with bad formatting
  • Implement sorter bubble sort with prints
  • +38/-0   
    bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py
    Add bubble sort function sample                                                   

    code_to_optimize/bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py

  • Add poorly formatted top-level lol function
  • Implement sorter bubble sort function
  • Print sorting messages and return array
  • +19/-0   
    Enhancement
    code_replacer.py
    Record function code ranges in CST collector                         

    codeflash/code_utils/code_replacer.py

  • Include PositionProvider in metadata dependencies
  • Add modification_code_range_lines list
  • Record start/end lines of modified functions
  • +11/-2   
    formatter.py
    Add function to get modification code ranges                         

    codeflash/code_utils/formatter.py

  • Import libcst, OptimFunctionCollector, and models
  • Add get_modification_code_ranges function
  • Parse code and extract modified line ranges via CST
  • +36/-0   
    function_optimizer.py
    Implement targeted code splicing in optimizer                       

    codeflash/optimization/function_optimizer.py

  • Import get_modification_code_ranges and cast
  • Update reformat_code_and_helpers for targeted formatting
  • Splice formatted code ranges into unformatted code
  • Validate ranges and write updated files
  • +41/-14 
    Configuration changes
    main.py
    Support disabling telemetry via env var                                   

    codeflash/main.py

  • Import os and read CODEFLASH_DISABLE_TELEMETRY
  • Adjust sentry and telemetry initialization logic
  • Respect env var across CLI commands
  • +12/-7   

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Off-by-one slicing

    The code in reformat_code_and_helpers uses 1-based line numbers from PositionProvider directly for Python list slicing, which is 0-based and may lead to incorrect replacements. Verify and adjust indices by subtracting one before slicing.

    new_code_lines = unformatted_code.split("\n")
    for range_0, range_1 in zip(code_ranges_unformatted, code_ranges_formatted):
        range_0_0, range_0_1 = range_0
        range_1_0, range_1_1 = range_1
        new_code_lines = new_code_lines[:range_0_0] + formatted_code_lines[range_1_0:range_1_1 + 1] + new_code_lines[range_0_1 + 1:]
    new_code = "\n".join(new_code_lines)
    Missing tests

    The new get_modification_code_ranges function has no dedicated unit tests. Add tests covering various scenarios (e.g., nested functions, classes, edge cases) to ensure correct range detection.

    def get_modification_code_ranges(
        modified_code: str,
        fto: FunctionToOptimize,
        code_context: CodeOptimizationContext,
    ) -> list[tuple[int, int]]:
        """
        Returns the line number of modified and new functions in a string containing containing the code in a fully modified file.
        """
        modified_functions = set()
        modified_functions.add(fto.qualified_name)
        for helper_function in code_context.helper_functions:
            if helper_function.jedi_definition.type != "class":
                modified_functions.add(helper_function.qualified_name)
    
        parsed_function_names = set()
        for original_function_name in modified_functions:
            if original_function_name.count(".") == 0:
                class_name, function_name = None, original_function_name
            elif original_function_name.count(".") == 1:
                class_name, function_name = original_function_name.split(".")
            else:
                msg = f"Unable to find {original_function_name}. Returning unchanged source code."
                logger.error(msg)
                continue
            parsed_function_names.add((class_name, function_name))
    
        module = cst.metadata.MetadataWrapper(cst.parse_module(modified_code))
        visitor = OptimFunctionCollector(code_context.preexisting_objects, parsed_function_names)
        module.visit(visitor)
        return visitor.modification_code_range_lines

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Sort imports before writing

    You should apply sort_imports before writing the file so the sorted imports are
    actually persisted. Otherwise the write happens before sorting.

    codeflash/optimization/function_optimizer.py [619-622]

    +if should_sort_imports:
    +    new_code = sort_imports(new_code)
     path.write_text(new_code, encoding="utf8")
     
    -if should_sort_imports:
    -    new_code = sort_imports(new_code)
    -
    Suggestion importance[1-10]: 9

    __

    Why: Writing before sorting means the sorted imports aren’t persisted; reordering ensures the final file contains the sorted imports.

    High
    Initialize modification flag correctly

    Instead of defaulting modification to True and only clearing it in one branch,
    initialize it to False and set it to True only when you actually add the function to
    modified or new lists. This prevents unintentional marking of unrelated functions.

    codeflash/code_utils/code_replacer.py [55-75]

    -modification = True
    +modification = False
     
     if (self.current_class, node.name.value) in self.function_names:
    +    modification = True
         self.modified_functions[(self.current_class, node.name.value)] = node
     elif self.current_class and node.name.value == "__init__":
    +    modification = True
         self.modified_init_functions[self.current_class] = node
     elif (
         self.preexisting_objects
         and (node.name.value, ()) not in self.preexisting_objects
         and self.current_class is None
     ):
    +    modification = True
         self.new_functions.append(node)
    -else:
    -    modification = False
     
     if modification:
         pos = self.get_metadata(cst.metadata.PositionProvider, node)
         self.modification_code_range_lines.append((pos.start.line, pos.end.line))
    Suggestion importance[1-10]: 8

    __

    Why: Defaulting modification to True marks all functions as modified; initializing it to False ensures only targeted functions are tracked, avoiding incorrect metadata.

    Medium
    General
    Support nested qualified names

    Support qualified names with multiple dots by splitting at the last dot, rather than
    erroring out. This will correctly handle nested modules or classes.

    codeflash/code_utils/formatter.py [79-87]

     for original_function_name in modified_functions:
    -    if original_function_name.count(".") == 0:
    -        class_name, function_name = None, original_function_name
    -    elif original_function_name.count(".") == 1:
    -        class_name, function_name = original_function_name.split(".")
    -    else:
    -        msg = f"Unable to find {original_function_name}. Returning unchanged source code."
    -        logger.error(msg)
    -        continue
    +    head, sep, tail = original_function_name.rpartition(".")
    +    class_name = head if sep else None
    +    function_name = tail if sep else original_function_name
    Suggestion importance[1-10]: 6

    __

    Why: Splitting at the last dot correctly handles nested qualified names, extending support without altering the handling of simple names.

    Low

    new_code = format_code(self.args.formatter_cmds, path)
    if should_sort_imports:
    new_code = sort_imports(new_code)
    if len(code_ranges_formatted) != len(code_ranges_unformatted):
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Q: Can you help me understand the case on Why these have to be equal always?

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Just a defensive measure. The function to optimize and helpers are non-overlapping blocks of code, and should not become overlapping due to formatting. If this has happened, something has gone wrong in the formatting step or in the extraction step (which identifies function to optimize and helpers).

    @@ -299,7 +299,6 @@ def optimize_function(self) -> Result[BestOptimization, str]:

    self.log_successful_optimization(explanation, generated_tests, exp_type)

    # xylophone
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    xylophone? 😂

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Easy to grep, should be gone. :)

    @misrasaurabh1
    Copy link
    Contributor

    approach looks good. Can you write tests for reformat_code_and_helpers that check for the variety of cases we expect around unformatted and formatted code?

    @zomglings
    Copy link
    Contributor Author

    No longer have permission to push to the official repo, will open this PR from my fork.

    @zomglings zomglings closed this Jun 2, 2025
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    3 participants