Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing Extended Improve Mode for More Thorough PR Reviews #229

Merged
merged 5 commits into from
Aug 22, 2023

Conversation

mrT23
Copy link
Collaborator

@mrT23 mrT23 commented Aug 22, 2023

User description

PR Type:

Enhancement


PR Description:

This pull request introduces an extended mode for the "improve" tool, which provides more thorough feedback on pull requests by making multiple model calls. The extended mode allows for no compression of diff files, generates code suggestions for each group of diff files, and aggregates these suggestions. It also includes functionality for ranking the suggestions based on their importance and clipping the least important suggestions. The extended mode can be activated with the 'improve --extended' command.


PR Main Files Walkthrough:

pr_agent/tools/pr_code_suggestions.py: Added support for the extended improve mode. This includes the creation of a new function '_prepare_prediction_extended' to handle the extended mode logic, and a 'rank_suggestions' function to rank code suggestions based on their importance. The 'run' function was also updated to handle the extended mode.
pr_agent/algo/pr_processing.py: Added a new function 'get_pr_multi_diffs' to get multiple diffs for the extended improve mode. The 'pr_generate_extended_diff' function was also updated to return the tokens of each extended patch.
pr_agent/cli.py: Updated the CLI help text to include information about the extended improve mode.
pr_agent/servers/help.py: Updated the server help text to include information about the extended improve mode.
pr_agent/algo/ai_handler.py: Updated the 'chat_completion' function to have a default temperature value.
pr_agent/algo/utils.py: Updated the 'update_settings_from_args' function to handle the '--extended' argument.
pr_agent/algo/git_patch_processing.py: Updated the 'convert_to_hunks_with_lines_numbers' function to add an extra newline before the filename for better readability.
pr_agent/config_loader.py: Added 'settings/pr_sort_code_suggestions_prompts.toml' to the list of configuration files to load.
pr_agent/settings/pr_code_suggestions_prompts.toml: Updated the system and user prompts for code suggestions to accommodate the extended improve mode.
pr_agent/settings/pr_sort_code_suggestions_prompts.toml: Added a new configuration file for the prompts used when sorting code suggestions in the extended improve mode.
pr_agent/settings/configuration.toml: Added new configuration options for the extended improve mode, including the number of code suggestions per chunk, whether to rank extended suggestions, the maximum number of calls, and the final clip factor.


PR Type

Enhancement, Documentation


Description

  • Introduced an extended improve mode for more thorough PR reviews, allowing multiple model calls and suggestion ranking.
  • Enhanced hunk formatting in patches for better readability.
  • Updated CLI and server help texts to include information about the extended improve mode.
  • Added new configuration options and files to support the extended improve mode.
  • Modified functions to handle multiple diffs and added return types for better type safety.

Changes walkthrough 📝

Relevant files
Enhancement
10 files
ai_handler.py
Set default temperature in chat_completion function           

pr_agent/algo/ai_handler.py

  • Changed the default temperature value in chat_completion function.
  • +1/-1     
    git_patch_processing.py
    Enhance hunk formatting and readability in patches             

    pr_agent/algo/git_patch_processing.py

  • Improved readability by adding extra newlines before filenames.
  • Replaced hunk delimiters with __new hunk__ and __old hunk__.
  • Added handling for previous header lines in hunks.
  • +16/-10 
    pr_processing.py
    Implement multi-diff handling and extended diff generation

    pr_agent/algo/pr_processing.py

  • Added get_pr_multi_diffs function for handling multiple diffs.
  • Modified pr_generate_extended_diff to return additional tokens.
  • Adjusted default arguments in get_pr_diff.
  • +93/-13 
    utils.py
    Enhance token clipping and argument validation                     

    pr_agent/algo/utils.py

  • Added validation for empty text in clip_tokens.
  • Improved argument validation in update_settings_from_args.
  • +2/-1     
    bitbucket_provider.py
    Specify return type for publish_code_suggestions                 

    pr_agent/git_providers/bitbucket_provider.py

    • Added return type bool to publish_code_suggestions.
    +1/-1     
    git_provider.py
    Specify return type for publish_code_suggestions                 

    pr_agent/git_providers/git_provider.py

    • Added return type bool to publish_code_suggestions.
    +1/-1     
    github_provider.py
    Specify return type for publish_code_suggestions                 

    pr_agent/git_providers/github_provider.py

    • Added return type bool to publish_code_suggestions.
    +1/-1     
    gitlab_provider.py
    Specify return type for publish_code_suggestions                 

    pr_agent/git_providers/gitlab_provider.py

    • Added return type bool to publish_code_suggestions.
    +1/-1     
    local_git_provider.py
    Specify return type for publish_code_suggestions                 

    pr_agent/git_providers/local_git_provider.py

    • Added return type bool to publish_code_suggestions.
    +1/-1     
    pr_code_suggestions.py
    Implement extended improve mode and suggestion ranking     

    pr_agent/tools/pr_code_suggestions.py

  • Added support for extended improve mode.
  • Implemented suggestion ranking and separate publishing.
  • +107/-8 
    Documentation
    3 files
    cli.py
    Update CLI help for extended improve mode                               

    pr_agent/cli.py

    • Updated CLI help text to include extended improve mode.
    +14/-6   
    help.py
    Update server help for extended improve mode                         

    pr_agent/servers/help.py

    • Updated help text to include extended improve mode.
    +1/-1     
    pr_code_suggestions_prompts.toml
    Update prompts for extended improve mode                                 

    pr_agent/settings/pr_code_suggestions_prompts.toml

    • Updated prompts to accommodate extended improve mode.
    +43/-26 
    Configuration changes
    3 files
    config_loader.py
    Include sorting configuration in settings loader                 

    pr_agent/config_loader.py

    • Added new configuration file for sorting code suggestions.
    +1/-0     
    configuration.toml
    Add extended improve mode configurations                                 

    pr_agent/settings/configuration.toml

    • Added configuration options for extended improve mode.
    +6/-0     
    pr_sort_code_suggestions_prompts.toml
    Add configuration for sorting code suggestions                     

    pr_agent/settings/pr_sort_code_suggestions_prompts.toml

    • Added new configuration file for sorting code suggestions.
    +46/-0   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 22, 2023

    /describe

    @github-actions github-actions bot changed the title Tr/sequential improve Implementing Extended Improve Mode for More Thorough PR Reviews Aug 22, 2023
    @github-actions github-actions bot added the enhancement New feature or request label Aug 22, 2023
    pr_agent/algo/ai_handler.py Show resolved Hide resolved
    pr_agent/algo/pr_processing.py Show resolved Hide resolved
    pr_agent/algo/utils.py Show resolved Hide resolved
    pr_agent/algo/pr_processing.py Show resolved Hide resolved
    pr_agent/tools/pr_code_suggestions.py Show resolved Hide resolved
    pr_agent/algo/pr_processing.py Show resolved Hide resolved
    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 22, 2023

    @zmeir
    i am implementing here your suggestion for unifying the "improve" and "extended improve"

    I ran from the CLI:
    --pr_url="#229" improve --extended

    @esauser
    Copy link

    esauser commented Aug 22, 2023

    @mrT23 can you add configuration so that the automatic review can also hit this new path of desired?

    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 22, 2023

    @esauser

    hit this new path of desired

    'review' is a different tool, that works differently.
    Since the goal of 'review' is to summarize the entire PR, it will require significant logic adaptations to review files in chunks and aggregate.

    But I think a simpler solution, if you like the "extended" feature, is to can make the "improve --extended" command to run automatically on any PR, instead (or in addition) to "review'.

    @mrT23 mrT23 merged commit cbe0a69 into main Aug 22, 2023
    @mrT23 mrT23 deleted the tr/sequential_improve branch August 22, 2023 13:46
    @esauser
    Copy link

    esauser commented Aug 24, 2023

    But I think a simpler solution, if you like the "extended" feature, is to can make the "improve --extended" command to run automatically on any PR, instead (or in addition) to "review'.

    Yes, that was my expectation. A civics value to have it fire. It would be nice to be able to have describe auto fire as well along with describe.

    @Codium-ai Codium-ai deleted a comment from github-actions bot Aug 28, 2023
    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 28, 2023

    /ask rate this PR quality, from 1 to 10. explain why

    @github-actions
    Copy link
    Contributor

    Question: rate this pr quality, from 1 to 10. explain why

    Answer:
    I would rate this PR as an 8. The code changes are substantial and well-structured, with clear explanations provided in the PR description. The implementation of the extended improve mode is a significant enhancement to the tool, allowing for more thorough feedback on pull requests. The code is generally clean and follows good practices.

    However, there are a few areas that could be improved:

    1. In pr_agent/algo/pr_processing.py, the function get_pr_multi_diffs is quite long and complex. It could potentially be broken down into smaller, more manageable functions to improve readability and maintainability.

    2. The PR could benefit from more comments explaining the logic, especially in the more complex functions. For example, in pr_agent/tools/pr_code_suggestions.py, the function _prepare_prediction_extended could use comments to explain the logic of the for loop and the if conditions.

    3. There are some magic numbers used in the code, such as 5 in get_pr_multi_diffs function. It would be better to define these as constants at the top of the file or in a configuration file, to make the code easier to understand and modify.

    4. The PR could benefit from more comprehensive testing. It's not clear from the PR if new tests have been added or existing tests have been updated to cover the new functionality.

    5. The PR includes changes to several files that seem unrelated to the main purpose of the PR, such as pr_agent/algo/ai_handler.py and pr_agent/git_providers/gitlab_provider.py. It would be better if these changes were made in separate PRs, to keep each PR focused on a single feature or bug fix.

    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 28, 2023

    /review

    @github-actions
    Copy link
    Contributor

    github-actions bot commented Aug 28, 2023

    PR Analysis

    (review updated until commit 9157fa6)

    • 🎯 Main theme: Implementation of an extended mode for the "improve" tool to provide more thorough feedback on pull requests.
    • 📝 PR summary: This PR introduces an extended mode for the "improve" tool, which provides more comprehensive feedback on pull requests by making multiple model calls. The extended mode allows for no compression of diff files, generates code suggestions for each group of diff files, and aggregates these suggestions. It also includes functionality for ranking the suggestions based on their importance and clipping the least important suggestions.
    • 📌 Type of PR: Enhancement
    • 🧪 Relevant tests added: No
    • ⏱️ Estimated effort to review [1-5]: 4, because the PR introduces significant changes across multiple files, including new functionalities and modifications to existing ones. It requires a thorough understanding of the existing codebase and the new features being introduced.
    • 🔒 Security concerns: No security concerns found

    PR Feedback

    • 💡 General suggestions: The PR is well-structured and introduces significant enhancements to the "improve" tool. The extended mode should provide more comprehensive and valuable feedback on pull requests. However, it would be beneficial to add tests to validate the new functionalities and ensure they work as expected. Also, consider handling exceptions more gracefully in the new functions to avoid potential crashes.

    • 🤖 Code feedback:
      • relevant file: pr_agent/algo/pr_processing.py
        suggestion: Consider handling the case where the 'text' argument in the 'clip_tokens' function is None or empty more gracefully. Instead of checking if 'text' is not None, you could directly return 'text' if it's None or empty. This would make the function more robust. [important]
        relevant line: if not text:

      • relevant file: pr_agent/tools/pr_code_suggestions.py
        suggestion: The '_prepare_prediction_extended' function seems to be doing a lot. Consider breaking it down into smaller, more manageable functions. This would improve the readability and maintainability of the code. [medium]
        relevant line: async def _prepare_prediction_extended(self, model: str) -> dict:

      • relevant file: pr_agent/algo/pr_processing.py
        suggestion: The 'get_pr_multi_diffs' function could benefit from more detailed comments explaining the logic behind the code. This would make it easier for other developers to understand the function's purpose and how it works. [medium]
        relevant line: def get_pr_multi_diffs(git_provider: GitProvider,

      • relevant file: pr_agent/algo/git_patch_processing.py
        suggestion: The 'convert_to_hunks_with_lines_numbers' function uses 'new hunk' and 'old hunk' as markers. Consider using constants for these markers to avoid potential typos and make the code more maintainable. [medium]
        relevant line: patch_with_lines_str += '\n__new hunk__\n'

    How to use

    Instructions

    Tag me in a comment '@CodiumAI-Agent' and add one of the following commands:
    /review: Request a review of your Pull Request.
    /describe: Update the PR title and description based on the contents of the PR.
    /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    /ask <QUESTION>: Ask a question about the PR.
    /update_changelog: Update the changelog based on the PR's contents.
    /add_docs: Generate docstring for new components introduced in the PR.
    /generate_labels: Generate labels for the PR based on the PR's contents.
    see the tools guide for more details.

    To edit any configuration parameter from the configuration.toml, add --config_path=new_value.
    For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
    To list the possible configuration parameters, add a /config comment.

    @mrT23
    Copy link
    Collaborator Author

    mrT23 commented Aug 28, 2023

    /improve --extended

    @@ -121,7 +138,11 @@ def push_inline_code_suggestions(self, data):
    if get_settings().config.verbosity_level >= 2:
    logging.info(f"Could not parse suggestion: {d}")

    self.git_provider.publish_code_suggestions(code_suggestions)
    is_successful = self.git_provider.publish_code_suggestions(code_suggestions)
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider adding error handling or a retry mechanism in case the 'publish_code_suggestions' function fails.

    Suggested change
    is_successful = self.git_provider.publish_code_suggestions(code_suggestions)
    try:
    self.git_provider.publish_code_suggestions(code_suggestions)
    except Exception as e:
    logging.error('Failed to publish code suggestions: ', e)
    # Retry logic here

    continue

    patch = convert_to_hunks_with_lines_numbers(patch, file)
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider adding a check for the existence of 'patch' before calling 'convert_to_hunks_with_lines_numbers'. This will prevent potential errors if 'patch' is None.

    Suggested change
    patch = convert_to_hunks_with_lines_numbers(patch, file)
    if patch:
    patch = convert_to_hunks_with_lines_numbers(patch, file)


    if patch:
    patches.append(patch)
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider adding a check for the existence of 'patch' before appending it to 'patches'. This will prevent potential errors if 'patch' is None.

    Suggested change
    patches.append(patch)
    if patch:
    patches.append(patch)


    if patch:
    patches.append(patch)
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider adding a check for the existence of 'patch' before appending it to 'patches'. This will prevent potential errors if 'patch' is None.

    Suggested change
    patches.append(patch)
    if patch:
    patches.append(patch)

    Copy link
    Contributor

    qodo-merge-pro bot commented Sep 9, 2024

    PR Code Suggestions ✨

    Latest suggestions up to 9157fa6

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add a check for empty results before returning sorted suggestions

    Consider adding a check for empty data_sorted list before returning it to avoid
    potential issues with empty results.

    pr_agent/tools/pr_code_suggestions.py [240-246]

     except Exception as e:
         if get_settings().config.verbosity_level >= 1:
             logging.info(f"Could not sort suggestions, error: {e}")
         data_sorted = suggestion_list
     
    -return data_sorted
    +return data_sorted if data_sorted else suggestion_list
     
    Suggestion importance[1-10]: 7

    Why: This suggestion addresses a potential issue that could lead to unexpected behavior.

    7
    Performance
    Use a more efficient initialization for the sorted data list

    Consider using a more efficient approach to initialize data_sorted instead of
    creating a list of empty lists.

    pr_agent/tools/pr_code_suggestions.py [211]

    -data_sorted = [[]] * len(suggestion_list)
    +data_sorted = [None] * len(suggestion_list)
     
    Suggestion importance[1-10]: 5

    Why: The suggestion improves performance, but the impact is likely minimal in most cases.

    5
    Best practice
    Improve variable naming in function call for better code readability

    Consider using a more descriptive variable name instead of d in the list
    comprehension. This will improve code readability.

    pr_agent/algo/pr_processing.py [60-61]

     patches_extended, total_tokens, patches_extended_tokens = pr_generate_extended_diff(pr_languages, token_handler,
    -                                                            add_line_numbers_to_hunks)
    +                                                            add_line_numbers_to_hunks=add_line_numbers_to_hunks)
     
    Suggestion importance[1-10]: 3

    Why: The suggestion improves code readability but doesn't address a critical issue or bug.

    3
    Improve variable naming in loop for better code readability

    Consider using a more descriptive variable name instead of s in the loop. This will
    improve code readability.

    pr_agent/tools/pr_code_suggestions.py [231-234]

    -for s in sort_order['Sort Order']:
    -    suggestion_number = s['suggestion number']
    -    importance_order = s['importance order']
    +for suggestion in sort_order['Sort Order']:
    +    suggestion_number = suggestion['suggestion number']
    +    importance_order = suggestion['importance order']
         data_sorted[importance_order - 1] = suggestion_list[suggestion_number - 1]
     
    Suggestion importance[1-10]: 3

    Why: The suggestion improves code readability but doesn't address a critical issue or bug.

    3

    Previous suggestions

    Suggestions up to commit 9157fa6
    CategorySuggestion                                                                                                                                    Score
    Error handling
    Add error handling for potential input format issues

    Consider adding error handling for the case when 'relevant lines' doesn't contain a
    hyphen. This will make the code more robust against potential input errors.

    pr_agent/algo/pr_processing.py [129-130]

    -'relevant_lines_start': int(suggestion['relevant lines'].split('-')[0]),
    -'relevant_lines_end': int(suggestion['relevant lines'].split('-')[1]),
    +relevant_lines = suggestion['relevant lines'].split('-')
    +if len(relevant_lines) != 2:
    +    raise ValueError(f"Invalid 'relevant lines' format: {suggestion['relevant lines']}")
    +'relevant_lines_start': int(relevant_lines[0]),
    +'relevant_lines_end': int(relevant_lines[1]),
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: This suggestion improves code robustness by handling potential input errors, which is important for preventing bugs.

    8
    Performance
    Add a check for empty input in a function to avoid unnecessary processing

    Consider adding a check for empty text before processing in the clip_tokens function
    to avoid unnecessary processing.

    pr_agent/algo/pr_processing.py [325-326]

     if not text:
    -    return text
    +    return ""
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: This suggestion improves performance by avoiding unnecessary processing, which is a meaningful optimization.

    7
    Maintainability
    Use a more descriptive variable name for clarity

    Consider using a more descriptive variable name instead of patches_extended_tokens.
    A name like patch_token_counts might better convey that this list contains the token
    counts for each patch.

    pr_agent/algo/pr_processing.py [96]

    -patches_extended_tokens = []
    +patch_token_counts = []
     
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    Why: The suggestion improves code readability, but it's a minor change that doesn't affect functionality.

    5
    Use a more descriptive variable name in a loop for better code readability

    Consider using a more descriptive variable name instead of d in the loop. A name
    like suggestion would make the code more readable.

    pr_agent/algo/pr_processing.py [124-139]

    -for d in data['Code suggestions']:
    +for suggestion in data['Code suggestions']:
         try:
             code_suggestion = {
    -            'body': d['suggestion content'],
    -            'relevant_file': d['relevant file'],
    -            'relevant_lines_start': int(d['relevant lines'].split('-')[0]),
    -            'relevant_lines_end': int(d['relevant lines'].split('-')[1]),
    -            'existing_code': d['existing code'],
    -            'improved_code': self.dedent_code(d['relevant file'],
    -                                              int(d['relevant lines'].split('-')[0]),
    -                                              d['improved code'])
    +            'body': suggestion['suggestion content'],
    +            'relevant_file': suggestion['relevant file'],
    +            'relevant_lines_start': int(suggestion['relevant lines'].split('-')[0]),
    +            'relevant_lines_end': int(suggestion['relevant lines'].split('-')[1]),
    +            'existing_code': suggestion['existing code'],
    +            'improved_code': self.dedent_code(suggestion['relevant file'],
    +                                              int(suggestion['relevant lines'].split('-')[0]),
    +                                              suggestion['improved code'])
             }
             code_suggestions.append(code_suggestion)
         except Exception:
             if get_settings().config.verbosity_level >= 2:
    -            logging.info(f"Could not parse suggestion: {d}")
    +            logging.info(f"Could not parse suggestion: {suggestion}")
     
    • Apply this suggestion
    Suggestion importance[1-10]: 5

    Why: The suggestion enhances code readability, but it's a minor improvement that doesn't affect functionality.

    5

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @medsho
    Copy link

    medsho commented Sep 18, 2024

    @CodiumAI-Agent /review

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @Celebrum
    Copy link

    @CodiumAI-Agent /review

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @Celebrum
    Copy link

    /docker sumerise the content and creat a dockefiles

    Copy link
    Contributor

    PR-Agent was enabled for this repository, but no eligible user found. Please link your git user with your CodiumAI identity here.

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @Celebrum
    Copy link

    @CodiumAI-Agent /improve

    @cloudchip7
    Copy link

    /describe

    @qodo-merge-pro qodo-merge-pro bot added the documentation Improvements or additions to documentation label Oct 23, 2024
    Copy link
    Contributor

    PR Description updated to latest commit (9157fa6)

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    1 similar comment
    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @cloudchip7
    Copy link

    /review

    Copy link
    Contributor

    PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here.

    Persistent review updated to latest commit 9157fa6

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    1 similar comment
    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @chun-wen
    Copy link

    chun-wen commented Nov 6, 2024

    /review

    Copy link
    Contributor

    qodo-merge-pro bot commented Nov 6, 2024

    Persistent review updated to latest commit 9157fa6

    1 similar comment
    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @winston86zhu
    Copy link

    /review

    Copy link
    Contributor

    Persistent review updated to latest commit 9157fa6

    2 similar comments
    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    @CodiumAI-Agent
    Copy link

    Persistent review updated to latest commit 9157fa6

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    documentation Improvements or additions to documentation enhancement New feature or request Missing SOC2 PR ticket Review effort [1-5]: 4
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.