Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancements in Token Clipping, PR Statistics Calculation, and Conditional Review Labeling #848

Merged
merged 2 commits into from
Apr 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pr_agent/algo/utils.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes walkthrough

Enhancement
utils.py (+1/-1)
Improve Text Token Clipping Presentation                                 

pr_agent/algo/utils.py

  • Modified the clip_tokens function to append a newline before
    "...(truncated)" when clipping text tokens.
  • github_provider.py (+1/-19)
    Simplify PR Statistics Calculation                                             

    pr_agent/git_providers/github_provider.py

  • Removed the implementation of calc_pr_statistics function, making it
    return an empty dictionary directly.
  • pr_reviewer.py (+3/-0)
    Conditional Review Labeling Based on Configuration             

    pr_agent/tools/pr_reviewer.py

  • Added a condition to check the publish_output setting before setting
    review labels.
  • Tests
    test_clip_tokens.py (+1/-1)
    Update Test for Improved Token Clipping                                   

    tests/unittest/test_clip_tokens.py

  • Updated the test for clip_tokens to reflect the new line addition
    before "...(truncated)".
  • Original file line number Diff line number Diff line change
    Expand Up @@ -575,7 +575,7 @@ def clip_tokens(text: str, max_tokens: int, add_three_dots=True) -> str:
    num_output_chars = int(chars_per_token * max_tokens)
    clipped_text = text[:num_output_chars]
    if add_three_dots:
    clipped_text += " ...(truncated)"
    clipped_text += "\n...(truncated)"
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider removing the newline character before the truncation message to maintain the consistency of the text format, especially if the text is expected to be in a single line or paragraph format. [enhancement]

    Suggested change
    clipped_text += "\n...(truncated)"
    clipped_text += " ...(truncated)"

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Consider handling cases where the input text ends with a newline character. Appending "\n...(truncated)" directly might result in two newline characters in a row, which could be unintended. A simple check to see if clipped_text ends with a newline before appending could prevent this. [medium]

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Consider removing the newline character before the truncation message to maintain the consistency of the text output format. [enhancement]

    Suggested change
    clipped_text += "\n...(truncated)"
    clipped_text += " ...(truncated)"

    return clipped_text
    except Exception as e:
    get_logger().warning(f"Failed to clip tokens: {e}")
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: When catching a broad exception in the clip_tokens function, consider logging the exception with .exception() instead of .warning() to include the traceback for easier debugging. [best practice]

    Suggested change
    get_logger().warning(f"Failed to clip tokens: {e}")
    get_logger().exception("Failed to clip tokens", exc_info=True)

    Expand Down
    20 changes: 1 addition & 19 deletions pr_agent/git_providers/github_provider.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -745,22 +745,4 @@ def auto_approve(self) -> bool:
    return False

    def calc_pr_statistics(self, pull_request_data: dict):
    try:
    out = {}
    from datetime import datetime
    created_at = pull_request_data['created_at']
    closed_at = pull_request_data['closed_at']
    closed_at_datetime = datetime.strptime(closed_at, "%Y-%m-%dT%H:%M:%SZ")
    created_at_datetime = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
    difference = closed_at_datetime - created_at_datetime
    out['hours'] = difference.total_seconds() / 3600
    out['commits'] = pull_request_data['commits']
    out['comments'] = pull_request_data['comments']
    out['review_comments'] = pull_request_data['review_comments']
    out['changed_files'] = pull_request_data['changed_files']
    out['additions'] = pull_request_data['additions']
    out['deletions'] = pull_request_data['deletions']
    except Exception as e:
    get_logger().exception(f"Failed to calculate PR statistics, error: {e}")
    return {}
    return out
    return {}
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: The new implementation of calc_pr_statistics returns an empty dictionary, which removes the functionality to calculate PR statistics. If this is intentional and temporary, consider adding a TODO comment to restore the functionality later. Otherwise, reimplement the method to provide meaningful statistics based on the pull_request_data. [possible issue]

    Suggested change
    return {}
    # TODO: Implement the calculation of PR statistics or restore previous functionality
    return {}

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    While simplifying the calc_pr_statistics function to return an empty dictionary is a straightforward approach, consider implementing a minimal placeholder functionality that can be expanded easily. This could maintain the method's utility for future enhancements without leaving it as a stub. [medium]

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Replacing the detailed PR statistics calculation with an empty dictionary removes functionality. Consider implementing a simplified version of the statistics calculation or handling potential exceptions more gracefully to maintain this feature. [bug]

    Suggested change
    return {}
    try:
    # Simplified statistics calculation or proper exception handling
    return {"basic_stat": "value"}
    except Exception as e:
    get_logger().exception(f"Failed to calculate PR statistics, error: {e}")
    return {}

    3 changes: 3 additions & 0 deletions pr_agent/tools/pr_reviewer.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -357,6 +357,9 @@ def _can_run_incremental_review(self) -> bool:
    return True

    def set_review_labels(self, data):
    if not get_settings().config.publish_output:
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Ensure that the condition if not get_settings().config.publish_output: is thoroughly tested in scenarios where publish_output might be dynamically changed. This addition introduces conditional logic that could affect the flow of setting review labels. [important]

    return
    Comment on lines +360 to +361
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Adding a condition to check publish_output before setting review labels is a good practice for conditional execution. However, consider logging a message when skipping the setting of review labels for clarity and debugging purposes. [enhancement]

    Suggested change
    if not get_settings().config.publish_output:
    return
    if not get_settings().config.publish_output:
    get_logger().info("Skipping setting review labels as publish_output is disabled.")
    return

    Comment on lines +360 to +361
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: Adding a check for publish_output before setting review labels is good for conditional execution. However, consider logging a message when skipping this step for better traceability and debugging. [enhancement]

    Suggested change
    if not get_settings().config.publish_output:
    return
    if not get_settings().config.publish_output:
    get_logger().info("Skipping setting review labels due to publish_output being disabled.")
    return


    if (get_settings().pr_reviewer.enable_review_labels_security or
    get_settings().pr_reviewer.enable_review_labels_effort):
    try:
    Expand Down
    2 changes: 1 addition & 1 deletion tests/unittest/test_clip_tokens.py
    Original file line number Diff line number Diff line change
    Expand Up @@ -15,5 +15,5 @@ def test_clip(self):

    max_tokens = 10
    result = clip_tokens(text, max_tokens)
    expected_results = 'line1\nline2\nline3\nli ...(truncated)'
    expected_results = 'line1\nline2\nline3\nli\n...(truncated)'
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: The test expectation has been updated to match the new output of clip_tokens function. Ensure that this change is consistent with the desired output format across all tests and usage scenarios, considering the added newline character before the truncation indicator. [possible issue]

    Suggested change
    expected_results = 'line1\nline2\nline3\nli\n...(truncated)'
    expected_results = 'line1\nline2\nline3\nli ...(truncated)'

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Given the change in clip_tokens to append a newline before the truncation indicator, ensure that all edge cases are covered in the tests, such as clipping at different points in the text and with various lengths of input text. This will help ensure that the new behavior is consistent and reliable. [important]

    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Suggestion: The expected result in the test case should match the output format of the clip_tokens function. If the newline before the truncation message is removed, update this test accordingly. [bug]

    Suggested change
    expected_results = 'line1\nline2\nline3\nli\n...(truncated)'
    expected_results = 'line1\nline2\nline3\nli ...(truncated)'

    assert result == expected_results
    Loading