Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix base url not being passed through github_provider class correctly #1127

Merged
merged 3 commits into from
Aug 13, 2024

Conversation

squinn1
Copy link
Contributor

@squinn1 squinn1 commented Aug 13, 2024

User description

  • the base_url and base_url_html attributes were not being fully passed through the code
  • this pr fixes that and removes the hard coded github.com references throughout the code

all unit tests passing

image


PR Type

Bug fix, Enhancement


Description

  • Fixed issue where base_url and base_url_html attributes were not being correctly used throughout the GitHubProvider class.
  • Refactored _parse_pr_url and _parse_issue_url methods from static to instance methods to access self.base_url.
  • Updated get_lines_link_original_file method to use self.base_url_html instead of hardcoded 'github.com'.
  • Removed hardcoded references to 'github.com' and 'api.github.com' throughout the code.
  • These changes improve flexibility and allow the class to work with different GitHub instances or Enterprise GitHub setups.

Changes walkthrough 📝

Relevant files
Bug fix
github_provider.py
Refactor GitHub provider to use instance-specific base URLs

pr_agent/git_providers/github_provider.py

  • Replaced static methods with instance methods for _parse_pr_url and
    _parse_issue_url
  • Updated URL parsing to use self.base_url instead of hardcoded
    'api.github.com'
  • Modified get_lines_link_original_file to use self.base_url_html
    instead of hardcoded 'github.com'
  • +5/-7     

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link
    Contributor

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🏅 Score: 92
    🧪 No relevant tests
    🔒 No security concerns identified
    🔀 No multiple PR themes
    ⚡ Key issues to review

    Code Consistency
    The _parse_pr_url and _parse_issue_url methods have similar structure and could potentially be refactored into a single method to reduce code duplication.

    Copy link
    Contributor

    codiumai-pr-agent-pro bot commented Aug 13, 2024

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Maintainability
    Refactor the URL parsing methods to reduce code duplication

    The _parse_pr_url and _parse_issue_url methods have very similar logic. Consider
    refactoring the common parts into a separate method to reduce code duplication and
    improve maintainability.

    pr_agent/git_providers/github_provider.py [608-641]

    +def _parse_github_url(self, url: str, expected_type: str) -> Tuple[str, int]:
    +    parsed_url = urlparse(url)
    +    if urlsplit(self.base_url).netloc != parsed_url.netloc:
    +        raise ValueError(f"The provided URL is not from the expected GitHub instance")
    +    
    +    path_parts = parsed_url.path.strip('/').split('/')
    +    if len(path_parts) < 5 or path_parts[3] != expected_type:
    +        raise ValueError(f"The provided URL does not appear to be a GitHub {expected_type} URL")
    +    
    +    repo_name = '/'.join(path_parts[1:3])
    +    try:
    +        number = int(path_parts[4])
    +    except ValueError as e:
    +        raise ValueError(f"Unable to convert {expected_type} number to integer") from e
    +    
    +    return repo_name, number
    +
     def _parse_pr_url(self, pr_url: str) -> Tuple[str, int]:
    -    parsed_url = urlparse(pr_url)
    +    return self._parse_github_url(pr_url, 'pulls')
     
    -    path_parts = parsed_url.path.strip('/').split('/')
    -    if self.base_url in parsed_url.netloc:
    -        if len(path_parts) < 5 or path_parts[3] != 'pulls':
    -            raise ValueError("The provided URL does not appear to be a GitHub PR URL")
    -        repo_name = '/'.join(path_parts[1:3])
    -        try:
    +def _parse_issue_url(self, issue_url: str) -> Tuple[str, int]:
    +    return self._parse_github_url(issue_url, 'issues')
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This suggestion significantly improves code maintainability by reducing duplication and centralizing URL parsing logic, which is a key aspect of the changes in the PR.

    9
    Make error messages in URL parsing methods more generic for better extensibility

    The error messages in the URL parsing methods are specific to GitHub. Consider
    making them more generic or parameterizing them to support potential future
    extensions to other Git providers.

    pr_agent/git_providers/github_provider.py [614]

    -raise ValueError("The provided URL does not appear to be a GitHub PR URL")
    +raise ValueError(f"The provided URL does not appear to be a valid {self.__class__.__name__} PR URL")
     
    • Apply this suggestion
    Suggestion importance[1-10]: 6

    Why: While this suggestion improves maintainability and extensibility, it's less crucial compared to the other suggestions and the main focus of the PR changes.

    6
    Enhancement
    Improve the robustness of the base URL comparison in the URL parsing methods

    Consider using a more robust method to check if the base URL is part of the parsed
    URL. The current implementation might fail if the base URL is a subdomain or has a
    different structure. Use urllib.parse.urlsplit() to compare the netloc more
    accurately.

    pr_agent/git_providers/github_provider.py [612]

    -if self.base_url in parsed_url.netloc:
    +from urllib.parse import urlsplit
    +if urlsplit(self.base_url).netloc == parsed_url.netloc:
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion addresses a potential security issue by improving the accuracy of URL parsing, which is crucial for preventing potential vulnerabilities.

    8
    Best practice
    Improve URL construction in the get_lines_link_original_file method using urljoin

    The get_lines_link_original_file method is using string concatenation to build the
    URL. Consider using urllib.parse.urljoin() to construct the URL more safely and
    handle potential edge cases with slashes.

    pr_agent/git_providers/github_provider.py [830-831]

    -link = (f"{self.base_url_html}/{self.repo}/blob/{self.last_commit_id.sha}/{filepath}/"
    -        f"#L{line_start}-L{line_end}")
    +from urllib.parse import urljoin
    +base_path = f"{self.repo}/blob/{self.last_commit_id.sha}/{filepath}"
    +full_url = urljoin(self.base_url_html, base_path)
    +link = f"{full_url}#L{line_start}-L{line_end}"
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: The suggestion enhances URL construction safety, which is relevant to the changes in the PR, but it's not as critical as the previous suggestions.

    7

    @mrT23
    Copy link
    Collaborator

    mrT23 commented Aug 13, 2024

    thanks, i will review it later today

    @mrT23 mrT23 self-requested a review August 13, 2024 10:19
    @mrT23
    Copy link
    Collaborator

    mrT23 commented Aug 13, 2024

    @squinn1
    Have you validated in your environment that you are getting the correct values ?

    Run the code, catch breakpoints, and test is.
    I think (know) it has problems

    @squinn1
    Copy link
    Contributor Author

    squinn1 commented Aug 13, 2024

    Hi @mrT23,
    yeah I have spotted my error now, I needed to extract the domain correctly, I have added in this fix by parsing in the init. let me know if you would prefer this done within the methods etc

    Thanks

    @mrT23
    Copy link
    Collaborator

    mrT23 commented Aug 13, 2024

    yes, this was the problem, the 'https'

    @mrT23 mrT23 merged commit c97c39d into Codium-ai:main Aug 13, 2024
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    3 participants