-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating new CLI for process_pr_entropy
, referenced in GitHub action entropy-check.yml
#95
Creating new CLI for process_pr_entropy
, referenced in GitHub action entropy-check.yml
#95
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! I see that the bot triggered and produced an entropy report, very cool to see that!
I left some refactoring comments if you'd like to address them, but I think this PR is good enough to accept, considering that it works.
from typing import Any, Dict | ||
|
||
|
||
def compute_pr_data(repo_path: str, pr_branch: str, main_branch: str) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function's body seems awfully similar to compute_repo_data()
above; perhaps you could add most_recent_commit
and oldest_commit
parameters to compute_repo_data()
, with it defaulting to the first and last commit if unspecified, and then use that in compute_pr_data()
?
I'm thinking you'd modify compute_repo_data()
like so:
def compute_repo_data(repo_path: str, most_recent_commit: pygit2.Commit=None, oldest_commit: pygit2.Commit=None) -> None:
# ...
# Retrieve the list of commits from the repository
commits = get_commits(repo)
most_recent_commit = commits[0] if most_recent_commit is None else most_recent_commit
first_commit = commits[-1] if oldest_commit is None else oldest_commit
# ...
Then, you can invoke it from compute_pr_data()
like so:
def compute_pr_data(repo_path: str, pr_branch: str, main_branch: str) -> Dict[str, Any]:
try:
# ...
# Get the most recent commits on each branch
pr_commit = repo.get(pr_ref.target)
main_commit = repo.get(main_ref.target)
result = compute_repo_data(repo_path, pr_commit, main_commit)
return {
"pr_branch": pr_branch,
"main_branch": main_branch,
"total_entropy_introduced": result["total_normalized_entropy"],
"number_of_files_changed": result["number_of_files"],
"entropy_per_file": result["file_level_entropy"],
"commits": result["time_range_of_commits"]
}
except Exception as e:
# If processing fails, return an informative error
return {"pr_branch": pr_branch, "main_branch": main_branch, "error": str(e)}
There is a small downside in that you have to parse the repo twice; IMHO that isn't a big deal, but if you want to avoid that, there are a number of ways you could do it. You could, for example, have compute_repo_data()
take a repo
object, then write another small method to deal with taking in a path and constructing repo
object, which would then call compute_repo_data(repo)
. Alternatively, you could pull the entropy calculations out of compute_repo_data()
into another function and call it from both compute_repo_data()
and compute_pr_data()
.
I'd also suggest using consistent names with your variables and dictionary keys. For example, in compute_repo_data()
, you use normalized_total_entropy
as a variable name, then assign it to the dict key total_normalized_entropy
. Another example: in compute_pr_data()
you call the file-level entropy in the results dict entropy_per_file
, but in compute_repo_data
it's file_level_entropy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this comment @falquaddoomi! I definitely agree that there is some serious overlap between those two functions, and this refactoring is needed. Going to add this to a new issue. I got things running but ran into some small troubles with test cases and other small errors. With such a small deadline before my presentation, I figure it makes more sense to save this for later on.
This reverts commit da44ef2.
|
Thank you for the speedy review @falquaddoomi ! The compute file definitely needs some refactoring changes, which I plan to reference in a new issue, just stuck on a short time frame for my last week! Similarly I wanted to create a function that gives the option for different outputs(ex. json, md, etc.), however I did not get around to that, which then needed me to delete a test case. Sorry, I know this probably isn't the best way of doing this, but just crunched on time. Adding back the test cases and refactoring will be done in the next PR! |
Description
This PR introduces a new function,
process_pr_entropy
, in the fileprocessing_repositories.py
, which can be referenced via the CLI. This function generates a detailed and informative entropy report that will be utilized within our custom GitHub Action.To test this you can run
poetry run almanack process_pr_entropy --repo_path" " --pr_branch" " --main_branch " "
I'm struggling to create a test case for this implementation since now the use case is more complex(i.e. branches) I can no longer use my test repositories.
I appreciate any comments or feedback!
Closes #92
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.