-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LoC(lines of code) tracker #62
Add LoC(lines of code) tracker #62
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple comments. I'm general, great to see such comprehensive docs on the PR description and that it was a bite-sized change. Keep it up!
src/almanack/LoC_tracker.py
Outdated
for commit in repo.iter_commits(): | ||
# Retrieve commit statistics | ||
diff_stat = commit.stats.total | ||
lines_added = diff_stat["insertions"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you envisioning that you'll do something differently with insertions and deletions in the future? If not, then why not simply append commit.stats.total?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point! I did this initially for print statement testing, however I see how this is unnecessary . This helped me find the "lines" attribute of stats
which works even better. Thank you for this!
tests/test_Loc.py
Outdated
def test_calculate_loc_changes(repository_paths: dict[str, pathlib.Path]): | ||
high_entropy_path = repository_paths["high_entropy"] | ||
low_entropy_path = repository_paths["low_entropy"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally confident, but don't these require assertions to make a valid test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, you're totally right! I have corrected this mistake and created a robust test case.
Thank you for the review @gwaybio !! Your comments really helped me simplify/optimize my code and test cases. I have requested another review but no rush to get to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job! I left a few comments and suggestions. Please don't hesitate to reach out with any questions.
src/almanack/LoC_tracker.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider generalizing the name for this module to help invite further work where / when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do! I have been thinking about a new name, and the best I have is code_tracker.py
. Let me know what you think of this! I'm not sure how I feel about it yet.
src/almanack/LoC_tracker.py
Outdated
|
||
total_lines_changed = 0 | ||
|
||
for commit in repo.iter_commits(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to tie in almanack.git_parser.get_commit_logs
here somehow to help introduce flexibility when it comes to certain commit segments or time periods (instead of only in full summary)? Perhaps it could come in the form of an expansion of the data structure found within that function (another dictionary key-value pair, for example). For example, we might expect that earlier in a project's lifespan the amount of change will be higher than later on. These counts from the earlier time periods might inadvertently effect calculations for later time periods under certain circumstances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the idea Dave! I was able to implement get_commit_logs
by adding in attributes to my function, but wanted to see what you think?
Thank you @d33bs for the review! I have made some minor changes based off of your comments. I added attributes to my |
@d33bs Wanted to give a couple notes about what I have changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I left a few comments for your consideration - interested in what you think about these. Please don't hesitate to let me know if you have any questions.
src/almanack/code_tracker.py
Outdated
|
||
def calculate_loc_changes(repo_path: pathlib.Path, source: str, target: str) -> int: | ||
""" | ||
Finds the total number of code lines changed between the source and target commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be able to parse individual file-level lines of code differences between the source and target commits? If not, consider expanding the function to include this capability (perhaps with an additional parameter which identifies a file within a tree) to address this.
@d33bs, thanks for the review! I have implemented most of your feedback. Just a comment about a question from our meeting earlier, I found that the filename is the filepath relative to the repository root. Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job, this is looking pretty good to me! I left a couple comments about minor considerations; feel free to merge when you feel good about things.
diff = repo.git.diff(source, target, "--numstat") | ||
return { | ||
filename: abs(int(removed) + int(added)) # Calculate change | ||
for added, removed, filename in (line.split() for line in diff.splitlines()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how you gather data from this (data within a string) would it make sense to expand the test data to ensure nothing unusual happens when you have more than one file in a repository? Mostly I wonder, what would happen if there were two files (perhaps with one in a subdir and the other not)?
Description
This PR create a
LoC_tracker.py
module that retrieves file statistics from a specified source and target commit. From these statistics, we find the total number of insertions and deletions, sum them together, and then return the total.A test was made
test_LoC.py
, that checks that the module works correctly.I appreciate any comments or feedback!
Closes #54
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.