Add LoC(lines of code) tracker #62

Merged
merged 37 commits into from
Jun 28, 2024

Conversation

Member

@willdavidson05 willdavidson05 commented Jun 24, 2024

Description

This PR creates a LoC_tracker.py module that retrieves file statistics from a specified source and target commit. From these statistics, we find the number of insertions and deletions and return their sum.

A test, test_LoC.py, checks that the module works correctly.

I appreciate any comments or feedback!

Closes #54

What is the nature of your change?

  • Content additions or updates (adds or updates content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own contributions.
  • I have commented my content, particularly in hard-to-understand areas.
  • I have made corresponding changes to related documentation (outside of book content).
  • My changes generate no new warnings.
  • New and existing tests pass locally with my changes.
  • I have added tests that prove my additions are effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

Member

@gwaybio gwaybio left a comment

A couple of comments. In general, great to see such comprehensive docs in the PR description and that it was a bite-sized change. Keep it up!

for commit in repo.iter_commits():
    # Retrieve commit statistics
    diff_stat = commit.stats.total
    lines_added = diff_stat["insertions"]
Member

Are you envisioning that you'll do something differently with insertions and deletions in the future? If not, then why not simply append commit.stats.total?

Member Author

This is a good point! I did this initially for print statement testing; however, I see how this is unnecessary. This helped me find the "lines" attribute of stats, which works even better. Thank you for this!
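
A minimal sketch of the simplification discussed in this thread: GitPython's commit.stats.total is a dictionary whose "lines" entry is already the sum of insertions and deletions, so a per-commit total can read that key directly. The helper name sum_lines_changed is hypothetical (it is not from the PR), and it accepts plain dictionaries shaped like commit.stats.total so the example runs without a repository.

```python
from typing import Dict, Iterable


def sum_lines_changed(commit_totals: Iterable[Dict[str, int]]) -> int:
    """Sum the "lines" entry across dicts shaped like GitPython's commit.stats.total."""
    return sum(total["lines"] for total in commit_totals)


# Stand-in data shaped like commit.stats.total (values are hypothetical).
example_totals = [
    {"insertions": 10, "deletions": 2, "lines": 12, "files": 1},
    {"insertions": 3, "deletions": 3, "lines": 6, "files": 2},
]
print(sum_lines_changed(example_totals))  # 18
```

With GitPython available, the same helper could presumably be fed (commit.stats.total for commit in repo.iter_commits()).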

src/almanack/LoC_tracker.py (outdated, resolved)
def test_calculate_loc_changes(repository_paths: dict[str, pathlib.Path]):
    high_entropy_path = repository_paths["high_entropy"]
    low_entropy_path = repository_paths["low_entropy"]

Member

I'm not totally confident, but don't these require assertions to make a valid test?

Member Author

Whoops, you're totally right! I have corrected this mistake and created a robust test case.

@willdavidson05
Member Author

Thank you for the review @gwaybio !! Your comments really helped me simplify/optimize my code and test cases. I have requested another review but no rush to get to it.

@willdavidson05 willdavidson05 requested a review from gwaybio June 24, 2024 20:50
Member

@d33bs d33bs left a comment

Nice job! I left a few comments and suggestions. Please don't hesitate to reach out with any questions.

Member

Consider generalizing the name for this module to help invite further work where / when needed.

Member Author

Will do! I have been thinking about a new name, and the best I have is code_tracker.py. Let me know what you think of this! I'm not sure how I feel about it yet.


total_lines_changed = 0

for commit in repo.iter_commits():
Member

Would it be possible to tie in almanack.git_parser.get_commit_logs here somehow to help introduce flexibility when it comes to certain commit segments or time periods (instead of only in full summary)? Perhaps it could come in the form of an expansion of the data structure found within that function (another dictionary key-value pair, for example). For example, we might expect that earlier in a project's lifespan the amount of change will be higher than later on. These counts from the earlier time periods might inadvertently affect calculations for later time periods under certain circumstances.
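
One way to picture the suggestion above is a sketch that restricts the total to commits inside a date window. The record layout (a dictionary with "date" and "lines" keys) and the name lines_changed_between are assumptions made for illustration; the actual structure returned by almanack.git_parser.get_commit_logs is not shown in this thread.

```python
from datetime import datetime
from typing import Dict, List, Optional


def lines_changed_between(
    commit_records: List[Dict],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> int:
    """Sum "lines" for records whose "date" falls within [start, end], inclusive."""
    total = 0
    for record in commit_records:
        if start is not None and record["date"] < start:
            continue
        if end is not None and record["date"] > end:
            continue
        total += record["lines"]
    return total


# Hypothetical records: heavy early activity, lighter later activity.
records = [
    {"date": datetime(2024, 1, 5), "lines": 400},
    {"date": datetime(2024, 1, 20), "lines": 250},
    {"date": datetime(2024, 6, 1), "lines": 12},
]
print(lines_changed_between(records))  # 662
print(lines_changed_between(records, start=datetime(2024, 5, 1)))  # 12
```

Windowing like this keeps the large early counts from dominating a measurement taken over a later period.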

Member Author

@willdavidson05 willdavidson05 Jun 25, 2024

Thanks for the idea, Dave! I was able to implement get_commit_logs by adding attributes to my function, but wanted to see what you think.

tests/data/almanack/entropy/add_data.py (outdated, resolved)
tests/data/almanack/entropy/add_entropy.py (outdated, resolved)
src/almanack/LoC_tracker.py (outdated, resolved)
src/almanack/LoC_tracker.py (outdated, resolved)
@willdavidson05
Member Author

Thank you @d33bs for the review! I have made some minor changes based on your comments. I added attributes to my git_commit_logs function so I am able to reference my git_parser.py module through code_tracker.py. I also added return type hints for every function I have written so far. I left a couple of questions on your comments as well!

@willdavidson05 willdavidson05 requested a review from d33bs June 25, 2024 17:44
@willdavidson05
Member Author

@d33bs Wanted to give a couple notes about what I have changed.
1.) I deleted the get_all_commit_logs function inside of git_parser.py. I no longer see the need for it at this time.
2.) I changed the calculate_loc_changes function to accept two new arguments, source and target. This allows a file-based approach to our calculation.
3.) I created a function that grabs the two most recent commits (source, target) for the test datasets. I'm not sure if this is the best way to go about it, but would love to hear what you think! I also have it in test_code_tracker.py because I didn't see the utility of having it anywhere else.

Member

@d33bs d33bs left a comment

Great work! I left a few comments for your consideration - interested in what you think about these. Please don't hesitate to let me know if you have any questions.

src/almanack/code_tracker.py (outdated, resolved)
src/almanack/code_tracker.py (outdated, resolved)

def calculate_loc_changes(repo_path: pathlib.Path, source: str, target: str) -> int:
    """
    Finds the total number of code lines changed between the source and target commits.
Member

Would this be able to parse individual file-level lines of code differences between the source and target commits? If not, consider expanding the function to include this capability (perhaps with an additional parameter which identifies a file within a tree) to address this.
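
A small sketch of what the additional parameter could look like, assuming the per-file counts are already available as a mapping from file path to lines changed. The function name loc_changes and the file_name parameter are hypothetical and do not reflect the signature that was eventually merged.

```python
from typing import Dict, Optional


def loc_changes(changes_by_file: Dict[str, int], file_name: Optional[str] = None) -> int:
    """Return total lines changed, or the count for one file when file_name is given."""
    if file_name is None:
        return sum(changes_by_file.values())
    # A file untouched between the two commits contributes zero changes.
    return changes_by_file.get(file_name, 0)


# Hypothetical per-file counts between a source and target commit.
changes = {"README.md": 4, "src/almanack/code_tracker.py": 30}
print(loc_changes(changes))  # 34
print(loc_changes(changes, "README.md"))  # 4
```

Returning the whole mapping, rather than a single integer, is another option that leaves the choice of file to the caller.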

src/almanack/code_tracker.py (outdated, resolved)
tests/test_code_tracker.py (outdated, resolved)
@willdavidson05
Member Author

@d33bs, thanks for the review! I have implemented most of your feedback. One note on a question from our meeting earlier: I found that the filename is the file path relative to the repository root. Thank you!

@willdavidson05 willdavidson05 requested a review from d33bs June 28, 2024 17:07
Member

@d33bs d33bs left a comment

Nice job, this is looking pretty good to me! I left a couple comments about minor considerations; feel free to merge when you feel good about things.

src/almanack/git_parser.py (outdated, resolved)
diff = repo.git.diff(source, target, "--numstat")
return {
    filename: abs(int(removed) + int(added))  # Calculate change
    for added, removed, filename in (line.split() for line in diff.splitlines())
Member

Given how you gather data from this (data within a string) would it make sense to expand the test data to ensure nothing unusual happens when you have more than one file in a repository? Mostly I wonder, what would happen if there were two files (perhaps with one in a subdir and the other not)?
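
To illustrate the multi-file concern, here is a hedged sketch of parsing git diff --numstat text for several files, including one in a subdirectory. It is not the merged implementation: the name parse_numstat and the sample text are made up for illustration. It splits on tabs (the separator numstat uses between its three fields) so a path containing spaces stays intact, and it treats the "-" counts that git reports for binary files as zero, which is an assumption about the desired behavior.

```python
from typing import Dict


def parse_numstat(numstat_output: str) -> Dict[str, int]:
    """Map each file path to lines added plus lines removed."""
    changes: Dict[str, int] = {}
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        # Fields are tab-separated; maxsplit keeps spaces inside paths intact.
        added, removed, filename = line.split("\t", 2)
        if added == "-" or removed == "-":
            # git reports "-" for binary files; count them as zero lines here.
            changes[filename] = 0
            continue
        changes[filename] = int(added) + int(removed)
    return changes


# Hypothetical numstat text: a root file, a subdirectory file, and a binary file.
sample = "3\t1\tREADME.md\n10\t0\tsubdir/my notes.txt\n-\t-\timage.png\n"
print(parse_numstat(sample))
# {'README.md': 4, 'subdir/my notes.txt': 10, 'image.png': 0}
```

A test fixture with files at both the repository root and in a subdirectory would exercise the same cases against real git output.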

@willdavidson05 willdavidson05 merged commit 03d77e1 into software-gardening:main Jun 28, 2024
10 checks passed
@willdavidson05 willdavidson05 deleted the LoC-tacker branch June 28, 2024 19:30
Development

Successfully merging this pull request may close these issues.

Add a file-based function that tracks lines of code added or removed.
3 participants