Creating Shannon Entropy calculation function and editing test datasets #65

Merged
merged 21 commits into from
Jul 15, 2024

Conversation

willdavidson05
Member

@willdavidson05 willdavidson05 commented Jul 2, 2024

Description

This PR introduces an entropy.py module that computes Shannon entropy from the output of calculate_loc_changes in git_parser.py.

Within the module, safeguards ensure that probability values cannot be negative. A test suite, test_entropy.py, has also been created to check that the calculation never produces negative outputs.
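
As a rough illustration of the calculation described above, the sketch below computes a per-file Shannon entropy term from lines-of-code changes. It is not the code merged in this PR: the function name entropy_from_loc_changes and the dict-of-counts input are assumptions made for the example, and taking absolute values is just one possible way to keep probabilities non-negative.

```python
import math


def entropy_from_loc_changes(loc_changes: dict[str, int]) -> dict[str, float]:
    """Return a per-file Shannon entropy term for a mapping of file -> LOC changed.

    Hypothetical helper for illustration only; not the almanack implementation.
    """
    # Use absolute values so a negative count can never yield a negative probability.
    total = sum(abs(changed) for changed in loc_changes.values())
    if total == 0:
        # No changes at all: every file contributes zero entropy.
        return {name: 0.0 for name in loc_changes}

    entropies = {}
    for name, changed in loc_changes.items():
        probability = abs(changed) / total
        # p * log2(1/p) equals -p * log2(p) and is never negative for 0 < p <= 1.
        entropies[name] = (
            probability * math.log2(1 / probability) if probability > 0 else 0.0
        )
    return entropies
```

Summing the returned values gives the overall entropy for the set of files; two files with equal changes contribute 0.5 each, for a total of 1 bit.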

The test datasets were edited so that a repository can contain multiple files; this is done in the high_entropy repo. To support repositories with multiple files, calculate_loc_changes in git_parser.py was modified to accept an additional parameter, file_names: list[str].

A PR has been created to refine the test repositories and expand the test suites, referenced here: #69

I appreciate any comments or feedback!

Closes #62, #64

What is the nature of your change?

  • Content additions or updates (adds or updates content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own contributions.
  • I have commented my content, particularly in hard-to-understand areas.
  • I have made corresponding changes to related documentation (outside of book content).
  • My changes generate no new warnings.
  • New and existing tests pass locally with my changes.
  • I have added tests that prove my additions are effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@willdavidson05 willdavidson05 added this to the Entropy Linter milestone Jul 2, 2024
Member

@d33bs d33bs left a comment

Nice work! I left a few comments and suggestions throughout this review. Please don't hesitate to let me know if you have any questions.

src/almanack/entropy.py (outdated, resolved)
src/almanack/entropy.py (outdated, resolved)
src/almanack/entropy.py (outdated, resolved)
src/almanack/entropy.py (outdated, resolved)
src/almanack/entropy.py (resolved)
tests/test_entropy.py (resolved)
entropies = calculate_shannon_entropy(
    repo_path, source_commit, target_commit, file_sets[label]
)
for _, entropy in entropies.items():
Member

Consider adding an assertion below that compares the high-entropy and low-entropy results (should one be higher than the other?).

Member Author

Referenced in test dataset issue
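
The comparison suggested in this thread could be sketched roughly as follows. The labels high_entropy and low_entropy, the file names, and the entropy values are placeholders invented for illustration, not results from the test repositories, and total_entropy is a hypothetical helper rather than part of the project.

```python
def total_entropy(entropies: dict[str, float]) -> float:
    """Sum per-file entropy values into one number per repository (hypothetical helper)."""
    return sum(entropies.values())


# Placeholder values standing in for what calculate_shannon_entropy might return.
results = {
    "high_entropy": {"file_1.md": 0.5, "file_2.md": 0.5},
    "low_entropy": {"file_1.md": 0.0},
}

# Existing style of check: no individual entropy value is negative.
for entropies in results.values():
    for entropy in entropies.values():
        assert entropy >= 0

# Additional check: the high-entropy dataset should score above the low-entropy one.
assert total_entropy(results["high_entropy"]) > total_entropy(results["low_entropy"])
```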

repo_path, source_commit, target_commit, file_sets[label]
)
results[label] = loc_changes

Member

Consider adding another check to make sure that the file sets are different from one another.

Member Author

Referenced in test dataset issue
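
One way the file-set check suggested in this thread might look is sketched below. The helper name file_sets_are_distinct and the example labels and file names are assumptions for illustration; the real file_sets fixture may be shaped differently.

```python
from itertools import combinations


def file_sets_are_distinct(file_sets: dict[str, list[str]]) -> bool:
    """Return True when no two labels map to the same set of file names."""
    return all(
        set(first) != set(second)
        for first, second in combinations(file_sets.values(), 2)
    )


# Hypothetical fixture: each label maps to the files examined in that repository.
example_file_sets = {
    "high_entropy": ["file_1.md", "file_2.md"],
    "low_entropy": ["file_1.md"],
}
assert file_sets_are_distinct(example_file_sets)
```

Comparing as sets ignores ordering, so two labels listing the same files in a different order would still be flagged as identical.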

tests/data/almanack/entropy/add_data.py (outdated, resolved)
src/almanack/entropy.py (outdated, resolved)
@willdavidson05 willdavidson05 deleted the Entropy_Formula branch July 11, 2024 17:57
@willdavidson05 willdavidson05 restored the Entropy_Formula branch July 11, 2024 17:58
@willdavidson05
Member Author

Thank you for the review, @d33bs! Following your feedback on the test datasets, along with my own concerns, I've created a new PR (#69) to address many of the issues. Any comments you made regarding testing suites or the setup of repositories were referenced in issue #66. Your other feedback has been implemented, and I have left a question for you above!

@willdavidson05 willdavidson05 requested a review from d33bs July 15, 2024 19:53
Member

@d33bs d33bs left a comment

Looks great, thanks for addressing all those comments. I left a couple of additional thoughts. Looking forward to the changes in #69. Feel free to merge when you feel things are ready.

CITATION.cff (outdated, resolved)
@willdavidson05 willdavidson05 merged commit 92fbb12 into software-gardening:main Jul 15, 2024
10 checks passed
@willdavidson05 willdavidson05 deleted the Entropy_Formula branch July 25, 2024 16:25
2 participants