-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating Shannon Entropy calculation function and editing test datasets #65
Creating Shannon Entropy calculation function and editing test datasets #65
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I left a few comments and suggestions throughout this review. Please don't hesitate to let me know if you have any questions.
entropies = calculate_shannon_entropy( | ||
repo_path, source_commit, target_commit, file_sets[label] | ||
) | ||
for _, entropy in entropies.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a comparison for high entropy vs low entropy with an additional assert below (should one be higher than the other)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referenced in test dataset issue
repo_path, source_commit, target_commit, file_sets[label] | ||
) | ||
results[label] = loc_changes | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding another check to make sure that the file sets are different from one another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referenced in test dataset issue
Thank you for the review, @d33bs! Following your feedback on the test datasets, along with my own concerns, I've created a new PR (#69) to address many of the issues. Any comments you made regarding testing suites or the setup of repositories were referenced in issue #66. Your other feedback has been implemented, and I have left a question for you above! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for addressing all those comments. I left a couple additional thoughts. Looking forward to the changes in #69 . Feel free to merge when you feel things are ready.
Description
This PR introduces an
entropy.py
module that computes the Shannon Entropy, based on the output from (calculate_loc_changes
) ingit_parser.py
.Within the module, safeguards were used to ensure that probability values cannot be negative. Additionally, a test suite
test_entropy.py
has been created to validate against negative outputs.The test datasets were edited to have multiple files in a repository, this is done in the
high_entropy
repo. To support repositories with multiple files, modifications were made to (calculate_loc_changes
) ingit_parser.py
. It now accepts an additional parameter,file_names: list[str]
.A PR has been created to work on refining the test repositories, as well as the expanding on the testing suites, referenced here #69
I appreciate any comments or feedback!
Closes #62 , #64
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.