Creating new CLI for `process_pr_entropy`, referenced in GitHub action `entropy-check.yml` #95

willdavidson05 · 2024-08-13T22:37:49Z

Description

This PR introduces a new function, `process_pr_entropy`, in `processing_repositories.py`, which is exposed through the CLI. The function generates a detailed, informative entropy report that will be used by our custom GitHub Action.

To test this, you can run `poetry run almanack process_pr_entropy --repo_path "<repo_path>" --pr_branch "<pr_branch>" --main_branch "<main_branch>"`
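The PR text does not show how the `process_pr_entropy` subcommand is wired into the CLI. As a rough sketch only, the three flags could map onto the function's parameters like this using the standard library's `argparse`. The real `almanack` CLI may well be built with a different library, and the `process_pr_entropy` below is a hypothetical stub, not the function from `processing_repositories.py`.

```python
import argparse
from typing import Any, Dict, List, Optional


def process_pr_entropy(repo_path: str, pr_branch: str, main_branch: str) -> Dict[str, Any]:
    # Hypothetical stub standing in for the real report-generating function.
    return {"repo_path": repo_path, "pr_branch": pr_branch, "main_branch": main_branch}


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(prog="almanack")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # One subcommand whose flags mirror the function's parameter names.
    pr = subcommands.add_parser("process_pr_entropy", help="entropy report for a PR branch")
    pr.add_argument("--repo_path", required=True, help="path to the local git repository")
    pr.add_argument("--pr_branch", required=True, help="branch containing the PR changes")
    pr.add_argument("--main_branch", required=True, help="branch the PR targets")

    args = parser.parse_args(argv)
    return process_pr_entropy(args.repo_path, args.pr_branch, args.main_branch)


print(main(["process_pr_entropy", "--repo_path", ".", "--pr_branch", "feature", "--main_branch", "main"]))
# → {'repo_path': '.', 'pr_branch': 'feature', 'main_branch': 'main'}
```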

I'm struggling to create a test case for this implementation: because the use case is now more complex (i.e., it involves branches), I can no longer use my existing test repositories.

I appreciate any comments or feedback!

Closes #92

What is the nature of your change?

Content additions or updates (adds or updates content)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

I have read the CONTRIBUTING.md guidelines.
My code follows the style guidelines of this project.
I have performed a self-review of my own contributions.
I have commented my content, particularly in hard-to-understand areas.
I have made corresponding changes to related documentation (outside of book content).
My changes generate no new warnings.
New and existing tests pass locally with my changes.
I have added tests that prove my additions are effective or that my feature works.
I have deleted all non-relevant text in this pull request template.


falquaddoomi

Neat! I see that the bot triggered and produced an entropy report, very cool to see that!

I left some refactoring comments if you'd like to address them, but I think this PR is good enough to accept, considering that it works.

falquaddoomi · 2024-08-14T16:34:08Z

src/almanack/processing/compute_data.py

```diff
+from typing import Any, Dict
+
+
+def compute_pr_data(repo_path: str, pr_branch: str, main_branch: str) -> Dict[str, Any]:
```


This function's body seems awfully similar to compute_repo_data() above; perhaps you could add most_recent_commit and oldest_commit parameters to compute_repo_data(), with it defaulting to the first and last commit if unspecified, and then use that in compute_pr_data()?

I'm thinking you'd modify compute_repo_data() like so:

```python
from typing import Any, Dict, Optional

import pygit2


def compute_repo_data(repo_path: str, most_recent_commit: Optional[pygit2.Commit] = None, oldest_commit: Optional[pygit2.Commit] = None) -> Dict[str, Any]:
    # ...
    # Retrieve the list of commits from the repository
    commits = get_commits(repo)
    most_recent_commit = commits[0] if most_recent_commit is None else most_recent_commit
    first_commit = commits[-1] if oldest_commit is None else oldest_commit
    # ...
```

Then, you can invoke it from compute_pr_data() like so:

```python
def compute_pr_data(repo_path: str, pr_branch: str, main_branch: str) -> Dict[str, Any]:
    try:
        # ...
        # Get the most recent commits on each branch
        pr_commit = repo.get(pr_ref.target)
        main_commit = repo.get(main_ref.target)
        result = compute_repo_data(repo_path, pr_commit, main_commit)
        return {
            "pr_branch": pr_branch,
            "main_branch": main_branch,
            "total_entropy_introduced": result["total_normalized_entropy"],
            "number_of_files_changed": result["number_of_files"],
            "entropy_per_file": result["file_level_entropy"],
            "commits": result["time_range_of_commits"],
        }
    except Exception as e:
        # If processing fails, return an informative error
        return {"pr_branch": pr_branch, "main_branch": main_branch, "error": str(e)}
```
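The defaulting behaviour suggested here can be tried out in isolation. This is a minimal sketch, assuming commits are plain ID strings ordered newest first (as `commits[0]` and `commits[-1]` imply in the snippet) rather than `pygit2.Commit` objects; `resolve_commit_range` is a hypothetical helper name, not part of the codebase.

```python
from typing import Optional, Sequence, Tuple


def resolve_commit_range(
    commits: Sequence[str],
    most_recent_commit: Optional[str] = None,
    oldest_commit: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (most_recent, oldest), defaulting to the ends of `commits`.

    Assumes `commits` is ordered newest first, so commits[0] is the most
    recent commit and commits[-1] is the oldest.
    """
    # Only an empty history that actually needs a default is an error.
    if (most_recent_commit is None or oldest_commit is None) and not commits:
        raise ValueError("cannot default a commit range for an empty history")
    newest = commits[0] if most_recent_commit is None else most_recent_commit
    oldest = commits[-1] if oldest_commit is None else oldest_commit
    return newest, oldest


history = ["c3", "c2", "c1"]  # newest first
print(resolve_commit_range(history))  # whole history: ('c3', 'c1')
print(resolve_commit_range(history, "pr-tip", "main-tip"))  # explicit PR range: ('pr-tip', 'main-tip')
```

A repository-wide run passes no range and gets the full history, while the PR path passes the two branch tips, which is the single code path the comment is aiming for.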

There is a small downside in that you have to parse the repo twice; IMHO that isn't a big deal, but if you want to avoid it, there are a number of ways you could do so. For example, you could have compute_repo_data() take a repo object, then write another small function that takes in a path, constructs the repo object, and calls compute_repo_data(repo). Alternatively, you could pull the entropy calculations out of compute_repo_data() into another function and call it from both compute_repo_data() and compute_pr_data().
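A minimal sketch of the first option (core logic takes a repository object; a thin wrapper handles paths). It assumes a hypothetical `FakeRepo` class and `open_fake_repo` opener in place of pygit2 so the example runs standalone; `analyze_repo` is likewise a hypothetical name, and the entropy calculation itself is left as a placeholder. The `OPEN_CALLS` list makes it easy to check that `compute_pr_data()` opens the repository only once.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FakeRepo:
    """Hypothetical stand-in for a pygit2 repository (illustration only)."""

    commits: List[str]  # commit IDs, newest first
    branches: Dict[str, str] = field(default_factory=dict)  # branch name -> tip commit ID


OPEN_CALLS: List[str] = []  # records every time a repository is "parsed"


def open_fake_repo(repo_path: str) -> FakeRepo:
    """Hypothetical opener; real code would construct a pygit2 repository here."""
    OPEN_CALLS.append(repo_path)
    return FakeRepo(commits=["c3", "c2", "c1"], branches={"feature": "c3", "main": "c1"})


def analyze_repo(repo: FakeRepo, most_recent: Optional[str] = None, oldest: Optional[str] = None) -> Dict[str, Any]:
    """Core computation on an already-opened repository object."""
    newest = repo.commits[0] if most_recent is None else most_recent
    first = repo.commits[-1] if oldest is None else oldest
    # Placeholder: the real entropy calculation between `first` and `newest` goes here.
    return {"most_recent_commit": newest, "oldest_commit": first}


def compute_repo_data(repo_path: str, open_repo: Callable[[str], FakeRepo] = open_fake_repo) -> Dict[str, Any]:
    """Thin wrapper: turn a path into a repo object, then delegate."""
    return analyze_repo(open_repo(repo_path))


def compute_pr_data(
    repo_path: str, pr_branch: str, main_branch: str, open_repo: Callable[[str], FakeRepo] = open_fake_repo
) -> Dict[str, Any]:
    """Open the repository once and reuse it for the branch-range analysis."""
    repo = open_repo(repo_path)
    result = analyze_repo(repo, repo.branches[pr_branch], repo.branches[main_branch])
    return {"pr_branch": pr_branch, "main_branch": main_branch, **result}


print(compute_pr_data("/tmp/example", "feature", "main"))
print(len(OPEN_CALLS))  # the repository was opened exactly once
```

Passing `open_repo` as a parameter also gives tests a seam for substituting a prepared repository, which may help with the branch-based test cases mentioned in the PR description.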

I'd also suggest using consistent names with your variables and dictionary keys. For example, in compute_repo_data(), you use normalized_total_entropy as a variable name, then assign it to the dict key total_normalized_entropy. Another example: in compute_pr_data() you call the file-level entropy in the results dict entropy_per_file, but in compute_repo_data it's file_level_entropy.
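One way to keep the names consistent, sketched under the assumption that both functions build plain dictionaries: define each key once and reuse it, so the PR-level result cannot drift to a different name such as `entropy_per_file`. The constant and builder names below are hypothetical, and the numbers are only illustrative.

```python
from typing import Any, Dict

# Hypothetical shared key names, defined once so both result builders agree.
TOTAL_NORMALIZED_ENTROPY = "total_normalized_entropy"
FILE_LEVEL_ENTROPY = "file_level_entropy"


def build_repo_result(total_normalized_entropy: float, file_level_entropy: Dict[str, float]) -> Dict[str, Any]:
    # Each local variable carries the same name as the key it populates.
    return {
        TOTAL_NORMALIZED_ENTROPY: total_normalized_entropy,
        FILE_LEVEL_ENTROPY: file_level_entropy,
    }


def build_pr_result(pr_branch: str, main_branch: str, repo_result: Dict[str, Any]) -> Dict[str, Any]:
    # Reuse the repo-level keys rather than renaming them.
    return {
        "pr_branch": pr_branch,
        "main_branch": main_branch,
        TOTAL_NORMALIZED_ENTROPY: repo_result[TOTAL_NORMALIZED_ENTROPY],
        FILE_LEVEL_ENTROPY: repo_result[FILE_LEVEL_ENTROPY],
    }


repo_result = build_repo_result(0.0022, {"poetry.lock": 0.063})
print(sorted(build_pr_result("feature", "main", repo_result)))
```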

Thanks for this comment, @falquaddoomi! I definitely agree that there is serious overlap between those two functions and that this refactoring is needed; I'll add it to a new issue. I got things running but ran into some minor trouble with test cases and a few other small errors. With such a tight deadline before my presentation, I think it makes more sense to save this for later.


github-actions · 2024-08-14T21:25:03Z


================================================================================
                      Software Information Entropy Report                       
================================================================================

Repository information:
┌────────────────────────────┬─────────────────────────────────────┐
│ Repository Path            │ /home/runner/work/almanack/almanack │
├────────────────────────────┼─────────────────────────────────────┤
│ Total Normalized Entropy   │ 0.0022                              │
├────────────────────────────┼─────────────────────────────────────┤
│ Number of Commits Analyzed │ 74                                  │
├────────────────────────────┼─────────────────────────────────────┤
│ Files Analyzed             │ 99                                  │
├────────────────────────────┼─────────────────────────────────────┤
│ Time Range of Commits      │ 2024-03-05 to 2024-08-14            │
└────────────────────────────┴─────────────────────────────────────┘

Top 5 files with the most entropy:
┌───────────────────────────────────────────────────────────────────────────────────────────┬──────────────────────┐
│ File Name                                                                                 │   Normalized Entropy │
├───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ poetry.lock                                                                               │               0.063  │
├───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ src/book/seed-bank/pubmed-github-repositories/visualize-pubmed-repo-sofware-entropy.ipynb │               0.0296 │
├───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ package-lock.json                                                                         │               0.0289 │
├───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/generate_data.py        │               0.0049 │
├───────────────────────────────────────────────────────────────────────────────────────────┼──────────────────────┤
│ src/book/garden-circle/contributing.md                                                    │               0.0045 │
└───────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────┘


{"repo_path": "/home/runner/work/almanack/almanack", "total_normalized_entropy": 0.0022092836141351982, "number_of_commits": 74, "number_of_files": 99, "time_range_of_commits": ["2024-03-05", "2024-08-14"], "file_level_entropy": {".alexignore": 8.253835527495845e-05, ".github/ISSUE_TEMPLATE/bug.yml": 0.0023349605455284246, ".github/ISSUE_TEMPLATE/config.yml": 8.253835527495845e-05, ".github/ISSUE_TEMPLATE/feature.yml": 0.002104028600972535, ".github/PULL_REQUEST_TEMPLATE.md": 0.001192097486247306, ".github/actions/install-node-env/action.yml": 0.0004861877577156774, ".github/actions/install-python-env/action.yml": 0.0009649693275020158, ".github/release-drafter.yml": 0.0007006579175230309, ".github/workflows/deploy-book.yml": 0.0011640435303269991, ".github/workflows/draft-release.yml": 0.000760353478909068, ".github/workflows/entropy-check.yml": 0.0015224212096459065, ".github/workflows/pre-commit-checks.yml": 0.0007006579175230309, ".github/workflows/publish-pypi.yml": 0.0008780354118013581, ".github/workflows/pytest-tests.yml": 0.0011640435303269991, ".gitignore": 0.0038960387682738305, ".linkcheckerrc.ini": 0.00039092240958147305, ".pre-commit-config.yaml": 0.002436493933654525, ".vale.ini": 0.0004861877577156774, "CITATION.cff": 0.0020262326268331316, "CONTRIBUTING.md": 0.00011971843019093978, "LICENSE": 0.0009071320875416535, "LICENSE.txt": 0.0009071320875416535, "README.md": 0.0005484609658173223, "pa11y.json": 0.00025940473583026404, "package-lock.json": 0.028881584389224866, "package.json": 0.0001909446166991543, "poetry.lock": 0.0630431314476572, "pyproject.toml": 0.0035884199150774234, "src/almanack/__init__.py": 0.000760353478909068, "src/almanack/book.py": 0.001895588192916369, "src/almanack/processing/calculate_entropy.py": 0.0026626727258898795, "src/almanack/processing/compute_data.py": 0.004036672559927364, "src/almanack/processing/git_operations.py": 0.0038489770323008043, "src/almanack/processing/processing_repositories.py": 
0.0013586388649582597, "src/almanack/reporting/cli.py": 0.0007899786269540005, "src/almanack/reporting/report.py": 0.0019218189751533945, "src/book/_config.yml": 0.0012757374677957243, "src/book/_static/custom.css": 0.00045469869242032595, "src/book/_toc.yml": 0.0007899786269540005, "src/book/assets/640px-Forgard2-003.gif": 0.0, "src/book/assets/640px-Rundes_Fenster_mit_Gitter.jpeg": 0.0, "src/book/assets/Sundial_2916_HDR.jpeg": 0.0, "src/book/assets/almanack-influencing-software.png": 0.0, "src/book/assets/software-gardening-logo.png": 0.0, "src/book/assets/software-lifecycle.png": 0.0, "src/book/assets/xkcd_dependency.png": 0.0, "src/book/garden-circle/contributing.md": 0.0044998300432500145, "src/book/garden-circle/garden-circle.md": 0.00011971843019093978, "src/book/garden-circle/garden-map.md": 0.0013586388649582597, "src/book/garden-lattice/garden-lattice.md": 0.001247942242637372, "src/book/introduction.md": 0.0022327628564206368, "src/book/references.bib": 0.004292441258266434, "src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/generate_data.py": 0.004932515854003226, "src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/generate_github_enriched_data.py": 0.003919534779730899, "src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/pubmed_github_links.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/pubmed_github_links_with_github_data.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/gather-software-information-entropy.ipynb": 0.004199734287477557, "src/book/seed-bank/pubmed-github-repositories/images/pubmed-lines-of-code-and-time.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/images/pubmed-stars-and-forks.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/images/pubmed-stars-and-open-issues.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/images/software-information-entropy-forks.png": 0.0, 
"src/book/seed-bank/pubmed-github-repositories/images/software-information-entropy-gh-stars.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/images/software-information-entropy-open-issues.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/images/software-information-entropy-top-5-langs.png": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_1.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_10.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_11.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_12.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_13.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_14.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_15.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_16.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_17.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_18.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_19.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_2.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_20.parquet": 0.0, 
"src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_3.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_4.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_5.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_6.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_7.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_8.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/repository_analysis_results/repository_analysis_results_batch_9.parquet": 0.0, "src/book/seed-bank/pubmed-github-repositories/visualize-pubmed-repo-sofware-entropy.ipynb": 0.029622799158067335, "src/book/seed-bank/seed-bank.md": 0.00011971843019093978, "src/book/software-forest/software-forest.md": 0.0012757374677957243, "src/book/verdant-sundial/verdant-sundial.md": 0.00130345069321151, "styles/config/vocabularies/almanack/accept.txt": 0.0006705735698113297, "tests/conftest.py": 0.003155272598916929, "tests/data/almanack/repo_setup/create_repo.py": 0.002181404246431413, "tests/data/almanack/repo_setup/insert_code.py": 0.0016837773193039043, "tests/data/jupyter-book/sandbox.md": 0.0005484609658173223, "tests/test_almanack.py": 0.001107660493871501, "tests/test_build.py": 0.0019218189751533945, "tests/test_calculate_entropy.py": 0.0016837773193039043, "tests/test_compute_data.py": 0.0010793262195818538, "tests/test_git_operations.py": 0.002811830176828594, "tests/test_processing_repositories.py": 0.0011358987077730264}}
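The JSON printed after the tables carries the same data in machine-readable form. As an illustration (not the bot's actual code), the "Top 5 files" table can be recovered from it by sorting `file_level_entropy`; the excerpt below copies a handful of entries verbatim from the output above, and `top_entropy_files` is a hypothetical helper name.

```python
import json
from typing import List, Tuple


def top_entropy_files(report_json: str, n: int = 5) -> List[Tuple[str, float]]:
    """Return the `n` files with the highest normalized entropy, highest first."""
    report = json.loads(report_json)
    file_entropy = report["file_level_entropy"]
    ranked = sorted(file_entropy.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Excerpt of the JSON report (a subset of the files, values copied verbatim).
excerpt = json.dumps({
    "file_level_entropy": {
        "README.md": 0.0005484609658173223,
        "package-lock.json": 0.028881584389224866,
        "poetry.lock": 0.0630431314476572,
        "src/book/garden-circle/contributing.md": 0.0044998300432500145,
        "src/book/references.bib": 0.004292441258266434,
        "src/book/seed-bank/pubmed-github-repositories/gather-pubmed-repos/generate_data.py": 0.004932515854003226,
        "src/book/seed-bank/pubmed-github-repositories/visualize-pubmed-repo-sofware-entropy.ipynb": 0.029622799158067335,
    }
})

# Round to four decimals, as the report table does.
for name, value in top_entropy_files(excerpt):
    print(f"{name}: {round(value, 4)}")
```

With this excerpt, the five printed entries follow the same order and rounded values as the "Top 5 files with the most entropy" table.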

willdavidson05 · 2024-08-14T21:27:46Z

Thank you for the speedy review, @falquaddoomi! The compute file definitely needs some refactoring, which I plan to track in a new issue; I'm just working within a short time frame during my last week. I also wanted to create a function that offers different output formats (e.g., JSON, Markdown), but I didn't get to it, which meant I had to delete a test case. I know this probably isn't the best approach, but I was crunched for time. Restoring the test cases and the refactoring will happen in the next PR!
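The multi-format output mentioned here is not part of this PR. A minimal sketch of one possible design, assuming a dictionary-shaped report with scalar values: a registry maps format names to formatter functions, and unknown formats raise an error. `render_report`, `to_json`, `to_markdown`, and `FORMATTERS` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict


def to_json(data: Dict[str, Any]) -> str:
    return json.dumps(data)


def to_markdown(data: Dict[str, Any]) -> str:
    # Two-column Markdown table; assumes scalar values.
    lines = ["| Key | Value |", "| --- | --- |"]
    lines += [f"| {key} | {value} |" for key, value in data.items()]
    return "\n".join(lines)


FORMATTERS: Dict[str, Callable[[Dict[str, Any]], str]] = {"json": to_json, "md": to_markdown}


def render_report(data: Dict[str, Any], output_format: str = "json") -> str:
    """Render `data` in the requested format, rejecting unknown formats."""
    formatter = FORMATTERS.get(output_format)
    if formatter is None:
        raise ValueError(f"unsupported format {output_format!r}; choose from {sorted(FORMATTERS)}")
    return formatter(data)


summary = {"number_of_commits": 74, "number_of_files": 99}
print(render_report(summary, "md"))
```

A registry keeps each format in its own small function, so adding another output (and a test for it) means adding one entry rather than another branch in a conditional.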

Will Davidson added 4 commits August 13, 2024 11:50

Adding in compute_pr

d43ef16

Merge branch 'CLI-PR' of https://github.com/willdavidson05/almanac into CLI-PR

e9b9c09

adding new report style and info

a527278

pre-commit + adding new cli to GH action

27a9e87

willdavidson05 added this to the Software Information Entropy Linter milestone Aug 13, 2024

Will Davidson added 11 commits August 13, 2024 17:27

Removing comments

f8d578f

attempting gh action fix

695a414

attempting gh action fix

1201543

attempting gh action fix

0ed6d3c

attempting gh action fix

3681a58

attempting gh action fix

1cf208a

attempting gh action fix

79b2509

attempting gh action fix

3503f3d

attempting gh action fix

7e2b175

Adding working .yml code from test branch

7c1a8df

return type change

da44ef2

willdavidson05 requested a review from falquaddoomi August 14, 2024 15:55

willdavidson05 marked this pull request as ready for review August 14, 2024 15:55

falquaddoomi approved these changes Aug 14, 2024


Will Davidson added 11 commits August 14, 2024 13:54

Refactoring compute functions

9664cc8

Revert "return type change"

24929b5

This reverts commit da44ef2.

Working refactored compute functions

b10ac01

reverting refactored changes

5aef66c

pre-commit

43aa52c

Dict key changes

da8cbeb

test case edit

b16579f

test case edit

c90276b

test case edit

1863e3c

pre-commit

e5c5e48

pre-commit

a7abefe

willdavidson05 mentioned this pull request Aug 14, 2024

Refactor process_repo_for_analysis and fine-tune compute_data.py module #90

Open

2 tasks

willdavidson05 merged commit f4bbef1 into software-gardening:main Aug 14, 2024
11 checks passed

willdavidson05 deleted the CLI-PR branch August 29, 2024 15:39
