Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync GitHub metadata #10

Closed
wants to merge 28 commits into from
Closed

Conversation

bswck
Copy link
Contributor

@bswck bswck commented Feb 2, 2024

Please see also #9 and jaraco/dotfiles#2.
Closes jaraco/pytest-perf#10, closes #8.

Overview

  • The new sync-github routine now syncs these data with GitHub repos of all downstream projects:

    • description (from the PEP 566 summary field),
    • homepage (a link to RTD),
    • GitHub topics (set by the project list, see next point).

    Description and homepage are not overwritten if they are already present. This is to ensure that manual changes to the repository's metadata are honored. If the project's description or homepage are changed, either that change must be applied manually to the repo or the description or homepage must be removed from the repository's metadata entirely before the next periodic synchronization (which is unlikely, since it is faster to apply the change by hand).

  • GitHub topics can now be set in the project list, after tags, in parentheses, separated by commas. Example:
    https://github.com/bswck/dotfiles/blob/0ea5638263a9a98db3cd39001a684184ac9da8a5/projects.txt#L139
    Topics won't overwrite topics you add through GitHub itself. You can technically call Repo.set_topics() that would replace the repository's topics entirely, but Repo.add_topics() is the default policy. Topics in the project list can have spaces and they will be implicitly converted to dashes, and whole topics lower-cased, as required by the GitHub API.

  • The sync-github routine will try to find an existing RTD documentation, basing on the foregoing assumption that the project RTD slug is identical with the project name (after removing dots and kebabifying).

  • Fetch depth can now be customized in temp_checkout(). sync-github uses the depth of 1.

  • Repo objects now work well with forks: imagine we're in /pypa/setuptools [fork]; github.Repo.detect() will correctly create github.Repo(f"{github.username()}/setuptools") and github.Repo.detect(upstream=True) will create github.Repo("pypa/setuptools"). Note: we build on the assumption that forks belong to the user, not one of their organizations.

Important

Known limitation: if there are more than 30 topics, they are consumed. I didn't implement handling the pagination as I assumed there won't be a necessity of having more than 20 topics. Let me know if I should implement that regardless.

@bswck
Copy link
Contributor Author

bswck commented Feb 2, 2024

I don't see the project being tested with doctest, should I put # doctest: +SKIP in docstrings REPL HTTP requests just in case?

@bswck bswck force-pushed the feat/sync-github-metadata branch 4 times, most recently from 24b1401 to a9b3d7b Compare February 2, 2024 10:40
@bswck bswck force-pushed the feat/sync-github-metadata branch from fa3a726 to a9b3d7b Compare February 2, 2024 10:42
@bswck bswck force-pushed the feat/sync-github-metadata branch 7 times, most recently from 21c2597 to 2daab1b Compare February 2, 2024 12:48
@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

I don't see the project being tested with doctest, should I put # doctest: +SKIP in docstrings REPL HTTP requests just in case?

Seems like I was incorrect! Please give me some time to review and address regressions. Thanks!

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

Ah, I see. That's why I praise typing. repo.get_project_metadata().project is not an instance of Project, but just a string.

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

Hm, I see that I need to change the Repo.detect() logic a bit. I don't think it behaves as it should in forks still...

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

I had an idea to use sys.intern() in the Project allocator to attach known metadata from the project list when creating Project instances from detection @ Repo.detect(). Unfortunately, sys.intern() won't work for string subclasses, so we need a cache dict. I don't think there's a risk of memory leak like it would be if re.compile() had an unlimited cache, so not going to bound the size of the cache dict unless asked.

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

I'm finding problems with the way repository ownership is handled. Maybe we should resolve repo (meaning the owner is contextual) and /owner/repo (when the owner is given) into owner/repo in the Project instances early on, like it now happens in Repo construction?

As for now, I will introduce a simple go-to function for that case and not change the logic at core, since I can see it is leveraged in determining paths for local temp checkouts, no?

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

Observation: PROJECT_PATH is in fact a full repository name.

@bswck
Copy link
Contributor Author

bswck commented Feb 3, 2024

Ready for another run.

@bswck
Copy link
Contributor Author

bswck commented Feb 4, 2024

Because git.resolve() is called regardless of the git_url_substitutions fixture, tests are failing.
Maybe delaying the execution of github.Repo.detect() by making it run lazily under functions' parameters and marking the fixture with autouse=True is a way to go?

@bswck bswck force-pushed the feat/sync-github-metadata branch from 260259b to 48cb955 Compare February 4, 2024 01:18
@bswck
Copy link
Contributor Author

bswck commented Feb 4, 2024

Let's try again with the checks.

jaraco/develop/git.py Outdated Show resolved Hide resolved
@bswck
Copy link
Contributor Author

bswck commented Feb 18, 2024

This PR needs a rework according to jaraco/dotfiles#2.

Copy link
Owner

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! There's a lot going on here and I can't help but feel like maybe this should be broken out into a few different concerns. I have little doubt the different commits apply changes incrementally, but it feels like the changes could be proposed incrementally too, like "implement and use RTD API for detection" and "refactor project detection".

@jaraco
Copy link
Owner

jaraco commented Feb 29, 2024

Because git.resolve() is called regardless of the git_url_substitutions fixture, tests are failing. Maybe delaying the execution of github.Repo.detect() by making it run lazily under functions' parameters and marking the fixture with autouse=True is a way to go?

I don't yet understand why this issue emerges with this change. Oh, maybe it's because of non-isolated tests (some test currently invokes git_url_substitutions, whose effect persists beyond the test scope, then the tests for resolve are run).

Except, no, that doesn't seem to be the case:

 jaraco.develop main @ tox -q -- -p no:cov -k resolve
============================================================== test session starts ===============================================================
platform darwin -- Python 3.12.2, pytest-8.0.2, pluggy-1.4.0
cachedir: .tox/py/.pytest_cache
rootdir: /Users/jaraco/code/jaraco/jaraco.develop
configfile: pytest.ini
plugins: enabler-3.0.0, checkdocs-2.10.1, mypy-0.10.3, subprocess-1.5.0, ruff-0.2.1
collected 88 items / 87 deselected / 1 selected                                                                                                  

jaraco/develop/git.py .                                                                                                                    [100%]

================================================================ warnings summary ================================================================
jaraco/develop/git.py::develop.git.resolve
  /Users/jaraco/code/jaraco/jaraco.develop/jaraco/develop/git.py:67: EncodingWarning: 'encoding' argument not specified.
    lines = subprocess.check_output(cmd, text=True)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================== 1 passed, 87 deselected, 1 warning in 8.49s ===================================================
.pkg: _exit> python '/Users/jaraco/Library/Application Support/pipx/venvs/tox/lib/python3.12/site-packages/pyproject_api/_backend.py' True setuptools.build_meta
  py: OK (9.08=setup[0.44]+cmd[8.64] seconds)
  congratulations :) (9.12 seconds)

Oh. Maybe that's because my dev environment is configured with those substitutions and the issue is that the tests aren't running in isolation of my local dev environment. I recently ran into this issue in another project. Let me see if I can't apply something like that here and then see if the tests fail (in isolation or not).

Edit: Interesting! The warning emitted in that output is in fact indicative of the subprocess call being made without being patched, confirming that the fixture isn't intercepting those calls and the test suite is in fact dependent on my local dev environment.

jaraco/develop/git.py Outdated Show resolved Hide resolved
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
@bswck
Copy link
Contributor Author

bswck commented Mar 26, 2024

I have little doubt the different commits apply changes incrementally, but it feels like the changes could be proposed incrementally too, like "implement and use RTD API for detection" and "refactor project detection".

Sure. I'll go ahead and close this and open PRs in a way that applies changes incrementally.
See you there, thanks for your help with this ticket!

@bswck bswck closed this Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Synchronize project metadata with GitHub Feature Request: Project Description
2 participants