
Use uv pip compile to recompile requirements files #8276


Open · wants to merge 17 commits into main

Conversation

toofar
Member

@toofar toofar commented Aug 10, 2024

This PR switches the scripts/dev/recompile_requirements.py to use uv pip compile to compile the pinned requirements files, instead of the current method of using pip freeze.

The motivation for this is:

  • we don't have to worry about packages installed to the virtualenv that aren't our dependencies being included in the output (see Update dependencies #8269 (comment) for an example)

  • it speeds up the job a little bit because one of the pip installs is replaced by a uv call (and uv prides itself on being fast)

  • we get lineage annotations in the pinned requirements files saying why requirements are included, eg:

    [screenshot: lineage annotations in a compiled requirements file]

There are a few differences between the two methods of calculating the pinned dependencies that I've accounted for:

  1. pip-compile outputs normalized package names, whereas pip freeze outputs names as specified in the packages
    • to make it easier to compare the output of the two methods I've changed both methods to normalize package names
    • and changed everywhere we compare package names (eg processing the diff, processing markers, looking up changelog URLs) to do so with normalized names
    • I also went and normalized all the package names in the requirements lock files so that we don't have to watch them incrementally get changed as packages get updated - and added that commit to the git blame ignore file
  2. pip-compile outputs setuptools and pip into the pinned requirements file, if they are dependencies of one of the packages we require, whereas pip freeze excludes them by default
    • I reproduced the current behaviour but there are comments on the pip-tools and uv bug trackers saying including setuptools and pip in the output is intentional, so I suggest we remove this behaviour
    • See commits "Exclude setuptools from pip-compile output" and "Add pip and setuptools back to tox requirement file"
    • In the latter commit I've added them into the tox requirements file directly instead of including them indirectly via the compile script
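For reference, a minimal sketch of how the compile invocation could be assembled under this scheme (the function name is illustrative, not the script's real API; the `--no-emit-package` flags are the ones the commits mention for reproducing pip freeze's default exclusions):

```python
import os


def compile_command(raw_filename):
    """Assemble a `uv pip compile` invocation for one raw requirements file.

    For the tox file, pip and setuptools are listed directly in the raw
    file and should stay in the output, so the exclusion flags are skipped.
    """
    cmd = ["uv", "pip", "compile", raw_filename]
    if os.path.basename(raw_filename) != "requirements-tox.txt-raw":
        # Reproduce pip freeze's default of leaving these out of the output.
        cmd += ["--no-emit-package", "setuptools", "--no-emit-package", "pip"]
    return cmd
```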

Options:

  1. remove the pip freeze based method of requirements calculation. I left it in as an escape hatch, but since we can get the output of the pip-compile based one to be exactly the same it seems safe to remove it. If we do this we should remove support for the pip_args marker as that won't work with uv (and it's not currently being used)
  2. don't special case setuptools and pip, I recommend doing this, but I'm not across all the intricacies of python packaging
  3. replace more commands in the script with the uv version (eg venv ) to make it faster

Here is a guide to the commits:

[screenshot: guide to the commits]

You'll probably want to review this PR commit-at-a-time or in stages so you don't get all the line noise from normalizing and updating the lock files in the diff.

@toofar toofar mentioned this pull request Aug 10, 2024
@toofar toofar force-pushed the feat/uv_pip_compile branch from 351aa4c to 6b471fa Compare August 10, 2024 09:55
@toofar toofar changed the title User uv pip compile to recompile requirements files Use uv pip compile to recompile requirements files Aug 10, 2024
@toofar toofar marked this pull request as ready for review August 12, 2024 06:57
@toofar toofar requested a review from The-Compiler August 12, 2024 06:57
@toofar toofar force-pushed the feat/uv_pip_compile branch from 6b471fa to 7e79aeb Compare August 12, 2024 07:32
@toofar toofar force-pushed the feat/uv_pip_compile branch 3 times, most recently from a43b9bd to 2f05290 Compare September 7, 2024 05:17
@toofar toofar force-pushed the feat/uv_pip_compile branch from 2f05290 to 7e91b3b Compare September 18, 2024 05:03
Member

@The-Compiler The-Compiler left a comment


Looking great, I left a couple of nitpicks. Maybe it'd be good to leave out the autogenerated changes from the PR, so we avoid conflicts and I can just update them after merging?

Can't wait to start using uv after trying it out recently. Probably will take a quick look at installing tox-uv using tox' self-bootstrapping setting as well. Probably the testing part of the recompile script will be much faster with that enabled too!

return "==".join([normalize_pkg(pkg), version])


CHANGELOG_URLS = {normalize_pkg(name): url for name, url in CHANGELOG_URLS.items()}
Member


WDYT about normalizing them in the JSON file instead, and adding something to check_changelog_urls in scripts/dev/misc_checks.py to ensure they stay that way?

Shouldn't block this though, happy to take care of it after merging if you prefer.

Member Author


In general I would prefer removing points of friction for contributing and maintaining. In this case I think it would be nice if people could put package names in the JSON files in whatever case they like. The robots doing comparisons should be the ones that have to care about normalization. For example, PyQt5 and PyJWT are in there at the moment, and there's nothing wrong with writing them like that.
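A sketch of that comparison-time normalization (the package names are the examples from the thread; the URLs are placeholders, not the project's real ones):

```python
import re


def normalize_pkg(name):
    """Normalize a package name per the packaging spec's normalization rules."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Humans write whatever case they like in the JSON file...
CHANGELOG_URLS = {
    "PyQt5": "https://example.invalid/pyqt5-changelog",  # hypothetical URL
    "PyJWT": "https://example.invalid/pyjwt-changelog",  # hypothetical URL
}

# ...and the robot normalizes both sides at comparison time.
NORMALIZED_URLS = {normalize_pkg(k): v for k, v in CHANGELOG_URLS.items()}


def changelog_url(pkg):
    """Look up a changelog URL regardless of the spelling used for the name."""
    return NORMALIZED_URLS.get(normalize_pkg(pkg))
```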

@@ -346,31 +396,71 @@ def get_outfile(name):
return os.path.join(REQ_DIR, 'requirements-{}.txt'.format(name))


def build_requirements(name):
"""Build a requirements file."""
def get_updated_requirements_freeze(filename, venv_dir, comments):
Member


Do we need to keep this or should we just throw it out entirely?

Member Author

@toofar toofar Mar 29, 2025


If you are happy to get rid of it, then I'm happy to get rid of it! I just left it in there to allow for easy comparisons and easy switching back if we aren't happy with the pip-compile based implementation. It's one of the options I listed in the PR description.

pre=comments['pre'],
pip_args=comments['pip_args'])
with utils.gha_group('Freezing requirements'):
args = ['--all'] if '-tox.txt-raw' in filename else []
Member


I find the if '-tox.txt-raw' in filename a bit confusing - could that be something like (untested) os.path.basename(filename) == "requirements-tox.txt-raw" instead, which we can then simplify to path.name == "..." once we switch this over to pathlib?

Member Author


Sure, I can change that. But I would like to hear your opinion on removing the special casing of setuptools and pip entirely, which would remove this block altogether. pip freeze excludes its own dependencies from its output to avoid polluting people's requirements lists, because its starting point is looking at all installed packages, and then we have to add these excluded requirements back when we actually want them. But compile-based approaches like pip-compile, whose starting point is a requirements file, don't have that problem.

See upstream discussions around including setuptools that I linked in this commit: a4ca2f9

@toofar toofar moved this to Assigned / Other focus in @toofar's focus Mar 2, 2025
@toofar toofar force-pushed the feat/uv_pip_compile branch 2 times, most recently from ae5aba0 to 1bcf279 Compare March 29, 2025 08:56
@toofar
Member Author

toofar commented Mar 29, 2025

Thanks for the review! I've addressed most of your comments. I've:

  1. rebased, so this is up to date with the latest requirements files
  2. dropped almost all the automated changes. Still did a manual run at the end though else the new format of the requirement files will come through next time the update job runs, which might look a bit alarming. IMO this makes it harder to review the PR commit by commit and build up trust of the changes because you can't see the output alongside the commits that caused them. I can add back incremental autogenerated changes if that becomes an issue.
  3. removed the old pip freeze based method
  4. removed the code filtering pip and setuptools from the compiled requirements files, happy to add that back if you want

@toofar toofar force-pushed the feat/uv_pip_compile branch from 1bcf279 to 99a57b8 Compare March 31, 2025 08:46
@toofar toofar force-pushed the feat/uv_pip_compile branch 2 times, most recently from 58b1833 to 627361f Compare April 20, 2025 06:37
toofar added 10 commits April 25, 2025 18:36
In #8269 we saw some packages leaking into the pip freeze output that we
don't want in the requirements files (if setuptools isn't supposed to be
in there, why should its dependencies be?).
I've also greatly missed the comments that pip-compile puts in
requirements.txt files explaining where indirect dependencies come from.
So I took the opportunity to switch our tooling for updating and parsing
new dependencies and their versions to use pip-compile instead of
`pip install -U && pip freeze`.

It turned out to not be a big change because the pip freeze output is
largely compatible with requirements files (we are writing directly to
one after all). So we just need to switch what commands we are running
and triage any compatibility issues.

I chose `uv pip compile` instead of `pip-compile` because I like what uv
is doing (fast, aiming for compatibility, consolidating a confusing
ecosystem into a single tool). But pip-compile/tools should do the same
job if we want to go that route.

The biggest differences are:
* outputs normalized names: this generally results in a larger diff than
  otherwise (I propose we go through and regenerate all the requirements
  files in one go, and maybe add that commit to a blame ignore file) and
  requires our comparison logic to deal with normalized package names
  everywhere
* setuptools and pip not included in tox requirement file - not sure
  what to do about that yet, should they be in the .txt-raw file?

TODO:
* remove support for pip_args?
* change markers in raw files to lower case? Ideally don't require it; if a
  human can write them in any case and a robot can normalize, we should do
  that. But if there are patterns with `._` in them as part of names, how do
  we handle that?
* pull out similar bits of `build_requirements*` methods
  * maybe make it so you can pass `requirements=None` to `init_venv` to
    make it not install stuff, install uv, do the uv invocation, gate
    all that behind a `--method="freeze|compile"` arg?
* add pip and setuptools to tox requirements file?
* basename requirements file names so they don't have
  `script_path/../../` in them in the annotated version
* add tests for the markers (with inputs of differing cases) to make
  sure they all still work
* update changelog check used in CI to normalize names too
The CHANGELOG_URLS variable is imported by the
`./scripts/dev/misc_checks.py changelog-urls` check. Since there is a
bit of churn in package name case in this change (and a while back when
a bunch of packages switched between underscores and hyphens), update
this variable at import time so that that checker will be looking at
normalized names too.
There was a fair bit of duplicate code, so I've pulled out the "take a
list of requirements, give me a new one" out to separate methods. The
stuff around parsing the output can stay common thankfully!

Even if we drop one of the methods this level of abstraction is probably
fine to keep.
Since I was looking at how hard it would be to support using pip-compile
to recompute requirements, I was worried that I would break the markers
we support in the raw requirements files.

This adds two tests:
* disabled_test_markers_real_pip_and_venv
  A test that sets up a local python index and runs the real pip/uv
  binaries. It works (and it was fun to setup a package index factory)
  but has a couple of downsides:
  1. it hits the real pypi index, which is not great in a test. This can
     be prevented by removing the EXTRA bit from the INDEX_URL env vars
     and pre-downloading pip, wheel, setuptools and uv to the test repo
     (and adding index.htmls for them). But because of the next item I'm
     not sure it's worth the effort of keeping this test around
  2. it's slow because of having to download packages from the internet
     (even if we pre-cached them it would still have to download them, I
     guess we could include a zip of fixed/vendored versions, but that
     will probably require maintenance over time) and because it calls
     venv to make new virtual environments, which isn't the quickest
     operation (maybe uv venv is quicker?)
* test_markers_in_comments
  Tests just the comment reading and line manipulation logic. Could be
  made a bit more pure by just calling read_comments() and
  convert_line(), but that wouldn't test the "add" marker.
`pip freeze` writes out package names as specified by the packages, `pip
compile` writes out normalized package names. Sticking to normalized
names in robot written files lets us more easily compare the output of
the two different requirement compiling methods and also mean we can
stop worrying about packages changing between `_` and `-` in dependency
updates.

Script used to do this change was:

    import re
    import glob

    def normalize_pkg(name):
        """Normalize a package name for comparisons.

        From https://packaging.python.org/en/latest/specifications/name-normalization/#name-normalization
        `pip freeze` passes package names through in whatever case they are
        in the package; pip-compile will normalize them.
        """
        if "/" in name:  # don't change file paths
            return name
        return re.sub(r"[-_.]+", "-", name).lower()

    def normalize_line(line):
        if not line or not line.strip():
            return line
        if "==" not in line.split()[0]:
            return line

        pkg, version = line.split("==", maxsplit=1)
        return "==".join([normalize_pkg(pkg), version])

    for name in ["requirements.txt"] + glob.glob("misc/requirements/requirements*.txt"):
        with open(name) as f:
            before_lines = f.readlines()
        after_lines = [normalize_line(line) for line in before_lines]
        with open(name, mode="w") as f:
            f.writelines(after_lines)
This lets us more easily compare the output of runs of the different
requirements compiling methods.
This is to match the `pip freeze` requirements compilation method. It's
not clear to me if we actually want this behaviour or not.

It seems `pip freeze` will exclude dependencies of itself: https://pip.pypa.io/en/stable/cli/pip_freeze/#cmdoption-all
Even if there are other packages installed that depend on those
dependencies.

`uv pip compile`, and now the original `pip-compile`, have both decided
to include setuptools in generated requirements files:

    astral-sh/uv#1353
    jazzband/pip-tools#989 (comment)

So I'm not sure if we have a reason for going against this here or if
they were just being excluded because that's what pip freeze does.

Hopefully we can drop this commit and use the default behaviour in the
future. For now, while I'm introducing the new backend, it's here to
make the diff of generated files more precise.

This message prefix to identify a pip compile comment was taken from
these examples:

    # The following packages were excluded from the output:
    # setuptools

    # The following packages are considered to be unsafe in a requirements file:
    # setuptools==41.4.0        # via protobuf
For the pip freeze backend pip is being passed `--all` when the tox
requirements file is being processed so that pip and setuptools are
included in the requirements file. This was added in 922dca0
for reasons I haven't fully grokked.

This commit adds similar behaviour for the pip compile backend via:
1. don't add the --no-emit-package args for the tox requirements file
2. add pip and setuptools to the tox requirements file

It seems that pip and setuptools aren't even requirements of tox, but
they are being included in the compiled requirements file anyway. Why
aren't they included in the raw requirements file? I don't know, but from
what I can figure it's not going to harm anything to have them in there.
It would have been convenient to have an end-to-end test to make sure that
the two requirements file compilation methods produce the same results.
But I think there is a bit too much stuff going on in
`recompile_requirements` for that atm.

Making the local repo in the temp path and installing fake packages there
works fine, although setting up the virtualenv isn't quick. But the
script currently installs pip, setuptools and wheel which means we have
to either 1) hit pypi.org in the tests to get them, or 2) download them at
the start of the test suite and put them in the local repo.

Not sure it's worth the effort to go down this rabbit hole when we
already have a dozen real requirements files to verify the change with.

I'm leaving this in the commit history because it was fun to get the
local repo working!
I find these comments useful to show why packages are included in the
final compiled requirements files.

Required a small change to `recompile_requirements` to ignore these new
comment lines (it was saying there was a new requirement with an empty
name that needed a changelog entry).

The addition of the `os.path.realpath()` is to clean up the paths of the
requirements files in the annotations, eg:

     check-manifest==0.49
    -    # via -r scripts/dev/../../misc/requirements/requirements-check-manifest.txt-raw
    +    # via -r misc/requirements/requirements-check-manifest.txt-raw
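A small illustration of the cleanup that `os.path.realpath()` gives (using the path from the diff above; the resolved prefix depends on the working directory):

```python
import os

# The annotation path as emitted before the change, relative to the script.
raw = "scripts/dev/../../misc/requirements/requirements-check-manifest.txt-raw"

# realpath collapses the ../.. segments (and resolves any symlinks), so the
# "via -r" annotation no longer leaks the script's location in the tree.
resolved = os.path.realpath(raw)
```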
toofar added 7 commits April 25, 2025 18:38
see 433074c, adf39e9, db83a82 & 78a74a2

Now that we are using pip-compile to build requirements lock files,
instead of pip-freeze, these things shouldn't be showing up in the
results.

Looks like `typeguard` was dropped in a192a3d "initial compile based bump"
I guess that was being pulled in by pip freeze too.
This is for parsing diffs like:

        # via importlib-metadata
    +
    + # The following packages were excluded from the output:
    + # setuptools

Otherwise further parsing breaks because it is looking for a package
name on the line.
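A sketch of that parsing rule against the diff fragment above (the function name is illustrative, not the script's real API):

```python
def added_packages(diff_lines):
    """Yield package names from added diff lines, skipping blank lines and
    comments such as the 'excluded from the output' banner."""
    for line in diff_lines:
        if not line.startswith("+") or line.startswith("+++"):
            continue
        content = line[1:].strip()
        if not content or content.startswith("#"):
            continue
        # Keep only the name from a "pkg==version" pin.
        yield content.partition("==")[0]
```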
Substring matching when you are only trying to match one specific item is
a smell; it's unclear what's intended and causes the reader to stop and
think.
This aligns with the direction of pip-compile and uv, and removes some
code from this script. See:

    astral-sh/uv#1353
    jazzband/pip-tools#989 (comment)

The code being removed worked fine, so if it turns out that installing a
newer setuptools or pip in virtualenvs breaks stuff, it's fine to add it
back.
Changes in this commit should be limited to:
* lines changing order (they are sorted now)
* jaraco.text, jaraco.collections, autocommand, inflect, typeguard are
  gone now (setuptools vendored packages that were being included from
  the venv)
* comments added showing why dependencies are included
@toofar toofar force-pushed the feat/uv_pip_compile branch from 627361f to d701f32 Compare April 25, 2025 06:58