Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support modifying a lockfile with pex lock update #20364

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

cburroughs
Copy link
Contributor

To support updating part of the dependency graph instead of regenerating from scratch, pex has a lock update sub command which tries to make the minimal changes needed. This allows updating only a single package, or adding new packages without updating all preexisting ones. pex naturally needs the previous lockfile for this, which requires some plumbing adjustments.

ref #15704

@cburroughs
Copy link
Contributor Author

In the python example.

pants --keep-sandboxes=always --print-stacktrace generate-lockfiles --resolve=python-default 
Pantsd has been turned off via Env.
20:44:07.50 [INFO] Preserving local process execution dir /tmp/pants-sandbox-fb5WgH for Generate lockfile for python-default
20:44:14.47 [INFO] Completed: Generate lockfile for python-default
20:44:14.48 [INFO] Wrote lockfile for the resolve `python-default` to python-default.lock
                                                                  
Lockfile diff: python-default.lock [python-default]
                                                                  
==                    Upgraded dependencies                     ==

  attrs                          22.2         -->   23.2.0
  packaging                      23           -->   23.2
  pluggy                         1            -->   1.3.0
$ git checkout python-default.lock
Updated 1 path from the index
pants --keep-sandboxes=always --print-stacktrace generate-lockfiles --resolve=python-default --pex-lock-update --pex-lock-project='packaging'
Pantsd has been turned off via Env.
20:45:07.68 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ef9fk1 for Generate lockfile for python-default
20:45:15.03 [INFO] Completed: Generate lockfile for python-default
20:45:15.04 [INFO] Wrote lockfile for the resolve `python-default` to python-default.lock
                                                                  
Lockfile diff: python-default.lock [python-default]
                                                                  
==                    Upgraded dependencies                     ==

  packaging                      23           -->   23.2

@cburroughs
Copy link
Contributor Author

Feedback Requests:

  • This takes a fairly thin approach. I suspect trying to do anything heavily abstract over pants-->pex-->pip is going to be all sorts of leaky. I made this pex specific for similar reasons, I'm not sure which other lockfile tools support identical update semantics.
  • I'm unsure how much "protection" we should provide around the "new" metadata matching what is in the old lockfile. I have a check in their right now around interpreter constraints because that seemed like a way to shoot yourself in the foot... but double checking the pex docs I think that might actually be a valid -- and useful -- operation. See also Add ability to check if lockfiles are up-to-date with all declared requirements #15723
  • I'm not sure I have a strong enough grasp of how --pin would interact with the pants tracked metadata ^^ and on waffled on including that here.

@jsirois
Copy link
Contributor

jsirois commented Jan 4, 2024

@cburroughs --pin is not for updating versions of existing locked projects (or adding new projects to the lock, etc); it's for updating sources of those project artifacts. Here are 2 nominal uses:

  1. Say I create a lock today against PyPI. Part of the lock includes the requests project, but they missed publishing 3 wheels for whatever reason for the locked version. They backfill those at a later date (say they paid $ for a Mac ARM and want to provide mac arm wheels). If I update with --pin I get no changes in projects or versions, I just pick up 3 extra artifacts for the existing locked requests.
  2. Say I create a lock today against PyPI. My security team gets established later and wants to keep locks undisturbed semantically, but they also want the artifacts coming from an in-house index. If I update with --pin --no-pypi --index ..., I can get all the lock artifact URLs migrated over from the ones that support PyPI to the internal index ones.

See: pex-tool/pex#1887

@cburroughs
Copy link
Contributor Author

Thanks for the background and context. That makes sense.

I think the design question remains: If the metadata/config in pants has changed such that doing one of these updates is possibly invalid, should:

  • Pants try to detect this, at the expense of what may be a leaky extraction with duplicative logic?
  • Leave it up to pex to through an error in the subset of cases where that is detectable?

An answer might be "metadata needs to match exactly for --pin but pants will just get out of the way for --project

@jsirois
Copy link
Contributor

jsirois commented Jan 5, 2024

If I were you I'd punt --pin completely and just focus on --project. Pants lags Pex here by more than a year already and it's much easier to add a feature later than mess it up today with incomplete thinking.

@cburroughs cburroughs changed the title DRAFT: support modifying a lockfile with pex lock update support modifying a lockfile with pex lock update Jan 9, 2024
@cburroughs cburroughs marked this pull request as ready for review January 9, 2024 20:46
# Assuming that any lockfile we are trying to update has a destination
# that is a file and not an embedded read only resource. See
# pex_requirements.py
maybe_lockfile = await Get(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure if this is actually the idiomatic way to check if a file exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't know.)

@cburroughs
Copy link
Contributor Author

Heeding advice and deferring --pin for now.

I think _setup_pip_args_and_constraints_file and pex itself handles all of the footguns and it is okay to pass along --project without a lot of pants-side metadata finickiness, but I admit this is the area I'm most worried I'm missing something. In other words, if new interpreter constraints are passed along... pex updates the lockfile to satisfy them which is the right behavior.

Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I'm excited for this.

Do you have thoughts on how to test this?

advanced=False,
help=softwrap(
"""
update instead of creating a new lockfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this is getting closer to merging, could we expand this a bit to describe more about what "update" means, similar to how project says something.

),
)

project = StrListOption(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this would usually be passed on the command line, rather than set in pants.toml, e.g. pants generate-lockfiles --resolve=... --pex-lock-project=some-library. Is that how you'd see it being used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I think cli invocation would be most common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably with some aliases to shorten that...

for project in pex_lock_subsystem.project:
if PipRequirement.parse(project).project_name != project:
raise ValueError(
f"project {project} is not a bare name; specifier, markers must be specified in Pants targets"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand what the means, could you give an example?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what it means, but Pex itself doesn't have this restriction. Why does Pants need it? Every time Pants gets between a user and a tool and meddles is a new chance to infuriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the complication here is where the authoritative list of projects and their constraints lives. In pants today libfoo<2 might be in a requirements.txt file, a poetry style file, in BUILD files, or split between any of those places and a constraints.txt file. If on the cli someone did --project=libfoo>3, but had libfoo<2 in requirements.txt` what should we do?

  • Take the versions in vcs as authoritative? (Current PR)
  • Take the cli as authoritative and leave the vcs file ambiguity?
  • Some permutation of are-you-sure or warnings?
  • Develop concrete syntax trees for editing all of these config files, sort of along the lines of New goal for managing 3rd party dependencies #12880?

I think maintaining consistency between the requirements and lockfile is the "least surprising" behavior, but reorganize there are no obviously right expectations.

As I understand them (and maybe I'm confused; I agree having each tool have slightly different semantics makes it hard to follow), the semantics of pex lock update --project=libfoo is:

  • If libfoo is not present, use the latest version
  • If libfoo is present in the lock's requirements, replace the requirements and use the latest libfoo
  • If libfoo is present in the lock's constraints, install the latest version of libfoo that abides by the constraints.
  • If libfoo is present in the lock's constraints and cli version is outside them, use the new version, issue a warning, and update the constraints.

I think this makes a lot of sense for a tool that is used to do all sorts of spiffy sometimes ad-hoc things and not narrowly focused on doing requirements-->lock a'-la poetry or Pants.

As a not fully formed feature request, if PEX had a way to say "try to update libfoo within the bounds of the requirements and constraints of the lockfile" and be happy to "get out of the way" and use that. (FWIW I think this is what poetry does with a similar subcmd but that's not a tool I've used before.)

A related PEX feature might be a way of permuting the constraints without changing the requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cburroughs maybe a good question, but the best 1st stop is to ask the tool:

$ pex3 lock create "ansicolors<1.1.8" -o lock.json
$ pex3 lock update -p "ansicolors>1.1.8" lock.json
ERROR: The following lock update constraints could not be satisfied:
ansicolors>1.1.8
Encountered 1 error updating lock.json:
1.) cp312-cp312-manylinux_2_35_x86_64: pid 87643 -> /home/jsirois/.pex/venvs/344fa41f0c33753264e54bcca2d4d078936e247e/0de1795ad4486f45ee94cecd983e9905b6b11dc9/bin/python -sE /home/jsirois/.pex/venvs/344fa41f0c33753264e54bcca2d4d078936e247e/0de1795ad4486f45ee94cecd983e9905b6b11dc9/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /home/jsirois/.pex/pip/23.2/pip_cache --log /tmp/pex-pip-log.nod0b_7e/pip.log download --dest /tmp/tmp5vzcj7a3/home.jsirois.bin.pex.venv.bin.python3.12 --constraint /tmp/lock_update.10b2bohk.constraints.txt ansicolors<1.1.8 --retries 5 --timeout 15 exited with 1 and STDERR:
ERROR: Cannot install ansicolors<1.1.8 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

 The conflict is caused by:
     The user requested ansicolors<1.1.8
     The user requested (constraint) ansicolors>1.1.8

 To fix this you could try to:
 1. loosen the range of package versions you've specified
 2. remove package versions to allow pip attempt to solve the dependency conflict

Now, that's an error from Pip behind the scenes and may be slightly confusing, but.

Copy link
Contributor

@jsirois jsirois Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'd amend my reply above to just say, make sure you totally understand the tool you're integrating with Pants before deciding to do extra stuff that may get between the user and the tool. When I worked on Pants, I preferred not using Pants at all to develop an integration, but complete the work on the command line. Then, once sorted, and the tool behavior was totally sussed, get Pants to run those same Processes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't care about feature requests so much - I don't use Pants. I care about not getting bugged because Pants made the choice to be more restrictive than it need be.

If the written down requirement is, say, requests>=2,<3 and requests 2.3.4 is locked and a newly reported sec vuln in 2.3.4 can be solved by -p requests<2.3.4 using Pex straight up (i.e.: upgrading does not solve it) but Pants says no, that would be infuriating, because that works today with Pex locks. You can move an existing locked requirement anywhere in its written down requirement range as long as the transitive resolve leads to no conflict.

FWICT the experiments above show that Pex will fail to lock anything not in the written down requirement range (using the pip-2020-resolver); so its 0 skin off your back to just allow -p like Pex does. If the user enters some conflicting -p requests>3, they get yelled at immediately and are forced to go edit the orignal written down requests>=2,<3 controlled by Pants and all is good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, and this PR is not settled yet, but if it were settled on nannying Pex's -p un-necessarily, that would be exactly in one of the class of reasons I don't use Pants. It babysits too much for my taste by half.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for the response and your work on Pex. I find it useful both through Pants as a standalone tool. I don't have a strongly held opinion on when Pants is net helpful by imposing additional checks on what is passed to underlying tools. I understand you lean strongly against doing so! And those reasons make sense to me.

the experiments above show that Pex will fail to lock anything not in the written down requirement range (using the pip-2020-resolver); so its 0 skin off your back to just allow -p like Pex does.

Yes, I'm inclined towards choices that decease the number of if statements in this PR. The check was added when my understanding of the behavior was muddled by the old resolver as you pointed out.

If the user enters some conflicting -p requests>3, they get yelled at immediately and are forced to go edit the original written down requests>=2,<3 controlled by Pants and all is good.

But after being so yelled at and editing the original to new bounds, lock update can't then do that update right? There is no --project-with-new-version-specifier and as you have pointed out --project will maintain the original specifier. I don't think this is a Pants specific thing. A project using the common Python pattern of having a requirements.txt file alongside a lockfile would be in the same boat when making changes to said requirements.txt.

Just like "add a new project without changing other entries" or "remove a project without changing other entries" is reasonable, I think "change the version specifier of one project without changing other entries" is for the same reasons.

I'm happy to take that to another issue if you would prefer. I understand you have no obligation to support that use case.

Copy link
Contributor

@jsirois jsirois Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But after being so yelled at and editing the original to new bounds, lock update can't then do that update right?

Correct - you need to re-gen the lock. The current semantics of pex3 lock update are to update the original lock given all its settings. You can only add new requirements or push original requirements around in their version range as originally written down. You can't change ICs, etc.

Just like "add a new project without changing other entries" or "remove a project without changing other entries" is reasonable, I think "change the version specifier of one project without changing other entries" is for the same reasons.

I agree. If you file an issue for these two - remove and replace (instead of the current adjust) - I can get those knocked out tomorrow.

I noted a few other features that would make sense in response to question on this PR. I might be good to just get those all knocked out if you care to file those as well. My current work stint ends Friday afternoon and I won't pick back up until the 25th.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now available as -R <project name> / --replace-project <project name> (#2335) released in Pex 2.1.160.

Comment on lines +253 to +255
raise ValueError(
f"Empty or non-existent lockfile at {req.lockfile_dest}. Unable to update in place"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially this could be a warning instead? Given the destination doesn't exist, the consequence of generating from scratch inappropriately is just some lost time.

For my work repo, I'd be leaning towards setting:

[pex-lock]
update = true

in pants.toml, because I don't think I'd ever want to be using "update the whole lockfile" behaviour by default, and it'd be a bit annoying to have to disable it/pass --no-pex-lock-update when adding a new resolve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm following what the warning would allow. pex lock update won't succeed without a lockfile to update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. Okay, I guess we'd have to push the decision up a level to get the behaviour I was thinking off (i.e. if the lock doesn't exist go through the other/old code path, rather than this new one).

and isinstance(original_loaded_lockfile.metadata, (PythonLockfileMetadataV3))
):
raise ValueError(
f"Pants metadata for lockfile at {req.lockfile_dest} is in old <V3 format. Unable to update in place."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide advice for users about how to resolve this? E.g. "Pants can only update in-place (via PEX_LOCK_UPDATE setting) for , see for how to upgrade"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable. I'm not sure in practice when these old formats would be extant in the wild. Would one have had to not upgraded a lockfile for ~years?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good question. I guess if they're (noisily) deprecated elsewhere, we don't need to put much energy into this here.

Comment on lines +275 to +279
elif set(resolve_config.no_binary) != original_loaded_lockfile.metadata.no_binary:
raise ValueError(
f"Resolve Config no_binary {set(resolve_config.no_binary)} does not match {original_loaded_lockfile.metadata.no_binary} in last lockfile, can not update in place"
)
elif set(resolve_config.only_binary) != original_loaded_lockfile.metadata.only_binary:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these no_binary/only_binary options, what happens if one is adding a new library, and it needs to be in one of these sets from the start? (I could imagine this applies to a new transitive dependency too, that's not necessarily in the list of requirements the user provides.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The the way Pants handles per package binary with __pip__args.txt and relying on pip's(?") handling of arguments in cli arguments in requirements files is something... I admit I'm surprised works. I think the answer today is "yes if you change binary for single package you need to regenerate the whole thing." I don't know the history of why it was done this way or if per-pkg binary would be reasonable PEX feature.

Copy link
Contributor

@jsirois jsirois Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both per-package binary only and per-package source only would be reasonable PEX features.

You can be assured the reason it was done this way was to avoid having to wade that deep; since Pex itself tries to not get between you and Pip as much as possible, this was possible. In other words, pex -r requirements.txt actually corresponds to pip <subcommand> -r requirements.txt and Pip supports passing certain arguments via requirements file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is available as --only-wheel and --only-build (#2346) and will be released in Pex 2.1.161 shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And Pex 2.1.161 with support for --only-wheel / --only-build is now available.

for project in pex_lock_subsystem.project:
if project not in requirements_projects:
raise ValueError(
f"project {project} not found among known requirements. Projects msut be listed in known Pants targets"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"project {project} not found among known requirements. Projects msut be listed in known Pants targets"
f"project {project} not found among known requirements. Projects must be listed in known Pants targets"

)
elif set(resolve_config.no_binary) != original_loaded_lockfile.metadata.no_binary:
raise ValueError(
f"Resolve Config no_binary {set(resolve_config.no_binary)} does not match {original_loaded_lockfile.metadata.no_binary} in last lockfile, can not update in place"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know the name of the resolve here, to augment this error message and tell the user which resolve might be the issue?

Suggested change
f"Resolve Config no_binary {set(resolve_config.no_binary)} does not match {original_loaded_lockfile.metadata.no_binary} in last lockfile, can not update in place"
f"In resolve {name}, configured no_binary {set(resolve_config.no_binary)} does not match {original_loaded_lockfile.metadata.no_binary} in last lockfile, can not update in place"

Comment on lines +308 to +309
if len(inferred_new_projects) > 0:
logger.info(f"Inferred new requirements for lockfile update: {inferred_new_projects}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this makes me wonder what happens if a requirement is removed? Does PEX support doing an update that removes requirements from the lockfile? Potentially we should identify them, and either pass them to pex (if supported) or emit an error/warning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does PEX support doing an update that removes requirements from the lockfile?

It does not today. That seems a reasonable thing to support though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now available as -d <project name> / --delete-project <project name> (#2335) released in Pex 2.1.160.

# Assuming that any lockfile we are trying to update has a destination
# that is a file and not an embedded read only resource. See
# pex_requirements.py
maybe_lockfile = await Get(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't know.)

@huonw
Copy link
Contributor

huonw commented Jan 17, 2024

FYI @cburroughs, I'm going to be on leave for 2 weeks from tomorrow. Please feel free to work through my remaining comments and/or solicit more reviews on #development on slack. Thanks for taking the time to contribute!

Copy link
Member

@cognifloyd cognifloyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very much looking forward to more minimal lockfile updates.
Similar to https://github.com/pantsbuild/pants/pull/20364/files#r1451133900
My preferred workflow would be to:

  • with [pex-lock].update = true in pants.toml (or whatever enables minimal lockfile updates by default)
  • update my requirements in requirements file(s) and/or BUILD file(s)
  • update any constraints files if needed
  • run pants generate-lockfiles --resolve=st2

But I guess that's a slightly different feature. This PR adds a surgical lock update that is applicable when requirements/constraints (in BUILD, requirements, and constraints files) are NOT modified. Is that an accurate description of the feature this PR is adding?

),
)

project = StrListOption(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably with some aliases to shorten that...

header_delimiter = "//"
pip_args_setup = await _setup_pip_args_and_constraints_file(req.resolve_name)

if pex_lock_subsystem.use_pex_update_subcmd():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add and if lockfile exists here so that update without a lockfile will still generate it.
Basically this comment: https://github.com/pantsbuild/pants/pull/20364/files#r1451133900

options_scope = "pex-lock"
help = "Pex lock specific Python lockfile options."

update = BoolOption(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we used a different option here:

[pex-cli]
regenerate = false # default is true

This way, regenerate forces it to use the new lockfile codepath, otherwise it does a minimal lockfile update based on the updated requirements.

Though, I can also see the benefit of pants generate-lockfiles --pex-lock-update --resolve=st2 --pex-lock-project=foobar for surgical updates of the locked requirements without updating any of the requirements or constraints. Hmm. So I guess this PR implements the surgical lock update feature, not a minimal lockfile update based on modified requirements/constraints.

Does the pants metadata in the lockfile include info about the additional ad-hoc constraints used to surgically update the lockfile? Should it?

@x0f0
Copy link

x0f0 commented Aug 12, 2024

This feature would be really great to have. What are people doing right now when they need to adjust or add a dependency? Upgrade all dependencies at once? But what if that introduces a regression?

Poetry would use poetry lock --no-update.

@cburroughs cburroughs marked this pull request as draft January 9, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants