Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nuget CPM Lock file Proposal (#12409) #13288

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
84 changes: 84 additions & 0 deletions accepted/2024/repo-level-lock-file.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
# ***Repository Level Lock File***

- Author: [@CEbbinghaus](https://github.com/CEbbinghaus)
- Issue: [Nuget/Home#12409](https://github.com/NuGet/Home/issues/12409)

## Summary

Applications consisting of multiple solutions (using assembly references between projects) cannot control their transitive dependencies by using Nuget. The current lock file is on a per-project basis rather than an application/repository basis. Given a `Directory.packages.props` file, each project could resolve different versions of transitive dependencies, which a single central lock file would solve by allowing consistency between package versions across multiple solutions.


## Motivation

Monorepositories with multiple solutions utilizing assembly references that deploy all their code together as a single application are unable to use Nuget (with or without CPM) due to transitive version conflicts. This is due to NuGet being able to resolve different versions of transitive dependencies for each individual build, as they occur in isolation from one another. A single central lock file allows all transitive dependencies to have locked versions defined at the root, thereby eliminating third-party dependency management software such as Paket.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something worth mentioning here is the concept of transitive pinning: https://learn.microsoft.com/en-us/nuget/consume-packages/central-package-management#transitive-pinning, which allows you to control the transitive dependencies by effectively elevating them to top level.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transitive pinning only helps when you recognise that you've gotten into this situation and want to do something about it. It doesn't prevent you from getting into a conflict in the first place unless you're very eager and go and manually specify every individual transitive dependency and keep track of them when updating packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. It's basically exhausitve transitive pinning where all packages are pinned.
The lock file you're asking for would effectively be that, but potentially in a more automated fashion.



## Explanation

### Functional explanation

<!-- Explain the proposal as if it were already implemented, and you're teaching it to another person. -->
<!-- Introduce new concepts, functional designs with real-life examples, and low-fidelity mockups or pseudocode to show how this proposal would look. -->

The `Directory.packages.props` now uses the`<NuGetLockFilePath>[path]</NuGetLockFilePath>` property within a `<PropertyGroup>` to specify its lock file path. It becomes the source of truth for all builds. Project level lock files are no longer required unless a package version is overridden.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to make the lock file path configurable in the project? AFAIK, most package managers leave it unconfigured, and that way, the package manager wouldn't have to first read the configuration file first in order to install packages, only the lock file(which makes it easier to separate into steps).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package manager would need to read the Directory.packages.props file anyways as it is what enables CentralPackageManagement in the first place? (relevant docs)

We could get away with setting a property like <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile> to enable CPM Lock files and implicitly assume the <NuGetLockFilePath> to be packages.lock.json but we can also achieve the same with a single property and a check $(NuGetLockFilePath) = ''. At the end it will be up to the maintainers to give feedback which way they prefer but the lock file path will need to be configurable in any case.

Copy link

@purepani purepani May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package manager would need to read the Directory.packages.props
Yeah, but you can separate that into steps right? I'm imagining a use case of being able to separate it into layers:
You read the configuration file and can generate the lock file in one step, and then you can read the lock file and get the packages in a different step. The reason this could be useful is that you can write a completely separate program to read the lock file if you wanted to(one that assumes that you're using CPM), and you wouldn't have to read the configuration file to figure out where the lock file is.

I personally can't think of a great reason to not have the lock file at the top level of a repo(or at some fixed path relative to the top level), and I can think of more reasons to avoid being able to configure it(Easier to make external tools to consume the lock files and process them separately, avoiding conflicts between, say, a git submodule with it's own lock file that's configured to generate this lock file, etc.).

Given that, as far as I'm aware, most package managers(mostly python and node ones) that I've used have a fixed path for their global lock files, I thought I'd at least bring it up.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need a way to define the "top level" location. This allows for that.

Keep in mind that not every repo-project relationship is 1:1, particularly in a monorepo situation where you might want a small handful of "top levels" rather than only one.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be in the same folder as the Directory.Packages.props file, the same way that the "top level" of a nodejs project is where the package.json is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry i feel you might have misunderstood. This is exactly the case @purepani the <NuGetLockFilePath> specifies the lock file for the CPM rather than each individual project. This will be located relative to the Directory.Packages.props at the root of the project.

Now that you mention it we will likely need a property like RestorePackagesWithLockFile but specifically to enable CPM lock files but the lockfile will be located at the root NOT under individual projects

Copy link

@purepani purepani May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this property is meant for each individual project rather than just the top level of the projects; that makes way more sense.
The way pnpm solves this is by symlinking the dependencies into each workspace after doing a pnpm install at the top level. If you only wanted some dependencies, you could just install it for that group. In this case, it is just the dependencies and not the lock file, but you could certainly symlink the lock file if you wanted(though the only advantage of that approach would be that you wouldn't have to move to the top level every time you wanted to install a package).

(note: I only mentioned pnpm specifically because I know how it works a bit better. Presumably other node package managers work similarly)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a technical complexity here since a property value doesn't get path resolved, so knowing where it comes would be required.
The simpler approach might be the already mentioned just making the file live next to the directory.packages.props.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For flexibility I prefer the idea of having the lock file path configurable which also tracks consistently with lock files on a project. The better solution is probably setting the default property to be next to the Directory.packages.props and instead offering (if it doesn't already exist) a $(ThisCurrentFilePath) MSBuild property which can be used to set the file path relative to the Directory.packages.props rather than something relative to the project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's all doable, we'd just need to be specific to avoid confusion.


```xml
<Project>
<PropertyGroup>
<ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
<NuGetLockFilePath>lockfile.json</NuGetLockFilePath>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love JSON as much as the next guy. but if Directory.Packages.props, NuGet.props and most of the files are XML based, wouldn't it be better to not mix stacks? historically lock files use .json format in various programming platforms, so that might be one defense for it (but not a very compelling one).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing related files are already JSON - packages.lock.json and project.assets.json.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally my preference is also xml but the nuget/dotnet team have already settled on a json based lock file and this change is supposed to be as small and simple as possible.

</PropertyGroup>
<ItemGroup>
<PackageVersion Include="Stuff" Version="1.0.0" />
</ItemGroup>
</Project>
```

This lock file is generated during any restore operation during the evaluation of the `Directory.packages.props` file. This allows the central lock file to be generated by any project as all of the dependencies are isolated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest challenges delivering this feature are the preexisting msbuild paradigms and we're not really tackling that here.

We need to dive deeper into the technical complexities to understand the feasibility of simply having a lock file.

How would the scope of the msbuild based restore operation be defined? Currently restore is either project based or solution based. Projects don't know about all other projects.

Here's a practical example:

You have 2 projects and 3 packages.

Project 1:

<PackageReference Include="A" />

Project 2:

<PackageReference Include="B" />

DPP

<PackageVersion Include="A" Version="1.0.0" />
<PackageVersion Include="B" Version="1.0.0" />

A 1.0.0 -> C 1.0.0
B 1.0.0 -> C 2.0.0

Without pinning:

  • Project 1: A 1.0.0 and C 1.0.0
  • Project 2: B 1.0.0 and C 2.0.0

These projects aren't necessarily correlated so the only way for them to know that they're conflicting with C is to either:

  • Consider what each project references each restore
  • Enforce version declaration for every package version encountered.

Today, a project only knows about itself and it's closure, so this would be a departure in how restore/build work, one that'd come with some perf considerations.

The 2nd idea could theoretically be solved with transitive pinning.

You can declare C 2.0.0 in DPP, enable transitive pinning and then the versions would be resolved would be equivalent.
You'd basically need to exhaustively declare all versions.

Within the current toolset, The central lock file as proposed is really a more algorithmically advanced transitive pinning that gets applied across all projects.

Another challenge is package creation for any projects that have a central lock file.

NuGet works really hard to make projects and the package versions of them behave as closely as possible.
What this means is that every PackageReference you have becomes a dependency in a generated package.
To avoid runtime issues, transitive pinning elevates those transitively pinned packages to top level dependencies. The practical example is Project 1 from above, where it could unwittingly take a dependency on C 2.0.0, and then if it gets published as a package with only its direct dependencies lead to runtime failures later.
I understand that's not really a relevant scenario here, but it is a concept that the design needs to account for.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the central lock file as proposed is really a more algorithmically advanced transitive pinning that gets applied across all projects

Isn't that exactly what a lock file does, by specifying the entire dependency graph and the versions used of each dependency?

The practical example is Project 1 from above, where it could unwittingly take a dependency on C 2.0.0, and then if it gets published as a package with only its direct dependencies lead to runtime failures later.

I assume we would need to keep track of which dependencies are direct and which are transitive, and dotnet pack would only look at direct dependencies when building a package. A transitive dependency can always be elevated to a direct dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that exactly what a lock file does, by specifying the entire dependency graph and the versions used of each dependency?

There's a key distinction. All version resolution is project based.
Lock files don't change the resolution. They skip it.

The idea for central lock files makes the version resolution more involved and requires to account for transitive packages declared from all projects, not just the current one.

You're not just asking for a file, you're asking for a completely new resolver. Beyond scoping, the resolver part is the most challenging technical part.

I assume we would need to keep track of which dependencies are direct and which are transitive, and dotnet pack would only look at direct dependencies when building a package. A transitive dependency can always be elevated to a direct dependency.

"dotnet pack would only look at direct dependencies when building a package" -> This doesn't work in all scenarios.
Here's an example:

Project 1 -> Package A 1.0.0 -> Package B 1.0.0
Regular resolution leads to A 1.0.0, B 1.0.0.
When the project is packaged, it only needs A 1.0.0 as a dependency, since at minimum B 1.0.0 will be resolved due to A's dependency.

Central lock file example could bump package B to 2.0.0.
Project 1 code ends up coding against B 2.0.0.
If the Project 1 package ends up declaring A 1.0.0, then when anyone installs that package generated from project 1, they'd get B 1.0.0, because no one knows about 2.0.0 in the Project 1 graph, it's there due to how other projects work. If we don't elevate B 2.0.0 to a direct dependency, then the consumer of Project 1's generated package would end up with runtime failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general food for thought for the concept of centralized lock files.

In general, you probably only need to consolidate the versions being used across your runnable projects. Your services, your tools etc, not across your libraries as well (and never downgrade ofc).
Just because you make your projects reference a specific version, that doesn't mean they'll all work. You could be referencing dependencies that reference different versions of a particular package, so consolidation is naturally going to happen in that case.

There's of course the theory for handling breaking changes in dependencies, but that'd only work if you control all libraries that reference a package with a breaking change, in which case regular CPM would work just fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the idea of promoting any transitive dependency at a version outside the already resolved range into a primary dependency so that any downstream library consumers don't need to think about it


## Drawbacks

<!-- Why should we not do this? -->
It complicates Nuget restoring as it will require a separate dependency resolution step exclusive to the `Directory.packages.props` file.

## Rationale and alternatives

<!-- Why is this the best design compared to other designs? -->
<!-- What other designs have been considered and why weren't they chosen? -->
<!-- What is the impact of not doing this? -->

Some alternatives exist. Namely:
* **Solution-based lock files**
This has the upside of giving us a comprehensive list of all projects that make up the lockfile. As such, the lockfile can be optimally generated, taking every dependency of every project into account, and doesn't require project-specific dependencies. The downside is lock files are defined in any of the multitude of solutions within a monorepository. This leaves the problem of syncing transitive dependencies between multiple solutions within the same repository. It also requires all restores to happen at the solution level, preventing building singular projects simultaneously.

* **Using Microsoft.Build.Traversal**
`Microsoft.Build.Traversal` could create a `.csproj` that references all projects under it. As such, it would have the context of every package as defined by the `ProjectReference`. This would then act as the restore project that, in turn, sorts out all dependencies and generates the most optimal lock file. The downside is that only this project can be used to restore packages since the lock file would only apply to it, making it impossible to restore single projects.


## Prior Art

<!-- What prior art, both good and bad, are related to this proposal? -->
<!-- Do other features exist in other ecosystems, and what experience has their community had? -->
<!-- What lessons from other communities can we learn from? -->
<!-- Are there any resources that are relevant to this proposal? -->

The primary example of prior art for a centralized lock file is [Paket](https://fsprojects.github.io/Paket/) built around this exact concept. It uses a `paket.dependencies` file, similar to the `Directory.packages.props` file from Nuget. It allows for 'groups' to pull two identical dependencies in different versions as each group is resolved independently. It then uses a `paket.lock` to lock the dependencies at the root (right next to the paket.dependencies), which pins all primary and transitive package versions for all projects. A project then includes a package by listing its name in the `paket.references` file in the csproj directory.

We can draw many parallels between Paket & NuGet (with a central lock file) function. For one, we define all the primary packages & versions at the root. Projects define which specific packages they depend on within the relative projects. Where paket would use groups to include dependencies at different versions, NuGet uses the `VersionOverride` property within the csproj. This is the primary difference in how locks are handled for version differences since Paket utilizes groups and writes the locked package versions into the `paket.lock` file. Nuget (with central locking) writes all the pinned versions into the local csproj lock file.

NodeJS has the concept of [Workspaces](https://docs.npmjs.com/cli/v7/using-npm/workspaces#skip-to-content) which act more like Solutions since NodeJS only has the concept of Projects. However a Workspace does allow for all packages to be resolved to the same version, ensuring no version conflicts between different projects within the same workspace.

## Unresolved Questions

<!-- What parts of the proposal do you expect to resolve before this gets accepted? -->
<!-- What parts of the proposal need to be resolved before the proposal is stabilized? -->
<!-- What related issues would you consider out of scope for this proposal but can be addressed in the future? -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add the mechanics of version resolution in the unresolved questions.


* How would the `Directory.packages.props` dependencies be separated out from the project dependencies to generate the lock file?
* How does the publish command handle this as it doesn't have full context of all packages. (Could be solved by the restore always running with --locked-mode)
CEbbinghaus marked this conversation as resolved.
Show resolved Hide resolved
* How will a project deleted from the repository get deleted from the lock file.
* How will Lock files be generated with nested `Directory.packages.props` files?

## Future Possibilities
<!-- What future possibilities can you think of that this proposal would help with? -->
Using the `TargetFrameworks` property within the `Directory.packages.props` file could narrow down the frameworks resolved for the provided packages. This can simplify the lock file and reduce some unnecessary dependency resolution as well as reduce merge conflicts as there is less content changing less often