discussion: Verifying package names #891

Open
ianlewis opened this issue Jun 28, 2023 · 22 comments
Labels
discussion policy Policy / verification of provenance

Comments

@ianlewis
Member

For some scenarios it might be necessary to verify a language ecosystem's package name (or other metadata), which requires inspecting the contents of the package artifact (tarball) itself.

For example, npm package provenance references the artifact by package name. The subject of the in-toto attestation is a purl referencing the package name with a sha512 of the package tarball.

If you run npm install package.tgz it will install the package with the name in the package.json metadata located inside the tarball. This could open users up to attacks where users think they are downloading and verifying package A but are in reality installing (and potentially overwriting) package B.
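To make the mismatch concrete, here is a minimal Python sketch that reads the name a tarball declares for itself. It assumes the conventional `package/` top-level directory that `npm pack` produces; `installed_name` and `make_tarball` are hypothetical helpers written for illustration, not part of npm or slsa-verifier.

```python
import io
import json
import tarfile


def installed_name(tarball: bytes) -> str:
    """Return the "name" field of package/package.json inside an npm tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        # Assumption: the manifest lives under the conventional "package/" prefix.
        member = tar.extractfile("package/package.json")
        if member is None:
            raise ValueError("package/package.json is not a regular file")
        return json.load(member)["name"]


def make_tarball(name: str) -> bytes:
    """Build a tiny in-memory tarball for demonstration (hypothetical helper)."""
    manifest = json.dumps({"name": name, "version": "1.0.0"}).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("package/package.json")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
    return buf.getvalue()


# A file the user saved as foo.tgz can still declare itself as "bar".
foo_tgz = make_tarball("bar")
print(installed_name(foo_tgz))
```

The file name on disk plays no role here: only the embedded manifest is read, which is the behavior described above for installing from a local tarball.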

What should a SLSA verifier do (if anything) in this case? If verification is checking the source code repo, is that good enough?

@github-project-automation github-project-automation bot moved this to 🆕 New in Issue triage Jun 28, 2023
@ianlewis ianlewis added the policy Policy / verification of provenance label Jun 28, 2023
@joshuagl
Member

I'm not very familiar with npm, so I'm writing down my interpretation to check whether I'm understanding the issue correctly.

  • it's possible to do an npm install of a package tarball downloaded through a separate step
  • an npm package's provenance describes the package (subject) by purl and digest of the package tarball
  • the install name (and location?) of the package is determined by the package.json within the package tarball

Therefore, it's possible to download a package foo.tgz with provenance that is verified as foo.tgz but which ends up being installed as bar (and potentially overwriting any existing bar)?

In this scenario, wouldn't the combination of trusted builder and canonical source repository be sufficient to detect a malicious package? i.e., provenance metadata signed by a trusted builder would not describe the expected source repository for producing the foo.tgz tarball?

@laurentsimon
Contributor

> I'm not very familiar with npm, so I'm writing down my interpretation to check whether I'm understanding the issue correctly.
>
>   • it's possible to do an npm install of a package tarball downloaded through a separate step
>   • an npm package's provenance describes the package (subject) by purl and digest of the package tarball
>   • the install name (and location?) of the package is determined by the package.json within the package tarball
>
> Therefore, it's possible to download a package foo.tgz with provenance that is verified as foo.tgz but which ends up being installed as bar (and potentially overwriting any existing bar)?

This is correct. The package name is contained in the provenance (subject) and in the publish attestation (subject and other metadata).
For provenance, the builder is more or less oblivious to the package name. It may take it from the package.json, but ultimately the registry decides whether to publish it under a given package name. In npm, iiuc, the package name in the provenance need not match the package name in the package.json. At least, that validation is not done today as far as we know, but it is planned as a future improvement.

So during verification (in particular of the publish attestation), should the verifier extract the package.json from the tarball to verify it's consistent with the provenance / publish attestation? Not doing so means we must rely on the registry to do that verification, which does not happen today afaik.

If someone downloads the tarball and installs from it, the package name used during installation is effectively the one in the package.json, which may be different from the one in the publish / SLSA attestations. (Note: for installation by package-name, npm CLI does not make use of the package.json in the tarball - @ianlewis to keep me honest).

Regardless of whether the registry does that verification, independent verifiers may want to verify provenance / publish attestation on their own, so as to keep the registry honest and detect problems; and to improve trust in this supply-chain metadata.

Extracting the package.json means the tarball needs to be available, so a verification-as-a-service would not work well, for example.

Hope this provides some clarification

@laurentsimon
Contributor

laurentsimon commented Jun 28, 2023

Well, a timely post https://blog.vlt.sh/blog/the-massive-hole-in-the-npm-ecosystem illustrating exactly what @ianlewis described

@ianlewis
Member Author

> (Note: for installation by package-name, npm CLI does not make use of the package.json in the tarball - @ianlewis to keep me honest).

I believe that is correct. It's a bit inconsistent between installing by package name where it can get the metadata from the registry at the same time as it downloads the tarball, and installing by tarball on the local machine where all it has is the package.json inside the tarball.

@joshuagl
Member

joshuagl commented Jul 5, 2023

I think in this instance it does make sense to verify package names, but I do not think we need to make that a stronger recommendation in general.

In SLSA we trust platforms, verify artifacts and as part of the verification we evaluate provenance against the expectations for a package. As the npm ecosystem doesn't (yet?) have an architecture for forming expectations, the slsa-verifier must determine how best to form expectations based on its view of the npm ecosystem.

Given the issue raised in this thread, it seems prudent that slsa-verifier's expectations for npm packages would include that the package name in the attestation matches the package name in the package's package.json.
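As a rough illustration of such an expectation, the sketch below compares the package name in an attestation subject's purl with the name declared in the tarball's package.json. It assumes an in-toto style statement dict with a `subject` list whose entries carry a purl in `name`, and the conventional `package/` directory inside the tarball. All function names are hypothetical; this is not how slsa-verifier is implemented, and signature verification of the attestation is out of scope.

```python
import io
import json
import tarfile
from urllib.parse import unquote


def purl_npm_name(purl: str) -> str:
    """Extract the package name from an npm purl such as pkg:npm/%40scope/foo@1.0.0."""
    prefix = "pkg:npm/"
    if not purl.startswith(prefix):
        raise ValueError(f"not an npm purl: {purl!r}")
    # Drop any subpath ("#...") and qualifiers ("?...").
    rest = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    # The version follows the last "@"; a scope's "@" is usually encoded as %40.
    name, sep, _version = rest.rpartition("@")
    return unquote(name if sep and name else rest)


def tarball_name(tarball: bytes) -> str:
    """Read the "name" field from package/package.json inside the tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        member = tar.extractfile("package/package.json")
        if member is None:
            raise ValueError("package/package.json is not a regular file")
        return json.load(member)["name"]


def names_consistent(statement: dict, tarball: bytes) -> bool:
    """True if every subject purl names the package that package.json declares."""
    declared = tarball_name(tarball)
    subjects = statement.get("subject", [])
    return bool(subjects) and all(
        purl_npm_name(s["name"]) == declared for s in subjects
    )


def make_tarball(name: str) -> bytes:
    """Build a tiny in-memory tarball for demonstration (hypothetical helper)."""
    manifest = json.dumps({"name": name, "version": "1.0.0"}).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("package/package.json")
        info.size = len(manifest)
        tar.addfile(info, io.BytesIO(manifest))
    return buf.getvalue()


# Hypothetical data: the attestation says "foo", the tarball declares "bar".
statement = {"subject": [{"name": "pkg:npm/foo@1.0.0", "digest": {}}]}
print(names_consistent(statement, make_tarball("foo")))
print(names_consistent(statement, make_tarball("bar")))
```

Note that this check needs the tarball bytes at verification time, which is the trade-off raised earlier in the thread for verification-as-a-service.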

@arewm
Member

arewm commented Jul 5, 2023

Based on my reading of the npm issue presented above, this arises when there is no package lock present. The inconsistency arises when packages are installed from local caches vs. from npm directly.

If a future SLSA track were to try to address the best-practice of pinning dependencies, would we be able to bypass this issue? There might be a problem in generating the lock file due to the above use cases, but once the lockfile is generated, then that can be depended on by the build system.

@laurentsimon
Contributor

laurentsimon commented Jul 5, 2023

> Based on my reading of the npm issue presented above, this arises when there is no package lock present.

I don't think that's the problem. You may have a lock file but the name of the package from the API != name in package.json

@arewm
Member

arewm commented Jul 5, 2023

What is the actual problem that we are trying to resolve with the mismatch of package names between the API and the package.json? The ability to create an accurate provenance?

I was thinking from the perspective of a specific build platform. If the build platform requires a lock file to be used, then that lock file can be reused for pulling dependencies (any name consistencies should already be "resolved" and represented in the lockfile itself). If a lockfile isn't used and the build environment is potentially "dirty" (i.e. there are local caches that are used resulting in the aforementioned behavior), then you have to be concerned about the mismatch between package names since the two methods of collecting names can result in different results.

By respecting the lockfile, we respect what the package manager itself attempts to do. In the blog post, they claim that there are likely many non-malicious discrepancies in the wild already.

I am not trying to indicate that a mismatch between the package.json and the API isn't an issue. Instead, if a build or build platform has the ability to completely define the dependencies (i.e. greater than SLSA v1.0 Build L3 ... maybe a future L4 around what was once hermetic builds), then the name mismatch issue presented above should be mitigated.

I am not convinced that it should be a role of a slsa-verification mechanism to patch potential issues with various languages' package managers. Being aware of this issue and being able to react to it are important when you are including additional dependencies, but as long as you have complete and accurate provenance and SBOM data, verification can happen based on the packages and versions which are actually included in an artifact. If there are malicious side effects of this behavior, then they can be appropriately identified. Detecting the presence of these mismatches can be inputs into a classification of potentially malicious packages which further investigation can then be leveraged to appropriately classify.

@laurentsimon
Contributor

> What is the actual problem that we are trying to resolve with the mismatch of package names between the API and the package.json? The ability to create an accurate provenance?

"accurate" verification of provenance. If the publish attestation claims it's package A, installing the package should not end up installing a package under name B.

> I was thinking from the perspective of a specific build platform. If the build platform requires a lock file to be used, then that lock file can be reused for pulling dependencies (any name consistencies should already be "resolved" and represented in the lockfile itself). If a lockfile isn't used and the build environment is potentially "dirty" (i.e. there are local caches that are used resulting in the aforementioned behavior), then you have to be concerned about the mismatch between package names since the two methods of collecting names can result in different results.
>
> By respecting the lockfile, we respect what the package manager itself attempts to do.

That's the discrepancy. What is the "package manager" (the CLI or the registry or both)? npm registry will install package P under the name A if the user types npm install P, but under the name B if the user types npm download P && npm install P.tar.gz. In effect, the registry's attestation is inconsistent with the package manager's metadata and the provenance. The resolution happens either at build time (no lock file) or it's pre-resolved (there's a lock file), but the inconsistency remains.

> I am not convinced that it should be a role of a slsa-verification mechanism to patch potential issues with various languages' package managers.

That was our initial position, but we were not satisfied with the guarantees during verification, hence this issue. Thanks for sharing your opinion

> Being aware of this issue and being able to react to it are important when you are including additional dependencies, but as long as you have complete and accurate provenance and SBOM data,

In this scenario, which package name would be reported in the SBOM? The sha512 will uniquely identify it, but the name may be inconsistent. I suppose it should report the package name from the registry... but if SLSA provenance was used, it would report possibly another package name. Same for a lock file: lock file is one resolution by the package manager (taken from the API or the package.json depending on user's command)

@arewm
Member

arewm commented Jul 6, 2023

Apologies for the detour. I was coming at this from a perspective of a build platform which is only consuming npm packages and not one that is producing the packages (and therefore whose provenance and verification would be different). I realize that this was an inaccurate interpretation of the discussion.

After re-reading the artifact verification, this mismatch seems like it would fall well in check expectations. Therefore, when verifying the provenance of an npm package, the expectation that the package name is consistent should be checked.

The rationale for this would be to enable proper attribution of sources and clarity of the provenance subject. This would resolve the line of questions above around which name should be included.

The fact that a package's behavior can change when installed from the registry or from a tarball, however, would seem to fall outside of provenance verification. As long as provenance can be appropriately associated, the behavior of a package (i.e. for determining potential maliciousness) is extra-verification and might be more relevant to the build process when consuming the published npm artifact.

> I am not convinced that it should be a role of a slsa-verification mechanism to patch potential issues with various languages' package managers

When worded in terms of expectations being formed around the packages, I no longer think that this previous argument holds. We wouldn't be trying to patch issues with the package ecosystems in general. The precedent that this decision would set is that if there are multiple ways that a name can be defined in some package ecosystem then all names should be consistent in order to disambiguate provenance references and associations.

@kpk47
Contributor

kpk47 commented Jul 6, 2023

I agree with @arewm's conclusion as to what the precedent should be:

> if there are multiple ways that a name can be defined in some package ecosystem then all names should be consistent in order to disambiguate provenance references and associations.

I'm still a little confused about what we intend to do for npm.

Do we need to involve the publish attestation in this disambiguation? Doing so treats the npm registry as a definitive mapping from package digest to package name, and I don't know the npm ecosystem well enough to know if that's appropriate. Do people distribute npm packages by tarball without uploading them to the registry? If so, then we need a SLSA verification path for them.

Separately, I don't like the idea of the SLSA verifier having to open the tarball and inspect its contents, which seems to be the proposal here (if I'm following the thread correctly). Doing so seems to be developing expectations on the fly to work around a deficiency in the package's provenance.

The verifier already trusts the build platform to record build metadata faithfully in the provenance, so I don't see why we can't trust the build platform to record the package name in the subject's name (or URI) field. You can detect any changes to package.json because they would change the package's digest. Then the verifier needs to set an expectation on the package name, which they can verify against the provenance. Would this convention work, or am I missing something?

@MarkLodato
Member

This seems like a straightforward implementation bug: the user requests to install package A and it actually gets installed under name B. That should be fixed in the npm tooling, not worked around by SLSA.

More specifically, the SLSA verification process forms expectations on the package name. It is assumed that the thing doing the verification knows what the package name is. What is described in this issue is a quirk in npm where it's not straightforward to tell what the name is. But that still seems npm-specific, rather than something with SLSA?

@ianlewis
Member Author

ianlewis commented Jul 10, 2023

> This seems like a straightforward implementation bug: the user requests to install package A and it actually gets installed under name B. That should be fixed in the npm tooling, not worked around by SLSA.

Part of the issue is that during SLSA verification we have a tarball and not a package name so users may not actually specify their expectation of the package name anywhere.

A workflow might include:

  1. Download the package tarball and provenance from a URL.
  2. Run slsa-verifier verify-npm-package <tarball> --attestations-path attestations.json to verify provenance against tarball.
  3. Run npm install <tarball> to install the package.

Nowhere in there does the user specify an expectation around a package name except perhaps implicitly via the URL it was downloaded from.

> More specifically, the SLSA verification process forms expectations on the package name. It is assumed that the thing doing the verification knows what the package name is. What is described in this issue is a quirk in npm where it's not straightforward to tell what the name is. But that still seems npm-specific, rather than something with SLSA?

This doc seems to describe that users should have expectations about the package name but SLSA itself doesn't care (i.e. it only cares that the artifact matches). Is that an accurate interpretation of the meaning? Or do you mean by "It is assumed that the thing doing the verification knows what the package name is" that a SLSA verifier should be checking expectations about the package name?

> The verifier already trusts the build platform to record build metadata faithfully in the provenance, so I don't see why we can't trust the build platform to record the package name in the subject's name (or URI) field. You can detect any changes to package.json because they would change the package's digest. Then the verifier needs to set an expectation on the package name, which they can verify against the provenance. Would this convention work, or am I missing something?

I'm not sure that verifying only that the subject in the provenance matches the user's expectations matters in this case, since the name and digest are provided by an untrusted build and the tarball could just install a totally different package anyway. If we trust the builder to set all the package names consistently, then it sounds like a SLSA verifier doesn't actually need to check any expectations about the package name at all.


I think the questions we need to answer are:

  1. Does a SLSA verifier need to verify the package name(*1)? or does it just strictly verify the artifact contents against the provenance attestation's subject digest(*2)?
  2. If *1 then what package name should it verify against and how? All of them?
  3. If *2 then the user will definitely need some way of verifying the package name before install other than SLSA verification, such as providing an expected package name to npm (npm install foo --from-tarball mypackage.tar.gz, maybe?), but in this case I guess that's not our problem?

@kpk47
Contributor

kpk47 commented Jul 10, 2023

Discussed in July 10 community meeting. Action items:

  • update verifying artifacts and verification model to be more explicit that verification is always on an {artifact, package_name} pair, never just {artifact}, and add a callout for the special case of the name being embedded in the artifact.

Would this address this issue?
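One way to read the {artifact, package_name} pairing is sketched below in Python: verification only succeeds when a single subject matches both the artifact's sha512 digest and the name the caller expects. The function name, the statement layout (an in-toto style `subject` list with `name` and `digest` fields), and the example data are assumptions for illustration, and checking the attestation's signature is deliberately left out.

```python
import hashlib


def verify_pair(artifact: bytes, expected_name: str, statement: dict) -> bool:
    """Accept only if one subject matches BOTH the digest and the expected name."""
    digest = hashlib.sha512(artifact).hexdigest()
    return any(
        s.get("name") == expected_name
        and s.get("digest", {}).get("sha512") == digest
        for s in statement.get("subject", [])
    )


# Hypothetical example data; a real statement comes from a signed attestation.
artifact = b"example tarball bytes"
statement = {
    "subject": [
        {
            "name": "pkg:npm/foo@1.0.0",
            "digest": {"sha512": hashlib.sha512(artifact).hexdigest()},
        }
    ]
}

print(verify_pair(artifact, "pkg:npm/foo@1.0.0", statement))  # digest and name match
print(verify_pair(artifact, "pkg:npm/bar@1.0.0", statement))  # same bytes, other name
```

For the special case where the name is also embedded in the artifact, as with package.json in an npm tarball, a verifier would additionally need to compare that embedded name against `expected_name`; this sketch does not do that.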

@laurentsimon
Contributor

laurentsimon commented Jul 10, 2023

That would help, yes. I am not sure verification is always {artifact,package_name} though. When there is a registry (npm, containers, OS distros) I think it works. For standalone binaries that users build and want to run, there is no actual namespace, except if you consider /bin/ls as the namespace. Often times there will be no path to copy the binary to, it may just be kept in a folder. Let's provide examples for the registry "types" above as part of the update.

@MarkLodato
Member

I think we should be targeting cases that can be automatically verified, where there exists a well-defined package name. While there are other cases (e.g. you download a binary from a website, or someone hands you a binary and you execute it) where one might want to inspect the provenance and make a decision, I feel like that's not where our effort is best spent.

@joshuagl
Member

> I think we should be targeting cases that can be automatically verified, where there exists a well-defined package name.

That matches the guiding principles:

Establish trust in a small number of platforms and systems—such as change management, build, and packaging platforms—and then automatically verify the many artifacts produced by those platforms.

and helps those of us working on SLSA focus our efforts.

@laurentsimon
Contributor

I was thinking of web browsers, IDEs and devices that contain software that auto-update themselves and don't have a package name per se. But even in these cases, the resourceUri = package name, so I think it still works. I did not realize the scope of SLSA had been reduced to "packaging platforms". Maybe I'm misinterpreting the term "packaging platform".

@kpk47
Contributor

kpk47 commented Jul 17, 2023

Discussed again in community meeting July 17, 2023. We will update the spec to give examples of how to form expectations around an artifact's associated package name.

Separately, we think it's sensible for slsa-verifier to make sure that the package name in the tarball matches the one used by the registry. We don't see a good reason to let the two differ.

Moving issue to backlog.

@kpk47 kpk47 moved this from 🆕 New to 📋 Backlog in Issue triage Jul 17, 2023
@TomHennen
Contributor

I don't know how well it would work but I'd always imagined that in cases where there's not a solid package name per-se that the download URL might be usable instead?

@kpk47
Contributor

kpk47 commented Jul 18, 2023

@TomHennen What are you proposing we replace with the download URL? The package name?

@TomHennen
Contributor

Sorry, I think that was a bit of a non-sequitur and was just a response to "what do we do if there's not a package name". I guess I expect whoever is asking for verification should know the package name and the download URL. (I'm assuming verification happens close to when the thing is downloaded, but that could be wrong).

Projects
Status: 📋 Backlog

7 participants