Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support extracting from uv.lock #314

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Dec 1, 2024

This adds support for parsing uv.lock files which are TOML based and overall seem pretty straightforward - the main gotcha I came across was that they nest their "groups" table (called optional-dependencies) in the package array of tables (I think those are the right TOML terms), meaning it actually ends up getting stuck on the last "package" entry even though it is not package specific.

Beyond that, I think the main two areas that could do with expanding are:

  • support for more exoitc dependencies such as git (I figured this out)
  • support groups - it looks like uv only tracks the direct dependency that has been grouped, but does not mark transitive dependencies; however because it captures both top-level dependencies (in package.metadata) and package dependencies (in [[package]].dependencies) I think in theory we should be able to walk the tree ourselves to determine what dependencies belong in what group
    • I've not done that for now as I'm not 100% sure that'd be correct in every situation, and it feels like it should be fine to tackle in a dedicated PR (I also figure deps.dev can probably help here, which might be a more accurate or preferred alternative)

Resolves google/osv-scanner#1406

extractor/filesystem/language/python/uvlock/uvlock_test.go Outdated Show resolved Hide resolved
extractor/filesystem/language/python/uvlock/uvlock_test.go Outdated Show resolved Hide resolved
Name: lockPackage.Name,
Version: lockPackage.Version,
Locations: []string{input.Path},
SourceCode: &extractor.SourceCodeIdentifier{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just leave this struct nil if commit is ""

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let you make the call on this since I'm not familiar with the rest of the codebase - my understanding is that by always setting this it would mean you don't need to check if its nil (since it's a pointer), which I personally like since the compiler doesn't do type narrowing (i.e. unlike say TypeScript, it won't tell you that you've forgotten to nil check your pointer).

However, that could very well be the expected value for this field in this situation 🤷

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other extractors we currently only set SourceCode if there's some commit to write there: https://github.com/google/osv-scalibr/blob/main/extractor/filesystem/language/python/pdmlock/pdmlock.go#L87

In yet others we never set this struct so at the moment we don't gain anything from writing empty structs here. So for now I'd just avoid writing an empty struct and if we decide we should consistently write it we can make that change in a refactoring PR.

},
WantInventory: []*extractor.Inventory{
{
Name: "emoji",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the inventories in the test just set the name and version and leave everything else empty. We could shorten the tests by introducing a local helper function and replacing these with something like inventory("emoji", "2.14.0")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though I think that'd be best done in its own PR as I'm guessing such a helper could be used in a lot of the extractor tests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper would be specific to this test since there aren't that many other extractors that generate an inventory with this specific kind of empty values. I'd just add it here and we can see what kind of refactoring we can do across tests in a separate PR.

type uvLockPackageSource struct {
Virtual string `toml:"virtual"`
Git string `toml:"git"`
Registry string `toml:"registry"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't seem used so I'd not add it to the type here (unless we want to write it to e.g. the metadata)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup fine by me, though also happy to include it in the metadata (I'm not really across all this new metadata stuff, so very happy to just be told what we want to try to capture there 😄)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this at the moment in the metadata, we are assuming all these packages are from PyPI, otherwise the Ecosystem() won't be correct.

We could add a string check to filter out non PyPI packages, but that's probably too strict and cause breakages if the url changes slightly.

Let's just remove this field imo.

Name string `toml:"name"`
}
type uvLockFile struct {
Version int `toml:"version"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused so let's not include

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some value in capturing it because it helps highlight that there is a concept of lockfile versions that we might have to one day account for - also, this is also present in the poetrylock extractor where it is also unused 😅

I'm not actually fussed either way though so happy to remove if still if you'd prefer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the version here as we'll likely need it when the lockfile changes, and this is the one field that's guaranteed to be here even if the format changes.

I'm wondering if we want to be even more defensive here and only extract lockfile versions that we expect, but I don't think we are doing that for other extractors, so I guess that is moot.

extractor/filesystem/language/python/uvlock/uvlock.go Outdated Show resolved Hide resolved
@G-Rath G-Rath force-pushed the extractor/support-uv branch from 717e630 to a064497 Compare December 5, 2024 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REQUEST: Support uv.lock lockfile used by uv
3 participants