-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support extracting from uv.lock
#314
base: main
Are you sure you want to change the base?
Conversation
Name: lockPackage.Name, | ||
Version: lockPackage.Version, | ||
Locations: []string{input.Path}, | ||
SourceCode: &extractor.SourceCodeIdentifier{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just leave this struct nil if commit is ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let you make the call on this since I'm not familiar with the rest of the codebase - my understanding is that by always setting this it would mean you don't need to check if its nil (since it's a pointer), which I personally like since the compiler doesn't do type narrowing (i.e. unlike say TypeScript, it won't tell you that you've forgotten to nil
check your pointer).
However, that could very well be the expected value for this field in this situation 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other extractors we currently only set SourceCode if there's some commit to write there: https://github.com/google/osv-scalibr/blob/main/extractor/filesystem/language/python/pdmlock/pdmlock.go#L87
In yet others we never set this struct so at the moment we don't gain anything from writing empty structs here. So for now I'd just avoid writing an empty struct and if we decide we should consistently write it we can make that change in a refactoring PR.
}, | ||
WantInventory: []*extractor.Inventory{ | ||
{ | ||
Name: "emoji", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the inventories in the test just set the name and version and leave everything else empty. We could shorten the tests by introducing a local helper function and replacing these with something like inventory("emoji", "2.14.0")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, though I think that'd be best done in its own PR as I'm guessing such a helper could be used in a lot of the extractor tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helper would be specific to this test since there aren't that many other extractors that generate an inventory with this specific kind of empty values. I'd just add it here and we can see what kind of refactoring we can do across tests in a separate PR.
type uvLockPackageSource struct { | ||
Virtual string `toml:"virtual"` | ||
Git string `toml:"git"` | ||
Registry string `toml:"registry"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one doesn't seem used so I'd not add it to the type here (unless we want to write it to e.g. the metadata)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup fine by me, though also happy to include it in the metadata (I'm not really across all this new metadata stuff, so very happy to just be told what we want to try to capture there 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this at the moment in the metadata, we are assuming all these packages are from PyPI, otherwise the Ecosystem() won't be correct.
We could add a string check to filter out non PyPI packages, but that's probably too strict and cause breakages if the url changes slightly.
Let's just remove this field imo.
Name string `toml:"name"` | ||
} | ||
type uvLockFile struct { | ||
Version int `toml:"version"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unused so let's not include
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's some value in capturing it because it helps highlight that there is a concept of lockfile versions that we might have to one day account for - also, this is also present in the poetrylock
extractor where it is also unused 😅
I'm not actually fussed either way though so happy to remove if still if you'd prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the version here as we'll likely need it when the lockfile changes, and this is the one field that's guaranteed to be here even if the format changes.
I'm wondering if we want to be even more defensive here and only extract lockfile versions that we expect, but I don't think we are doing that for other extractors, so I guess that is moot.
f9d6d49
to
8a98e1e
Compare
8a98e1e
to
cffad09
Compare
cffad09
to
717e630
Compare
717e630
to
a064497
Compare
This adds support for parsing
uv.lock
files which are TOML based and overall seem pretty straightforward - the main gotcha I came across was that they nest their "groups" table (calledoptional-dependencies
) in thepackage
array of tables (I think those are the right TOML terms), meaning it actually ends up getting stuck on the last "package" entry even though it is not package specific.Beyond that, I think the main two areas that could do with expanding are:
such as(I figured this out)git
uv
only tracks the direct dependency that has been grouped, but does not mark transitive dependencies; however because it captures both top-level dependencies (inpackage.metadata
) and package dependencies (in[[package]].dependencies
) I think in theory we should be able to walk the tree ourselves to determine what dependencies belong in what groupResolves google/osv-scanner#1406