-
Notifications
You must be signed in to change notification settings - Fork 9
perf: skip advisories that are for unrelated packages when loading databases #336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a178f72 to
218a96b
Compare
6a21168 to
48bf4b4
Compare
|
due to another spike in npm advisories, this is pretty much a must-have for the local databases to be viable - without it I'm getting OOMs running the tests locally 😓 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a performance optimization to reduce memory usage by filtering out advisories that don't affect any of the packages being scanned. The implementation adds a pkgNames []string parameter throughout the database loading chain, allowing databases to skip loading advisories for unrelated packages.
Key Changes:
- Added package name filtering in
memDB.addVulnerability()to skip advisories that don't affect specified packages - Modified database loading functions (
NewZippedDB,NewDirDB,Load) to accept and pass through a list of package names - Collected all package names from lockfiles in
main.goand passed them to database loaders
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/database/mem-check.go | Added filtering logic to skip advisories that don't affect the specified packages while maintaining total count |
| pkg/database/zip.go | Added mightAffectPackages helper function and updated signatures to accept pkgNames parameter |
| pkg/database/dir.go | Updated load and NewDirDB signatures to accept and pass pkgNames parameter |
| pkg/database/config.go | Updated Load function to accept pkgNames and route it to appropriate database constructors |
| main.go | Added collection of all package names from lockfiles and passed to loadDatabases |
| pkg/database/zip_test.go | Updated all test calls to pass nil for new parameter and added comprehensive test for filtering behavior |
| pkg/database/dir_test.go | Updated all test calls and added test validating package filtering works correctly |
| pkg/database/load_test.go | Updated test calls to pass nil for new parameter |
| pkg/database/testdata/db/nested-2/osv-3.json | Added test data for OSV-3 with package "mine2" to validate filtering |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
48bf4b4 to
ce795ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ce795ee to
33f4384
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pkgSet := make(map[string]struct{}) | ||
| for _, p := range files { | ||
| for _, pkg := range p.lockf.Packages { | ||
| pkgSet[pkg.Name] = struct{}{} | ||
| } | ||
| } | ||
| allPackages := make([]string, 0, len(pkgSet)) | ||
| for name := range pkgSet { | ||
| allPackages = append(allPackages, name) | ||
| } |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package names collected here are not normalized, but the filtering logic in mightAffectPackages compares them against affected.Package.NormalizedName(). This creates a mismatch for PIP ecosystem packages where normalization converts names to lowercase and replaces [-_.] with -.
For example, if a lockfile contains "Django" but the OSV database has "django", the filtering will fail to match them because NormalizedName() returns "django" but the collected package name is "Django".
To fix this, the package names should be normalized when collecting them. Consider creating a map that accounts for the ecosystem and normalizes accordingly, or normalize package names before adding them to pkgSet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should be fine because the lockfile parsers handle normalization - the only way this could happen is if someone passes an unnormalized name in via CSV, which already won't work
Based on google/osv-scanner#2241, this should greatly reduce peak memory usage of the detector and in theory make it faster since there should be less advisories to check, though in practice I don't expect this to be noticeable as it mainly matters for large databases like Ubuntu which we don't support out of the box.
I've implemented this as part of the
memDbmeaning we retain the total count of advisories loaded externally even though internally the number of advisories retained is likely far lower