Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Fully validate vulnerability.db by hash #1976

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joshuai96
Copy link

@joshuai96 joshuai96 commented Jul 3, 2024

Fixes: #1975
Fixes: #1648

Setting ValidateByHashOnStart to true by default, enables the database curator to execute ValidateByHash in validateIntegrity and report an invalid database.

$ cd $HOME/.cache/grype/db/5/
$ mv vulnerability.db vulnerability.db.bak

# with an invalid db
$ tail -n 50 vulnerability.db.bak > vulnerability.db
$ grype db status
Location:  /home/joshua/.cache/grype/db/5
Built:     2024-07-03 01:30:39 +0000 UTC
Schema:    5
Checksum:  sha256:a7920011e5de7de5e0acafb614f94c38dc4353d34784de07cf16fa765c07904f
Status:    invalid
bad db checksum (/home/joshua/.cache/grype/db/5/vulnerability.db): "sha256:a7920011e5de7de5e0acafb614f94c38dc4353d34784de07cf16fa765c07904f" vs "sha256:d3b1df971b1c7753cb658a2c0cb1046b781bc97f0727682bf6eacada5e92e5c8"

# with missing db
$ rm vulnerability.db
$ grype db status
Location:  /home/joshua/.cache/grype/db/5
Built:     2024-07-03 01:30:39 +0000 UTC
Schema:    5
Checksum:  sha256:a7920011e5de7de5e0acafb614f94c38dc4353d34784de07cf16fa765c07904f
Status:    invalid
failed to open file '/home/joshua/.cache/grype/db/5/vulnerability.db': open /home/joshua/.cache/grype/db/5/vulnerability.db: no such file or directory

Scans with grype now give a better error too:

# with invalid db
$ grype alpine:latest
 ✔ Vulnerability DB                [no update available]  
 ✔ Parsed image                              sha256:a606584aa9aa875552092ec9e1d62cb98d486f51f389609914039aabd9414687
 ✔ Cataloged contents                               dabf91b69c191a1a0a1628fd6bdd029c0c4018041c7f052870bb13c5a222ae76
   ├── ✔ Packages                        [14 packages]  
   ├── ✔ File digests                    [77 files]  
   ├── ✔ File metadata                   [77 locations]  
   └── ✔ Executables                     [17 executables]  
failed to load vulnerability db: vulnerability database is invalid (run db update to correct): bad db checksum (/home/joshua/.cache/grype/db/5/vulnerability.db): "sha256:a7920011e5de7de5e0acafb614f94c38dc4353d34784de07cf16fa765c07904f" vs "sha256:d3b1df971b1c7753cb658a2c0cb1046b781bc97f0727682bf6eacada5e92e5c8"

# with missing db
$ grype alpine:latest
 ✔ Vulnerability DB                [no update available]  
 ✔ Parsed image                              sha256:a606584aa9aa875552092ec9e1d62cb98d486f51f389609914039aabd9414687
 ✔ Cataloged contents                               dabf91b69c191a1a0a1628fd6bdd029c0c4018041c7f052870bb13c5a222ae76
   ├── ✔ Packages                        [14 packages]  
   ├── ✔ File digests                    [77 files]  
   ├── ✔ File metadata                   [77 locations]  
   └── ✔ Executables                     [17 executables]  
failed to load vulnerability db: vulnerability database is invalid (run db update to correct): failed to open file '/home/joshua/.cache/grype/db/5/vulnerability.db': open /home/joshua/.cache/grype/db/5/vulnerability.db: no such file or directory

This enables the db curator to fully validate the db file by hash

Signed-off-by: Joshua Irmer <joshua.irmer@gmail.com>
@wagoodman wagoodman self-requested a review July 24, 2024 20:16
@wagoodman wagoodman self-assigned this Jul 24, 2024
@wagoodman
Copy link
Contributor

It was intentional to only validate the DB on download and not on start (for performance reasons) since validating every time could lead to startup performance issues. That being said, I can also see the point where getting grype results with a known bad database is also not ideal, since we can no longer trust those results.

I think ultimately it's the right call to turn on validation by default, however, I feel that it should coincide with attempting to improve performance. Right now we're working on grype DB v6, which is very different than DB v1-5. Part of this work may also switch the distributing sha256 digests to xxh64, which is much faster.

One question I have: I have an anecdote from the past that startup was taking an extra second due to hashing the DB -- what extra time at startup are you seeing with this change @joshuai96 ?

@joshuai96
Copy link
Author

In the setup I described in #1975 (comment) I've seen grype db update stating Vulnerability database updated to latest version! and then having some files missing for example metadata.json, what leads to failing scan runs. So I think validation on download seems a little bit buggy, too. I am positive this is some kind of freak accident based on network hiccups, as our GitLab CI runners are known to have some networking issue.

You mean the startup before a scan runs, @wagoodman ? I didn't notice any inconvenient increase in time, but I can run some tests, to have some numbers backing that up. IMHO a second or two, has no impact on a scan that usually takes a few seconds anyway, but I'm not as sensitive to time, as this scans run over night in a CI/CD setup.

@joshuai96
Copy link
Author

Hello @wagoodman,
sorry for the delay.

I've run some simple timing tests for both versions.

Setup:

  • Using times real output in combination with a small image like apline (using alpine:3.20.2 with hash 0a4eaa0eecf5f8c050e5bba433f58c052be7587ee8af3e8b3910ef9ab5fbe9f5) to reduce scan time and see db hash time.
  • Updating the vulnerability database and pulling the image beforehand.
  • Small sample size, because of time constraints.
  • Compiled with go build -ldflags "-s -w" -o bin/grype cmd/grype/main.go
  • Running on an AMD Ryzen 7 4700U

Test:

Test No. Unmodified Modified
1 3,632 3,628
2 3,730 3,591
3 3,653 3,640
4 3,749 3,661
5 3,526 3,621
6 3,596 3,577
7 3,707 3,740
8 3,568 3,613
9 3,675 3,662
10 3,585 3,600
11 3,724 3,653
12 3,630 3,588
13 3,582 3,680
14 3,647 3,626
15 3,649 3,673
16 3,657 3,607
17 3,695 3,567
18 3,640 3,608
19 3,902 3,616
20 3,686 3,700
Average 3,662 3,633
Standard deviation 0,079 0,043

Result:

Looking at these numbers it may indicate that checking the database on startup decreased scanning time and made it more stable. But the sample size and incrrease are so small, that this can not be stated with confidence.

@kzantow
Copy link
Contributor

kzantow commented Aug 12, 2024

FWIW, enabling this option in the latest grype seems to increase the scan time of alpine:latest by ~7 seconds on my x86 mac.

@joshuai96
Copy link
Author

@wagoodman any idea, why I don't see a slowdown of scans?

@wagoodman wagoodman added this to the DB v6 milestone Sep 19, 2024
@wagoodman
Copy link
Contributor

I think we'll include this with the grype DB v6 work (as mentioned above) -- so we won't make this change in v5. Keep this PR open and we can rebase on that work and pull it in 👍

@wagoodman wagoodman added the blocked Progress is being stopped by something label Sep 19, 2024
@wagoodman wagoodman removed their assignment Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Progress is being stopped by something
Projects
Status: Ready
3 participants