[KF 1.0 Compliance] Vulnerability Scanning #3857

Bobgy · 2020-05-27T06:25:28Z

Part of #2884

Docker images must be scanned for vulnerabilities, and known vulnerabilities must be published.

@jlewi Do you know how other images share vulnerability issues?

I did a quick investigation: gcr.io provides vulnerability scanning, but the results are not visible to external visitors, even if the image is public.

We can export the generated YAML report with a command like:

```shell
gcloud beta container images describe --show-package-vulnerability gcr.io/ml-pipeline/api-server:1.0.0-test-5
```

Documented in https://cloud.google.com/container-registry/docs/get-image-vulnerabilities

Do you think that's good enough?


Bobgy · 2020-05-29T00:25:26Z

@jbottum Do you have any ideas about this?

jlewi · 2020-05-29T13:36:07Z

kubeflow/kubeflow#3907 is tracking how we publish a list of vulnerabilities in our images.

A related issue is minimizing vulnerabilities, e.g. by using distroless images.
There is documentation at
https://github.com/krishnadurai/community/blob/b1669588d785455a1e4e4cab456e03c08a05af7c/guidelines/creating_dockerfiles.md

Note that using distroless images is a recommendation, not a requirement.

kubeflow/kubeflow#4590 is a related issue about promoting the use of distroless in Kubeflow to minimize vulnerabilities.

To satisfy the vulnerability scanning requirement I think you just need to turn on vulnerability scanning in whatever GCR registry you are hosting your images in.

You might want to repurpose this issue or file a new one for reducing vulnerabilities if relevant.

Bobgy · 2020-06-01T04:13:34Z

@jlewi As reported in kubeflow/kubeflow#3907, even if we enable GCR vulnerability scanning, the results are not visible to external viewers.
So in addition, we'd still need to dump a YAML report for each KFP release. Does that sound reasonable?
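A minimal sketch of dumping one report per image for a release. It wraps the `gcloud beta container images describe --show-package-vulnerability` command shown earlier and adds `--format=yaml` (gcloud's standard output-format flag). The image list, file naming, and function names are hypothetical; actually running it requires gcloud and read access to the registry.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Hypothetical image list; in practice it would come from the release manifest.
IMAGES = [
    "gcr.io/ml-pipeline/api-server",
    "gcr.io/ml-pipeline/frontend",
]


def report_command(image: str, tag: str) -> list[str]:
    """Build the gcloud command that prints the vulnerability report for image:tag."""
    return [
        "gcloud", "beta", "container", "images", "describe",
        "--show-package-vulnerability",
        "--format=yaml",
        f"{image}:{tag}",
    ]


def report_path(out_dir: Path, image: str, tag: str) -> Path:
    """Name the report after the image's last path segment, e.g. api-server-1.0.0.yaml."""
    name = image.rsplit("/", 1)[-1]
    return out_dir / f"{name}-{tag}.yaml"


def dump_release_reports(tag: str, out_dir: Path, run=subprocess.run) -> list[Path]:
    """Write one YAML report per image. `run` is injectable so this can be tested without gcloud."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in IMAGES:
        result = run(report_command(image, tag), capture_output=True, text=True, check=True)
        path = report_path(out_dir, image, tag)
        path.write_text(result.stdout)
        written.append(path)
    return written
```

A release script could call `dump_release_reports("1.0.0", Path("vulnerability-reports"))` and attach the resulting files to the release.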

Bobgy · 2020-06-01T04:21:46Z

Thanks for the link about reducing vulnerabilities. I'll create a separate issue for it.

stale · 2020-08-30T11:43:27Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Bobgy · 2020-08-31T15:31:56Z

/lifecycle frozen

Bobgy · 2020-10-29T07:38:52Z

An example of fixing some vulnerability issues: #4531

Some related reading:

My takeaways:

It's impossible to fix all vulnerabilities, and some (probably many) may actually be false positives, so we need to constantly update base images to get upstream fixes.
If there isn't an upstream fix yet, we need to review the vulnerability, decide whether it really matters to us, and act accordingly.

Going forward, we should:

use distroless images as much as possible (because their base images contribute near-zero vulnerabilities)
when that's not feasible, regularly update the base image to get vulnerability fixes
act ad hoc on the remaining high/critical vulnerabilities that people care about

Bobgy · 2020-10-29T07:45:08Z

AIs (action items):

formalize a vulnerability management process
understand the current image vulnerability status and triage urgent fixes
build the vulnerability scanning automation needed to flag High and Critical issues before release (p1) and to send vulnerability reports for each KFP release (p2)

Bobgy · 2021-01-31T02:37:55Z

Requests to reduce vulnerabilities are coming in more often than before, so I'm taking some time to continue this.

Bobgy · 2021-01-31T02:40:43Z

Formalize a vulnerability management process

I think the process should have two parts:

1. Set up a process to update dependencies/base images more frequently.
This is already being addressed in [Project Health] Dependency upgrade process #4682.
2. Add an automated vulnerability policy check step to our CI/CD pipelines.
In the pipeline, we'll unavoidably need to allowlist many CVEs (possibly even high/critical ones), because a fix may not have been released yet, the CVE may not be exploitable in the KFP use case, or the risk may be tolerable. We should comment each allowlist entry with the reason, and mark some of them as TODOs.

I'll focus on item 2 in this issue.
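One way the annotated allowlist for the automated policy check could be structured, as a minimal sketch assuming a hypothetical in-repo format where every entry carries a reason and an optional TODO flag. The check fails on any HIGH/CRITICAL finding that is not allowlisted, and also surfaces TODO entries and stale entries (allowlisted CVEs no longer reported), so the list does not silently rot. All names here are illustrative.

```python
from dataclasses import dataclass

# Severities that should block a release unless explicitly allowlisted.
BLOCKING = {"HIGH", "CRITICAL"}


@dataclass(frozen=True)
class AllowlistEntry:
    cve: str
    reason: str          # why the CVE is tolerated, e.g. "no upstream fix yet"
    todo: bool = False   # True if we should revisit it, e.g. once a fix ships


@dataclass(frozen=True)
class Finding:
    cve: str
    severity: str  # LOW, MEDIUM, HIGH or CRITICAL


def check(findings, allowlist):
    """Return (blocking CVEs, TODO CVEs, stale allowlisted CVEs), each sorted."""
    allowed = {e.cve for e in allowlist}
    seen = {f.cve for f in findings}
    blocking = sorted({f.cve for f in findings
                       if f.severity in BLOCKING and f.cve not in allowed})
    todos = sorted(e.cve for e in allowlist if e.todo)
    stale = sorted(allowed - seen)
    return blocking, todos, stale
```

A CI step would fail when `blocking` is non-empty, and could print `todos` and `stale` as warnings for the periodic review.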

Bobgy · 2021-01-31T02:54:16Z

Research into tools suitable for this need:

Google Cloud provides vulnerability scanning in its Container Analysis service, but it only provides the vulnerability information we need; it lacks the tooling to integrate into a CI/CD pipeline. https://cloud.google.com/container-analysis/docs/vulnerability-scanning

Kritis is a nice tool built by Google, https://cloud.google.com/binary-authorization/docs/creating-attestations-kritis#check-only. It supports vulnerability policies like the following and integrates with data from Google Container Analysis:

```yaml
apiVersion: kritis.grafeas.io/v1beta1
kind: VulnzSigningPolicy
metadata:
  name: my-vsp
spec:
  imageVulnerabilityRequirements:
    maximumFixableSeverity: MEDIUM
    maximumUnfixableSeverity: MEDIUM
    allowlistCVEs:
    - projects/goog-vulnz/notes/CVE-2020-10543
    - projects/goog-vulnz/notes/CVE-2020-10878
    - projects/goog-vulnz/notes/CVE-2020-14155
```

Combined, they seem to meet our basic needs.
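To make the policy semantics concrete, here is a rough Python approximation of how a VulnzSigningPolicy like the one above could be evaluated, assuming: a CVE violates the policy if it is not allowlisted and its severity exceeds the fixable or unfixable threshold, depending on whether a fix is available. This is an illustration under that assumption, not Kritis's implementation; Kritis's own handling of threshold values may differ, so check its documentation. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# Grafeas / Container Analysis severity levels, lowest to highest.
SEVERITY_RANK = {"MINIMAL": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass
class VulnzPolicy:
    maximum_fixable_severity: str
    maximum_unfixable_severity: str
    allowlist_cves: set = field(default_factory=set)


@dataclass
class Vulnerability:
    note_name: str  # e.g. "projects/goog-vulnz/notes/CVE-2020-10543"
    severity: str   # one of the SEVERITY_RANK keys (unknown values raise KeyError)
    fixable: bool   # whether a fixed package version is available


def violations(policy, vulns):
    """Return note names that exceed their threshold and are not allowlisted."""
    bad = []
    for v in vulns:
        if v.note_name in policy.allowlist_cves:
            continue
        limit = (policy.maximum_fixable_severity if v.fixable
                 else policy.maximum_unfixable_severity)
        if SEVERITY_RANK[v.severity] > SEVERITY_RANK[limit]:
            bad.append(v.note_name)
    return bad
```

The YAML above would correspond to `VulnzPolicy("MEDIUM", "MEDIUM", {...three note names...})`; an empty `violations(...)` result means the image passes.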

Bobgy · 2021-01-31T03:13:05Z

There seem to be similar open-source tools, such as https://github.com/arminc/clair-scanner, but it requires running your own vulnerability server. It's more convenient to use the GCP Container Analysis service directly.

Bobgy · 2021-01-31T03:35:00Z

A bit more research led me to https://github.com/aquasecurity/trivy. It seems to be the leading open-source option.
It has some extra nice features:

a local CLI for exploration, which can group CVEs by library type:

```
$ trivy image knqyf263/vuln-image:1.2.3
2019-05-16T12:59:03.150+0900    INFO    Detecting Alpine vulnerabilities...
2019-05-16T12:59:04.941+0900    INFO    Detecting bundler vulnerabilities...
2019-05-16T12:59:05.967+0900    INFO    Detecting cargo vulnerabilities...
2019-05-16T12:59:07.834+0900    INFO    Detecting composer vulnerabilities...
2019-05-16T12:59:10.285+0900    INFO    Detecting npm vulnerabilities...
2019-05-16T12:59:11.487+0900    INFO    Detecting pipenv vulnerabilities...

knqyf263/vuln-image:1.2.3 (alpine 3.7.1)
========================================
Total: 26 (UNKNOWN: 0, LOW: 3, MEDIUM: 16, HIGH: 5, CRITICAL: 2)

+---------+------------------+----------+-------------------+---------------+----------------------------------+
| LIBRARY | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION |              TITLE               |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
| curl    | CVE-2018-14618   | CRITICAL | 7.61.0-r0         | 7.61.1-r0     | curl: NTLM password overflow     |
|         |                  |          |                   |               | via integer overflow             |
+         +------------------+----------+                   +---------------+----------------------------------+
|         | CVE-2018-16839   | HIGH     |                   | 7.61.1-r1     | curl: Integer overflow leading   |
|         |                  |          |                   |               | to heap-based buffer overflow in |
|         |                  |          |                   |               | Curl_sasl_create_plain_message() |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2019-3822    |          |                   | 7.61.1-r2     | curl: NTLMv2 type-3 header       |
|         |                  |          |                   |               | stack buffer overflow            |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2018-16840   |          |                   | 7.61.1-r1     | curl: Use-after-free when        |
|         |                  |          |                   |               | closing "easy" handle in         |
|         |                  |          |                   |               | Curl_close()                     |
+         +------------------+----------+                   +               +----------------------------------+
|         | CVE-2018-16842   | MEDIUM   |                   |               | curl: Heap-based buffer          |
|         |                  |          |                   |               | over-read in the curl tool       |
|         |                  |          |                   |               | warning formatting               |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2018-16890   |          |                   | 7.61.1-r2     | curl: NTLM type-2 heap           |
|         |                  |          |                   |               | out-of-bounds buffer read        |
+         +------------------+          +                   +               +----------------------------------+
|         | CVE-2019-3823    |          |                   |               | curl: SMTP end-of-response       |
|         |                  |          |                   |               | out-of-bounds read               |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
| git     | CVE-2018-17456   | HIGH     | 2.15.2-r0         | 2.15.3-r0     | git: arbitrary code execution    |
|         |                  |          |                   |               | via .gitmodules                  |
+         +------------------+          +                   +               +----------------------------------+
|         | CVE-2018-19486   |          |                   |               | git: Improper handling of        |
|         |                  |          |                   |               | PATH allows for commands to be   |
|         |                  |          |                   |               | executed from...                 |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
...
```

there are existing GitHub Actions that use Trivy: https://github.com/Azure/container-scan
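Trivy can also emit machine-readable output (`trivy image -f json -o report.json <image>`), which is easier to post-process in CI than the table. A minimal sketch that reproduces the "Total: ..." summary and the per-library grouping from a JSON report, assuming Trivy's usual field names (`Results`, `Vulnerabilities`, `VulnerabilityID`, `PkgName`, `Severity`). The top-level shape has changed between Trivy versions (a bare list in older releases, an object with a `Results` key in newer ones), so the sketch accepts both; verify against the version you pin.

```python
import json
from collections import Counter, defaultdict

SEVERITIES = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def load_results(report):
    """Accept either an older bare-list report or a newer {"Results": [...]} report."""
    if isinstance(report, dict):
        return report.get("Results") or []
    return report or []


def summarize(report_text):
    """Return (severity counts, {package: [vulnerability IDs]}) for a Trivy JSON report."""
    counts = Counter({s: 0 for s in SEVERITIES})
    by_package = defaultdict(list)
    for result in load_results(json.loads(report_text)):
        # "Vulnerabilities" is null for targets without findings.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
            by_package[vuln["PkgName"]].append(vuln["VulnerabilityID"])
    return counts, dict(by_package)


def total_line(counts):
    """Format counts like Trivy's table header: Total: N (UNKNOWN: 0, LOW: 3, ...)."""
    parts = ", ".join(f"{s}: {counts[s]}" for s in SEVERITIES)
    return f"Total: {sum(counts.values())} ({parts})"
```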

Bobgy · 2021-01-31T04:10:44Z

For reference, vulnerability vector description:
https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator
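The vectors that calculator produces look like `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`. A minimal parser for the eight CVSS v3 base metrics, using the metric and value names from the CVSS v3 specification, could look like this. It is a sketch: temporal and environmental metrics are ignored, a missing base metric raises `ValueError`, and an unrecognized value raises `KeyError`.

```python
# CVSS v3 base metrics: abbreviation -> (full name, {value abbreviation: value name}).
BASE_METRICS = {
    "AV": ("AttackVector", {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"}),
    "AC": ("AttackComplexity", {"L": "Low", "H": "High"}),
    "PR": ("PrivilegesRequired", {"N": "None", "L": "Low", "H": "High"}),
    "UI": ("UserInteraction", {"N": "None", "R": "Required"}),
    "S": ("Scope", {"U": "Unchanged", "C": "Changed"}),
    "C": ("Confidentiality", {"N": "None", "L": "Low", "H": "High"}),
    "I": ("Integrity", {"N": "None", "L": "Low", "H": "High"}),
    "A": ("Availability", {"N": "None", "L": "Low", "H": "High"}),
}


def parse_cvss3(vector: str) -> dict:
    """Parse a CVSS v3.x vector string into {full metric name: value name} for base metrics."""
    prefix, _, rest = vector.partition("/")
    if not prefix.startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3 vector: {vector!r}")
    raw = dict(part.split(":", 1) for part in rest.split("/") if part)
    parsed = {}
    for key, (name, values) in BASE_METRICS.items():
        if key not in raw:
            raise ValueError(f"missing base metric {key} in {vector!r}")
        parsed[name] = values[raw[key]]
    return parsed
```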

Bobgy · 2021-01-31T05:07:49Z

An experimental Trivy feature lets you use user-defined Open Policy Agent (OPA) policies to check the vulnerabilities.
It can filter based on the vulnerability (CVSS) vector; examples include:

ignore all vulnerabilities that cannot be exploited via the network
ignore those that can only be exploited with root (high) privileges
...

So it can reduce the amount of vulnerabilities we need to check based on our specific environment requirements.
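As a rough illustration of what such a policy decides, here is a Python stand-in (not Rego, and not Trivy's policy API) for the two example rules above, assuming "cannot be exploited via the network" means the attack vector is not `AV:N` and "only exploitable with root privileges" means `PR:H`. The finding dictionaries and the `cvss_v3` key are a hypothetical shape.

```python
def vector_metrics(vector: str) -> dict:
    """Split "CVSS:3.1/AV:N/AC:L/..." into {"AV": "N", "AC": "L", ...}, dropping the prefix."""
    return dict(part.split(":", 1) for part in vector.split("/")[1:] if part)


def should_ignore(vector: str) -> bool:
    """Ignore findings that are not network-exploitable or that require high privileges."""
    m = vector_metrics(vector)
    not_network = m.get("AV") != "N"
    needs_root = m.get("PR") == "H"
    return not_network or needs_root


def remaining(findings):
    """Keep findings without a vector (they cannot be judged) or whose vector is not ignored."""
    return [f for f in findings
            if not f.get("cvss_v3") or not should_ignore(f["cvss_v3"])]
```

Keeping findings that have no vector is a deliberately conservative choice: a missing vector should trigger manual review rather than silently pass.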

References:

Bobgy · 2021-01-31T05:09:40Z

EDIT: what's described below doesn't work well, because the output of gcloud beta container images describe --show-package-vulnerability gcr.io/ml-pipeline/api-server:1.0.0-test-5 --format=json does not include the vulnerability vector.

Open Policy Agent is in fact a generic tool:

inputs: "JSON" and "Policy"
output: "pass?"

So we could use it with GCR vulnerability scanning to get the best of both: OPA's flexibility and a GCP-managed scanning service.

==

Alternatively, we could just write a script that checks the vulnerability JSON against our own policy.

Bobgy · 2021-01-31T06:33:51Z

Analysis of Options

Trivy

Onboarding cost: low (download a binary and run it)
Vulnerability DB confidence: unknown (it's a third-party-maintained DB, although it claims its sources are the common ones, such as NVD)
Configuration flexibility: high (especially with OPA)
Momentum: high (6k stars; 18 PRs merged and 12 issues closed last month, at the time of evaluation)

Kritis

Onboarding cost: low (there are official docs for using it in Cloud Build; it runs as a container)
Vulnerability DB confidence: very high (it uses GCP image scanning)
Configuration flexibility: medium (allowlist + filter by [fixable, severity])
Momentum: low (the repo has had no new activity recently)

Other options look obviously worse than the two, so I'm leaving them out.

Note that OPA has a learning curve because it comes with its own policy language (Rego), so I'd prefer we stay away from it initially. Without OPA, Trivy's major advantage does not apply to us.

I think we can start with Kritis. If it works as is, we can defer further customization until we really need it.
If we discover blocking bugs, we can revisit Trivy as a backup plan.

shawnzhu · 2021-01-31T20:37:10Z

I'm interested in this issue. Speaking of Trivy, it supports filtering vulnerabilities with a number of options besides OPA:

--severity - https://github.com/aquasecurity/trivy#filter-the-vulnerabilities-by-severities
.trivyignore (ignore specific vulnerabilities) - https://github.com/aquasecurity/trivy#ignore-the-specified-vulnerabilities
--skip-files - https://github.com/aquasecurity/trivy#skip-traversal-of-the-specific-files
--skip-dirs - https://github.com/aquasecurity/trivy#skip-traversal-in-the-specific-directory
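Combining those options, a CI gate could be assembled roughly as below. The `.trivyignore` format (one vulnerability ID per line, `#` comments allowed) and the `--severity`, `--exit-code`, `--ignorefile` and `--skip-dirs` flags come from Trivy's documentation; the `{CVE: reason}` allowlist input and the function names are hypothetical. Verify the flag names against the Trivy version you pin.

```python
def render_trivyignore(allowlist: dict) -> str:
    """Render {CVE ID: reason} as a .trivyignore file, keeping each reason as a comment."""
    lines = []
    for cve, reason in sorted(allowlist.items()):
        lines.append(f"# {reason}")
        lines.append(cve)
    return "\n".join(lines) + "\n"


def trivy_gate_command(image: str, ignorefile: str = ".trivyignore", skip_dirs=()) -> list:
    """Build a trivy invocation that exits with code 1 on non-ignored HIGH/CRITICAL findings."""
    cmd = ["trivy", "image",
           "--severity", "HIGH,CRITICAL",
           "--exit-code", "1",
           "--ignorefile", ignorefile]
    for d in skip_dirs:
        cmd += ["--skip-dirs", d]  # the flag is repeated once per directory
    cmd.append(image)
    return cmd
```

Keeping the reason next to each ignored ID preserves the "comment every allowlist entry" requirement even though Trivy itself only reads the IDs.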

The lack of activity in Kritis might be a problem, but I'm willing to give it a try since I haven't used it before.

Bobgy · 2021-01-31T23:08:45Z

@shawnzhu You are right.

I didn't make it clear that my main reason for preferring Kritis is that it uses GCP container scanning as its data source (in fact, it reads GCP container scanning results directly, so you cannot use it outside GCP).

Bobgy · 2021-02-01T02:09:55Z

Some notes after experimenting with Kritis:

Although the official sample uses Cloud Build, I found it much faster, in terms of developer velocity, to write a KFP pipeline that runs vulnerability checks with Kritis.
Kritis does not output structured vulnerability check results; we can only look at its logs, like:

```
E0201 01:43:02.099893 1 main.go:211] found fixable CVE <redacted> in gcr.io/<redacted>, which has severity HIGH exceeding max fixable severity MEDIUM
```
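Since logs are the only output, one stopgap is to scrape them. A hedged sketch: the regular expression below is inferred solely from the single log line quoted above, so other Kritis messages (or format changes between versions) will not be matched. Treat it as a workaround, not a stable interface; the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from one Kritis log line; this is not a documented format.
VIOLATION_RE = re.compile(
    r"found (?P<kind>fixable|unfixable) CVE (?P<cve>\S+) in (?P<image>[^,\s]+), "
    r"which has severity (?P<severity>\w+) exceeding max (?:fixable|unfixable) "
    r"severity (?P<limit>\w+)"
)


class Violation(NamedTuple):
    fixable: bool
    cve: str
    image: str
    severity: str
    limit: str


def parse_line(line: str) -> Optional[Violation]:
    """Return a Violation if the log line reports a policy violation, else None."""
    m = VIOLATION_RE.search(line)
    if m is None:
        return None
    return Violation(m["kind"] == "fixable", m["cve"], m["image"], m["severity"], m["limit"])


def parse_log(text: str) -> list:
    """Collect all recognized violations from a block of Kritis log output."""
    return [v for v in map(parse_line, text.splitlines()) if v is not None]
```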

Bobgy · 2021-02-01T10:03:25Z

I built a KFP pipeline that runs Kritis: #5066.
For now, this is a one-off pipeline I use to verify existing released images.

P1: The next step would be to maintain a long-running KFP test cluster and run that pipeline as one of the postsubmit tests.

davidspek · 2021-02-06T09:41:08Z

> There seems to be similar open source tools like https://github.com/arminc/clair-scanner, but it requires running your own vulnerability server. It's more convenient to use GCP container analysis service directly.

@Bobgy I think this is a better link: https://github.com/quay/clair. Clair is what Amazon ECR uses: https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html.

Bobgy added the area/deployment/kubeflow label May 27, 2020

Bobgy self-assigned this May 27, 2020

Bobgy added the status/triaged and kind/misc labels May 27, 2020

Bobgy mentioned this issue May 27, 2020 in Compliance to Kubeflow 1.0 Guideline #2884 (Closed)

Bobgy added the priority/p1 label Jun 1, 2020

stale bot added the lifecycle/stale label Aug 30, 2020

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Aug 31, 2020

This was referenced Feb 1, 2021:

fix: upgrade some images to reduce vulnerabilities #5065 (Merged)
test: pipeline to check vulnerabilities for KFP images #5066 (Closed)
