
Resource limits not affecting scan-vulnerabilityreport pods #572

Closed
jeremeb opened this issue May 12, 2021 · 8 comments
Assignees
Labels
🔦 plugin/trivy This issue is related to Trivy vulnerability scanner 🙏 help wanted Extra attention is needed 🚀 enhancement New feature or request 🧑‍🎓 good first issue Good for newcomers

Comments

@jeremeb

jeremeb commented May 12, 2021

What steps did you take and what happened:

After deploying the starboard-operator-0.5.1 chart, I found that the scan-vulnerabilityreport pods do not have the CPU/memory resource limits set from the values file. The operator itself, however, did have the correct resources set.

What did you expect to happen:

I expected the Trivy scan-vulnerabilityreport pods run by the starboard-operator to use the CPU/memory resource limits that were set in the values file.


Environment:

GKE 1.17
Starboard-Operator Helm Chart 0.5.1

@danielpacak
Contributor

That is correct, @jeremeb. In v0.10.2, resource requests and limits for scan jobs are hardcoded.

@danielpacak danielpacak added 🚀 enhancement New feature or request 🙏 help wanted Extra attention is needed 🧑‍🎓 good first issue Good for newcomers labels May 13, 2021
@danielpacak
Contributor

I'd somehow relate this issue to #538 and #539, where we configure scheduling params for scan jobs.

@danielpacak danielpacak added this to the Release v0.11.0 milestone Jun 17, 2021
@danielpacak danielpacak added the 🔦 plugin/trivy This issue is related to Trivy vulnerability scanner label Jun 17, 2021
@danielpacak
Contributor

danielpacak commented Jun 17, 2021

The Trivy plugin sets resource requests and resource limits from a hardcoded struct literal (defaultResourceRequirements; see the sketch below). However, it's desirable to configure these values, e.g. increase memory limits, without recompiling Starboard.

  • Firstly, we have to decide whether resource requests and limits should be configurable for all scan jobs spawned by Starboard (in the starboard ConfigMap) or per plugin (in the starboard-trivy-config ConfigMap). Notice that we already have config params for tolerations and custom annotations (the scanJob.tolerations and scanJob.annotations properties, respectively).
  • Secondly, apply resource requests and limits based on the configured values, and test the code.
  • Finally, update the documentation and Helm chart.
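
For reference, the hardcoded defaults roughly correspond to a struct literal like the following. This is only a sketch; the values shown are the request 100m/100M and limit 500m/500M defaults quoted later in this thread, not a verbatim copy of the plugin source.

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Sketch of the hardcoded defaults the Trivy plugin currently applies to every
// scan job; making these configurable is the point of this issue.
var defaultResourceRequirements = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("100m"),
		corev1.ResourceMemory: resource.MustParse("100M"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("500m"),
		corev1.ResourceMemory: resource.MustParse("500M"),
	},
}
```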

@danielpacak danielpacak self-assigned this Jun 17, 2021
@elek
Contributor

elek commented Jul 14, 2021

Firstly, we have to decide whether resource requests and limits are configurable for all scan jobs spawned by Starboard (in starboard ConfigMap) or per plugin (in starboard-trivy-config ConfigMap).

Was there any decision about this @danielpacak?

I think there is a third option: the Spark k8s operator accepts pod templates, which are custom YAML partially adjusted by the operator itself.

It has the advantage of being fully customizable: not only the CPU and memory resource limits but any part of the pod can be adjusted (I used it to mount a temporary emptyDir populated by an initContainer, but any other field, label, annotation, mount, etc. can be added).

Adding just memory and CPU limits might be enough for now, but more pod-specific customization would require more and more configuration parameters.

Using a podTemplate would require only one more parameter (e.g. keyTrivyPodTemplate); if it's set, it can be used to initially populate the pod spec instead of creating a new corev1.PodSpec{}.

On the other hand, it's slightly harder to use this approach from the Helm chart: you would need to add one more ConfigMap (for the pod template) whenever any specific configuration is added, which is slightly more complex than adding one or two environment variables / keys to the existing ConfigMaps.
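
A minimal sketch of what the podTemplate idea could look like, assuming a hypothetical trivy.podTemplate config key holding the YAML (the helper name and key are illustrative, not existing Starboard API):

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// basePodSpec returns the PodSpec the operator starts from: decoded from an
// optional pod template key if present, otherwise an empty spec as today.
// The trivy.podTemplate key is hypothetical.
func basePodSpec(configData map[string]string) (corev1.PodSpec, error) {
	template, ok := configData["trivy.podTemplate"]
	if !ok {
		return corev1.PodSpec{}, nil // no template configured, start from scratch
	}
	var spec corev1.PodSpec
	if err := yaml.Unmarshal([]byte(template), &spec); err != nil {
		return corev1.PodSpec{}, err
	}
	// The operator would then adjust containers, volumes, labels, etc. on top.
	return spec, nil
}
```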

What do you think?

@danielpacak
Contributor

We haven't decided yet.

I think that configuring the whole pod template is an interesting idea, especially if one day we'd like to merge scanJob.tolerations, scanJob.annotations, and potentially scanJob.resources into one configuration object. What's more, with the current configuration parameters represented as key-value pairs, it's quite challenging to represent structured values such as tolerations or resource limits as strings. (For example, we encode tolerations as JSON.) A structured config object would have another advantage over key-value pairs: we could version the Starboard configuration.
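
For illustration, decoding the JSON-encoded scanJob.tolerations value from the key-value config looks roughly like this (a sketch only; the helper name is not necessarily the actual Starboard code):

```go
package starboard

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
)

// Sketch: scanJob.tolerations holds a JSON array as a plain string, e.g.
// [{"key":"node-role","operator":"Exists","effect":"NoSchedule"}]
func getScanJobTolerations(configData map[string]string) ([]corev1.Toleration, error) {
	var tolerations []corev1.Toleration
	value, ok := configData["scanJob.tolerations"]
	if !ok {
		return tolerations, nil // nothing configured
	}
	err := json.Unmarshal([]byte(value), &tolerations)
	return tolerations, err
}
```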

Anyway, in the scope of this issue I'd just focus on a new config key for resource limits without changing the whole configuration model. We can always do that as a follow-up.

However, the original question still holds: does it make sense to configure resource limits once for all scanners (Trivy, Polaris, KubeBench, etc.), or should each scanner have its own resource requirements and limits? In other words, do we make resource requirements part of the generic Starboard config or a plugin config? WDYT?

elek added a commit to elek/starboard that referenced this issue Jul 15, 2021
@elek
Contributor

elek commented Jul 15, 2021

In other words, do we make resource requirements part of generic Starboard config or a plugin config? WDYT?

Having the configuration at the generic Starboard level is easier to understand, but IMHO it's more realistic to have limits on a per-scanner basis. How many limits/requests should be configured really depends on the exact scanner, and I am not sure it's realistic to use one generic limit for all the scanners.

But it would be great to have more input from @jeremeb, the original reporter: @jeremeb, can you please share the motivation for changing the resource limits/requests for the Trivy pods? If the default settings (request: 100m/100M, limit: 500m/500M) are not enough for huge containers, we could consider slightly increasing the defaults instead of making them customizable. I think the defaults can be good enough for most use cases.

(BTW, I tried out an approach that adds it to the Trivy config in plain key-value format: elek@53bd652. I didn't find integration tests for the Helm charts, and as far as I can see itest/starboard-operator doesn't test different configurations, but I created a custom Docker container to test it and also added a unit test. This is one possible quick fix for this issue, but I am open to following a different approach.)

@danielpacak
Contributor

danielpacak commented Jul 15, 2021

I agree that configuring resources per service is probably more reasonable and flexible. For example, Trivy will have different resource requirements than Polaris or KubeHunter, and it would be nice to configure them separately with sensible defaults. (The defaults for Trivy depend on container image size and the Trivy release. Recent Trivy releases allow scanning Java and Go apps, which requires a bit more memory than previous versions.)

That said, I like the approach that you took in 53bd652. I'd maybe use more explicit / self-documenting keys:

trivy.resources.request.cpu: 100m
trivy.resources.request.memory: 100M
trivy.resources.limit.cpu: 500m
trivy.resources.limit.memory: 500M

Regarding the implementation, I'm wondering if we can get rid of the defaults hardcoded in Go and read them from config. Also, maybe we should have a way to skip setting any resource requests & limits if someone did not configure the trivy.resources.* properties.
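
A sketch of what that could look like, assuming the suggested trivy.resources.* keys and no defaults in Go (the helper name is illustrative):

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// scanJobResourceRequirements builds requirements purely from configured
// trivy.resources.* keys; absent keys are simply left unset, so a config
// without any of them yields effectively no requests or limits.
func scanJobResourceRequirements(configData map[string]string) (corev1.ResourceRequirements, error) {
	requirements := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{},
		Limits:   corev1.ResourceList{},
	}
	// setIfConfigured parses a quantity only when the key is present.
	setIfConfigured := func(list corev1.ResourceList, name corev1.ResourceName, key string) error {
		value, ok := configData[key]
		if !ok {
			return nil
		}
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			return err
		}
		list[name] = quantity
		return nil
	}
	for key, target := range map[string]struct {
		list corev1.ResourceList
		name corev1.ResourceName
	}{
		"trivy.resources.request.cpu":    {requirements.Requests, corev1.ResourceCPU},
		"trivy.resources.request.memory": {requirements.Requests, corev1.ResourceMemory},
		"trivy.resources.limit.cpu":      {requirements.Limits, corev1.ResourceCPU},
		"trivy.resources.limit.memory":   {requirements.Limits, corev1.ResourceMemory},
	} {
		if err := setIfConfigured(target.list, target.name, key); err != nil {
			return corev1.ResourceRequirements{}, err
		}
	}
	return requirements, nil
}
```

If none of the keys are present, the returned requirements are empty, so the scan job container would run without requests or limits, which matches the "skip setting" idea above.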

@elek
Contributor

elek commented Jul 16, 2021

I'd maybe use more explicit / self documenting keys:

Thanks for the suggestion. I updated the branch to use the suggested keys.

Regarding the implementation I'm wondering if we can get rid of defaults hardcoded in Go and read them from config? Also maybe we should have a way to skip setting any resources requests & limits if someone did not configure trivy.resources.* properties.

This approach makes the code simpler, but I am not sure what the current compatibility guarantees are. Moving the defaults to Helm/k8s resources means:

  1. The Helm charts and the plain k8s deployment example should be updated with the same values.
  2. Existing Starboard setups that were installed without (!) Helm or Kustomize (e.g. with plain k8s resources + GitOps tools) won't use limits anymore unless the ConfigMap is updated with the desired defaults, which is a kind of backward incompatibility.

But I moved the defaults to the Helm chart / k8s resource files and opened the PR. We can continue the discussion about the code there: #639
