
Resource limits not affecting scan-vulnerabilityreport pods #572

Closed
jeremeb opened this issue May 12, 2021 · 8 comments
Assignees
Labels
🔦 plugin/trivy This issue is related to Trivy vulnerability scanner 🙏 help wanted Extra attention is needed 🚀 enhancement New feature or request 🧑‍🎓 good first issue Good for newcomers

Comments

@jeremeb

jeremeb commented May 12, 2021

What steps did you take and what happened:

After deploying the starboard-operator-0.5.1 chart, I found that the scan-vulnerabilityreport pods do not have the CPU/memory resource limits set from the values file. The operator itself, however, did have the correct resources set.

What did you expect to happen:

I expected the Trivy scan-vulnerabilityreport pods run by the starboard-operator to use the CPU/memory resource limits that were set in the values file.


Environment:

GKE 1.17
Starboard-Operator Helm Chart 0.5.1

@danielpacak
Contributor

That is correct, @jeremeb. In v0.10.2, resource requests and limits for scan jobs are hardcoded.

@danielpacak danielpacak added 🚀 enhancement New feature or request 🙏 help wanted Extra attention is needed 🧑‍🎓 good first issue Good for newcomers labels May 13, 2021
@danielpacak
Contributor

I'd somehow relate this issue to #538 and #539, where we configure scheduling params for scan jobs.

@danielpacak danielpacak added this to the Release v0.11.0 milestone Jun 17, 2021
@danielpacak danielpacak added the 🔦 plugin/trivy This issue is related to Trivy vulnerability scanner label Jun 17, 2021
@danielpacak
Contributor

danielpacak commented Jun 17, 2021

The Trivy plugin sets resource requests and resource limits from a hardcoded struct literal (defaultResourceRequirements; see the sketch below). However, it's desirable to configure these values, e.g. increase memory limits, without recompiling Starboard.

  • Firstly, we have to decide whether resource requests and limits should be configurable for all scan jobs spawned by Starboard (in the starboard ConfigMap) or per plugin (in the starboard-trivy-config ConfigMap). Notice that we already have config params for tolerations and custom annotations (the scanJob.tolerations and scanJob.annotations properties, respectively).
  • Secondly, apply resource requests and limits based on the configured values, and test the code.
  • Finally, update the documentation and Helm chart.
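
For reference, the hardcoded defaults roughly correspond to a struct literal like the following. This is only a sketch; the values shown are the request 100m/100M and limit 500m/500M defaults quoted later in this thread, not a verbatim copy of the plugin source.

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Sketch of the hardcoded defaults the Trivy plugin currently applies to every
// scan job; making these configurable is the point of this issue.
var defaultResourceRequirements = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("100m"),
		corev1.ResourceMemory: resource.MustParse("100M"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("500m"),
		corev1.ResourceMemory: resource.MustParse("500M"),
	},
}
```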

@danielpacak danielpacak self-assigned this Jun 17, 2021
@elek
Contributor

elek commented Jul 14, 2021

Firstly, we have to decide whether resource requests and limits are configurable for all scan jobs spawned by Starboard (in starboard ConfigMap) or per plugin (in starboard-trivy-config ConfigMap).

Was there any decision about this @danielpacak?

I think there is a third option: the Spark k8s operator accepts pod templates, which are custom YAML partially adjusted by the operator itself.

It has the advantage of being fully customizable: not only the CPU and memory resource limits but any part of the pod can be adjusted (I used it to mount a temporary emptyDir populated by an initContainer, but any other field, label, annotation, mount, etc. can be added).

Adding just memory and CPU limits might be enough for now, but more pod-specific customization would require more and more configuration parameters.

Using a podTemplate would require only one more parameter (e.g. keyTrivyPodTemplate); if it's set, it can be used to initially populate the pod spec instead of creating a new corev1.PodSpec{}.

On the other hand, it's slightly harder to use this approach from the Helm chart: you would need to add one more ConfigMap (for the pod template) whenever any specific configuration is added, which is slightly more complex than adding one or two environment variables / keys to the existing ConfigMaps.
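
A minimal sketch of what the podTemplate idea could look like, assuming a hypothetical trivy.podTemplate config key holding the YAML (the helper name and key are illustrative, not existing Starboard API):

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// basePodSpec returns the PodSpec the operator starts from: decoded from an
// optional pod template key if present, otherwise an empty spec as today.
// The trivy.podTemplate key is hypothetical.
func basePodSpec(configData map[string]string) (corev1.PodSpec, error) {
	template, ok := configData["trivy.podTemplate"]
	if !ok {
		return corev1.PodSpec{}, nil // no template configured, start from scratch
	}
	var spec corev1.PodSpec
	if err := yaml.Unmarshal([]byte(template), &spec); err != nil {
		return corev1.PodSpec{}, err
	}
	// The operator would then adjust containers, volumes, labels, etc. on top.
	return spec, nil
}
```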

What do you think?

@danielpacak
Contributor

We haven't decided yet.

I think that configuring the whole pod template is an interesting idea, especially if one day we'd like to merge scanJob.tolerations, scanJob.annotations, and potentially scanJob.resources into one configuration object. What's more, with the current configuration parameters represented as key-value pairs, it's quite challenging to represent structured values such as tolerations or resource limits as strings. (For example, we encode tolerations as JSON.) A structured config object would have another advantage over key-value pairs: we could version the Starboard configuration.
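
For illustration, decoding the JSON-encoded scanJob.tolerations value from the key-value config looks roughly like this (a sketch only; the helper name is not necessarily the actual Starboard code):

```go
package starboard

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
)

// Sketch: scanJob.tolerations holds a JSON array as a plain string, e.g.
// [{"key":"node-role","operator":"Exists","effect":"NoSchedule"}]
func getScanJobTolerations(configData map[string]string) ([]corev1.Toleration, error) {
	var tolerations []corev1.Toleration
	value, ok := configData["scanJob.tolerations"]
	if !ok {
		return tolerations, nil // nothing configured
	}
	err := json.Unmarshal([]byte(value), &tolerations)
	return tolerations, err
}
```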

Anyway, in the scope of this issue I'd just focus on a new config key for resource limits without changing the whole configuration model. We can always do that as a follow-up.

However, the original question still holds: does it make sense to configure resource limits once for all scanners (Trivy, Polaris, KubeBench, etc.), or should each scanner have its own resource requirements and limits? In other words, do we make resource requirements part of the generic Starboard config or a plugin config? WDYT?

elek added a commit to elek/starboard that referenced this issue Jul 15, 2021
@elek
Contributor

elek commented Jul 15, 2021

In other words, do we make resource requirements part of generic Starboard config or a plugin config? WDYT?

Having the configuration at the generic Starboard level is easier to understand, but IMHO it's more realistic to have limits on a per-scanner basis. How many limits/requests should be configured really depends on the exact scanner, and I am not sure it's realistic to use one generic limit for all the scanners.

But it would be great to have more input from @jeremeb, the original reporter: @jeremeb, can you please share the motivation for changing the resource limits/requests for the Trivy pods? If the default settings (request: 100m/100M, limit: 500m/500M) are not enough for huge containers, we could consider slightly increasing the defaults instead of making them customizable. I think the defaults can be good enough for most use cases.

(BTW, I tried out an approach that adds it to the Trivy config in plain key-value format: elek@53bd652. I didn't find integration tests for the Helm charts, and as far as I can see itest/starboard-operator doesn't test different configurations, but I created a custom Docker container to test it and also added a unit test. This is one possible quick fix for this issue, but I am open to following a different approach.)

@danielpacak
Contributor

danielpacak commented Jul 15, 2021

I agree that configuring resources per service is probably more reasonable and flexible. For example, Trivy will have different resource requirements than Polaris or KubeHunter, and it would be nice to configure them separately with sensible defaults. (The defaults for Trivy depend on container image size and the Trivy release. Recent Trivy releases allow scanning Java and Go apps, which requires a bit more memory than previous versions.)

That said, I like the approach that you took in 53bd652. I'd maybe use more explicit / self-documenting keys:

trivy.resources.request.cpu: 100m
trivy.resources.request.memory: 100M
trivy.resources.limit.cpu: 500m
trivy.resources.limit.memory: 500M

Regarding the implementation, I'm wondering if we can get rid of the defaults hardcoded in Go and read them from config. Also, maybe we should have a way to skip setting any resource requests & limits if someone did not configure the trivy.resources.* properties.
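
A sketch of what that could look like, assuming the suggested trivy.resources.* keys and no defaults in Go (the helper name is illustrative):

```go
package trivy

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// scanJobResourceRequirements builds requirements purely from configured
// trivy.resources.* keys; absent keys are simply left unset, so a config
// without any of them yields effectively no requests or limits.
func scanJobResourceRequirements(configData map[string]string) (corev1.ResourceRequirements, error) {
	requirements := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{},
		Limits:   corev1.ResourceList{},
	}
	// setIfConfigured parses a quantity only when the key is present.
	setIfConfigured := func(list corev1.ResourceList, name corev1.ResourceName, key string) error {
		value, ok := configData[key]
		if !ok {
			return nil
		}
		quantity, err := resource.ParseQuantity(value)
		if err != nil {
			return err
		}
		list[name] = quantity
		return nil
	}
	for key, target := range map[string]struct {
		list corev1.ResourceList
		name corev1.ResourceName
	}{
		"trivy.resources.request.cpu":    {requirements.Requests, corev1.ResourceCPU},
		"trivy.resources.request.memory": {requirements.Requests, corev1.ResourceMemory},
		"trivy.resources.limit.cpu":      {requirements.Limits, corev1.ResourceCPU},
		"trivy.resources.limit.memory":   {requirements.Limits, corev1.ResourceMemory},
	} {
		if err := setIfConfigured(target.list, target.name, key); err != nil {
			return corev1.ResourceRequirements{}, err
		}
	}
	return requirements, nil
}
```

If none of the keys are present, the returned requirements are empty, so the scan job container would run without requests or limits, which matches the "skip setting" idea above.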

@elek
Contributor

elek commented Jul 16, 2021

I'd maybe use more explicit / self documenting keys:

Thanks for the suggestion. I updated the branch to use the suggested keys.

Regarding the implementation I'm wondering if we can get rid of defaults hardcoded in Go and read them from config? Also maybe we should have a way to skip setting any resources requests & limits if someone did not configure trivy.resources.* properties.

This approach makes the code simpler, but I am not sure what the current compatibility guarantees are. Moving the defaults to Helm/k8s resources means:

  1. The Helm charts and the plain k8s deployment example should be updated with the same values.
  2. Existing Starboard setups that were installed without (!) Helm or Kustomize (e.g. with plain k8s resources + GitOps tools) won't use limits anymore unless the ConfigMap is updated with the desired defaults, which is a kind of backward incompatibility.

But I moved the defaults to the Helm chart / k8s resource files and opened the PR. We can continue the discussion about the code there: #639
