MicroShift 4.15+ support with helm charts #745
base: main
Conversation
…e and create new Role for allowing nvdp to run with MicroShift 4.15+ Signed-off-by: Arthur Oliveira <arolivei@redhat.com>
…le-binding templates Signed-off-by: Arthur Oliveira <arolivei@redhat.com>
… new var Signed-off-by: Arthur Oliveira <arolivei@redhat.com>
@elezar Testing the helm chart with small changes for MicroShift 4.15+ support.
@fabiendupont FYI
Is there a major difference between MicroShift and OpenShift? We don't have Helm values for OpenShift. Are you not deploying the device plugin via the operator?
@ArangoGutierrez thanks for reviewing this issue. In addition to standard Kubernetes APIs, MicroShift includes just a small subset of the APIs supported by OpenShift Container Platform.
For OpenShift, we've been working so far with the NVIDIA Operator, due to the full enterprise nature of OpenShift (cluster-operators based + native OLM). For MicroShift, although it seems to work fine with NVIDIA's Operator as well, we are targeting resource-constrained environments and also want to support RPM-based systems (not only ostree ones). Because of that, the focus is on staying compatible with as small a footprint as possible, e.g. just having nvidia-device-plugin. With this PR, I'm basically extending the existing doc https://docs.nvidia.com/datacenter/cloud-native/edge/latest/nvidia-gpu-with-device-edge.html# with helm charts while also testing time-slicing. Although these changes could also benefit OpenShift deployments, the NVIDIA Operator already covers that use case well, so the focus here stays with MicroShift only.
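For reference, this is the shape of the time-slicing config being tested (a minimal sketch; the file path matches the helm command later in this thread, and the replica count is illustrative):

```yaml
# /tmp/dp-example-config0.yaml
# Minimal device plugin config enabling time-slicing: each physical GPU
# is advertised as 4 schedulable nvidia.com/gpu resources.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```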
deployments/helm/nvidia-device-plugin/templates/role-binding.yml
… new var with values.yaml Signed-off-by: Arthur Oliveira <arolivei@redhat.com>
See [1] for the major differences between MicroShift and OpenShift, and [2] about simplifying the PR. With the latest changes, as recommended by @elezar, we don't need extra vars, which makes it transparent for users.
```diff
@@ -149,4 +149,4 @@ mps:
   # be created. This includes a daemon-specific /dev/shm and pipe and log
   # directories.
   # Pipe directories will be created at {{ mps.root }}/{{ .ResourceName }}
-  root: "/run/nvidia/mps"
+  root: "/run/nvidia/mps"
```
This seems like an oversight due to the removal of the explicit value.
Sorry for this typo. In the end, it's better to remove this file from the PR. What do you think?
Thanks @arthur-r-oliveira. I think it looks a lot better now.
I have some minor comments / questions.
I would also like @cdesiniotis to just have a look before going ahead with merging this though.
```yaml
{{- if .Capabilities.APIVersions.Has "security.openshift.io/v1/SecurityContextConstraints" }}
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
{{- end }}
```
Does it make sense to shift this to a named template to use here and below? Not a blocker though.
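A minimal sketch of that named-template approach, assuming a helper name like `nvidia-device-plugin.sccRule` (the name and file placement are illustrative, not the chart's actual helpers):

```yaml
{{- /* In templates/_helpers.tpl: define the SCC rule once */ -}}
{{- define "nvidia-device-plugin.sccRule" -}}
{{- if .Capabilities.APIVersions.Has "security.openshift.io/v1/SecurityContextConstraints" }}
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
{{- end }}
{{- end -}}
```

Both the ClusterRole and the Role rule lists could then pull it in with `{{ include "nvidia-device-plugin.sccRule" . }}`.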
Hi @elezar, apologies for the late reply; I'm back to this thread.
Yes, it looks like a good approach to follow. Should we include it as part of this PR, or can I do it as a secondary one?
Hi @cdesiniotis, did you have a chance to review this PR?
```yaml
{{- if .Capabilities.APIVersions.Has "security.openshift.io/v1/SecurityContextConstraints" }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
```
Could we create a list consisting of `ClusterRole` and `Role` and loop over these to construct both? (Note that for the default case we would only construct a `ClusterRole`.)
I'm happy to do these as a follow-up.
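A rough sketch of what that loop might look like, under the assumption that both objects share the same rule set (the fullname helper is illustrative):

```yaml
{{- /* Render a ClusterRole always; additionally render a namespaced Role
       when the OpenShift SCC API is available (OpenShift/MicroShift). */}}
{{- $kinds := list "ClusterRole" }}
{{- if .Capabilities.APIVersions.Has "security.openshift.io/v1/SecurityContextConstraints" }}
{{- $kinds = append $kinds "Role" }}
{{- end }}
{{- range $kind := $kinds }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: {{ $kind }}
metadata:
  name: {{ include "nvidia-device-plugin.fullname" $ }}
  {{- if eq $kind "Role" }}
  namespace: {{ $.Release.Namespace }}
  {{- end }}
rules:
  # ... shared rule set goes here ...
{{- end }}
```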
With MicroShift we do also need a `Role` object to unblock such security issues:
```console
[root@dell-prt7875-01 k8s-device-plugin]# helm upgrade -i nvdp deployments/helm/nvidia-device-plugin/ --version=0.15.0 --namespace nvidia-device-plugin --create-namespace --set-file config.map.config=/tmp/dp-example-config0.yaml
Release "nvdp" does not exist. Installing it now.
W0819 10:53:58.235559 3075925 warnings.go:70] would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.capabilities.drop=["ALL"]; containers "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must not include "SYS_ADMIN" in securityContext.capabilities.add), restricted volume types (volumes "device-plugin", "mps-root", "mps-shm", "cdi-root" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "nvidia-device-plugin-init", "nvidia-device-plugin-sidecar", "nvidia-device-plugin-ctr" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0819 10:53:58.236480 3075925 warnings.go:70] would violate PodSecurity "restricted:v1.24": privileged (containers "mps-control-daemon-mounts", "mps-control-daemon-ctr" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.capabilities.drop=["ALL"]), restricted volume types (volumes "mps-root", "mps-shm" use restricted volume type "hostPath"), runAsNonRoot != true (pod or containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "mps-control-daemon-mounts", "mps-control-daemon-init", "mps-control-daemon-sidecar", "mps-control-daemon-ctr" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
NAME: nvdp
LAST DEPLOYED: Mon Aug 19 10:53:58 2024
NAMESPACE: nvidia-device-plugin
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
Apologies, but I'm not sure I see the need for the loop over a list of `ClusterRoles` and `Roles`, as it would be a bit short IMHO.
Could you please clarify this point a bit?
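For reference, the rendered `Role` that unblocks this on MicroShift looks roughly like the following (metadata values are illustrative, based on the namespace used in the helm command above):

```yaml
# Namespaced Role granting the device plugin's service account
# permission to use the privileged SecurityContextConstraints.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nvdp-nvidia-device-plugin
  namespace: nvidia-device-plugin
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use
```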
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed.
As a follow-up to the previous PR #702, testing MicroShift 4.15+ integration through helm charts, instead of the legacy static manifests, fails due to the lack of Pod Security Standards configurations:
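(The PodSecurity warnings quoted earlier in this thread are an example of those failures.) One common way to satisfy the Pod Security admission checks, not necessarily the approach this PR takes, is to label the plugin's namespace with the privileged level, e.g.:

```yaml
# Sketch: namespace labels for Kubernetes Pod Security admission,
# allowing the privileged device plugin pods to be scheduled.
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia-device-plugin
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```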