Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize securityContext in injected metrics container #964

Merged
merged 1 commit into from
Jan 1, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions pkg/webhook/v1alpha3/pod/inject_webhook.go
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ func (s *sidecarInjector) Mutate(pod *v1.Pod, namespace string) (*v1.Pod, error)
return nil, err
}

injectContainer, err := s.getMetricsCollectorContainer(trial)
injectContainer, err := s.getMetricsCollectorContainer(trial, pod)
if err != nil {
return nil, err
}
Expand All @@ -162,7 +162,7 @@ func (s *sidecarInjector) Mutate(pod *v1.Pod, namespace string) (*v1.Pod, error)
return mutatedPod, nil
}

func (s *sidecarInjector) getMetricsCollectorContainer(trial *trialsv1alpha3.Trial) (*v1.Container, error) {
func (s *sidecarInjector) getMetricsCollectorContainer(trial *trialsv1alpha3.Trial, originalPod *v1.Pod) (*v1.Container, error) {
mc := trial.Spec.MetricsCollector
if mc.Collector.Kind == common.CustomCollector {
return mc.Collector.CustomCollector, nil
Expand All @@ -178,11 +178,13 @@ func (s *sidecarInjector) getMetricsCollectorContainer(trial *trialsv1alpha3.Tri
}
args := getMetricsCollectorArgs(trial.Name, metricName, mc)
sidecarContainerName := getSidecarContainerName(trial.Spec.MetricsCollector.Collector.Kind)
securityContext := originalPod.Spec.Containers[0].SecurityContext.DeepCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can it make sure that originalPod.Spec.Containers[0].SecurityContext here can fit the metricsCollector and the originalPod.Spec.Containers[0].SecurityContext may not be generated by other mutating Webhook but defined explicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I fully understand the questions. Please look at my comment below which explains how the securityContext is generated and see if it helps.

For how I understand your question - the metricsCollector should have the same securityContext as the [0] container as that is how it would be generated by PSP by default - based on the ServiceAccount privileges it would generate securityContext for Pod and from there for all the containers. So as the [0] container already has the right securityContext it is safe to simply replicate it into the metricsCollector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container securityContext can be generated by PSP, but also can be specified explicitly, right?
For example, a user adds a training job in a container with securityContext: runAsUser: 1000, if user 1000 can work well for metricsCollector container, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean now. That is a fair point. I would say it would work fine - IIUC both containers are working with the same files, so if the user container works with UID 1000, the metrics container should work too, right?

I am trying to figure out if there is a common pattern about how to solve these cases with the OpenShift team.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned somewhere else the best solution would be to use reinvocationPolicy which should take care of generating the SC in the new container after it is added by webhook. That has 2 issues at the moment -

  1. it is coming in next version of OpenShift, so I cannot even test it:)
  2. the code katib uses to register webhooks does not know about the field, so I'll have to figure out how to actually do it when it is available

It seems (after discussing with OpenShift team) that using the existing SC from the other container might be our best shot here. I will try to git into it a bit more, but as I mentioned - as long as the sidecar container does not do anything special, it should work fine with the SC from existing container. The only way I see this could potentially fail if if the user configures the container very restrictively and metrics container cannot get to what it needs to read.

The only other way I could think of is to configure the SC explicitly to the state that we know will work for most configurations. Question is what happens if the user then configures his container to run as root and metrics container does not get the same SC and thus will probably not be able to access files produced by the main container.

That said, I still think copying the SC is the best solution at the moment. I will try to play around with it a bit more tomorrow to see if I can actually break this approach:)

Copy link
Member

@hougangliu hougangliu Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can try another solution as workaround: now k8s runs Webhooks in string order by webhook name, we can try to run katib mutating before other related webhooks by renaming it.
for example, now its name is katib-mutating-webhook-config, we can rename it to admission-katib-mutating-webhook-config, not sure what openshift related Webhooks name is.
I think other projects like Istio have similar issue, have no idea how the project handle this case (for istio, I think istioctl kube-inject to bypass the istio mutatingwebhook as a workaround.).
BTW, @johnugeorge @richardsliu @gaocegege WDYT we provide a tool like istioctl kube-inject to inject metricsCollector container in job level offline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked RH istio team and their issue seems to be different because they set the SC explicitly as they need their container to run as privileged.

I am not sure about reordering by name because AFAIK the component generating SC is PSP (Pod Security Policies) which does not run as a webhook.

Adding the metrics container on Job level might help though - I added comment here #928 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics collector container now reads metrics file and then reports metrics to DB server, I wonder if two operations fail owning to training job SC configuration.

I tested this before sending the PR and the operations surely did not fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why RH istio team think their issue is different.
the root cause is: when a Pod generated and before persisted, PSP adds security context to all containers; then mutatingwebhook injecting new container into a Pod which leads that the new container has no consistent SC which conflicts with PSP, right?
so I think it is a common use case, many projects would inject new container into a Pod by mutatingwebhook, such as istio, kfserving.
we had better handle this case in a best practice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hougangliu @vpavlin

Can we have a command line flag to control the behavior whether we use other containers's securityContext if we cannot come to an agreement now?

injectContainer := v1.Container{
Name: sidecarContainerName,
Image: image,
Args: args,
ImagePullPolicy: v1.PullIfNotPresent,
SecurityContext: securityContext,
}
return &injectContainer, nil
}
Expand Down