Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1729242: OCP 4 AWS privilege escalation vulnerability by running pods on masters #524

Conversation

ravisantoshgudimetla
Copy link
Contributor

Make worker label as defaultNodeSelector to avoid pods getting scheduled to master nodes.

/cc @deads2k @enj @sttts

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 12, 2019
@openshift-ci-robot
Copy link

@ravisantoshgudimetla: This pull request references a valid Bugzilla bug. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1729242: OCP 4 AWS privilege escalation vulnerability by running pods on masters

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jul 12, 2019
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 12, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ravisantoshgudimetla

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2019
@@ -36,7 +39,12 @@ func ObserveDefaultNodeSelector(genericListers configobserver.Listers, recorder
if err != nil {
return prevObservedConfig, errs
}

if schedulerConfig.Spec == (configv1.SchedulerSpec{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the defaulting pattern used in:

authConfig := defaultAuthConfig(authConfigNoDefaults)

@enj
Copy link
Contributor

enj commented Jul 12, 2019

/hold

Per David's request to continue review on Monday.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2019
@ravisantoshgudimetla
Copy link
Contributor Author

/test e2e-aws

@openshift-ci-robot
Copy link

@ravisantoshgudimetla: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-operator dd61ee9 link /test e2e-aws-operator
ci/prow/e2e-aws dd61ee9 link /test e2e-aws
ci/prow/e2e-aws-upgrade dd61ee9 link /test e2e-aws-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@@ -10,6 +11,8 @@ import (
"k8s.io/klog"
)

const workerNodeSelector = "node-role.kubernetes.io/worker=\"\""
Copy link
Contributor

@smarterclayton smarterclayton Jul 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not guaranteed to exist in all clusters.

Node roles are not an api and no business logic should based on node roles.

Copy link
Contributor Author

@ravisantoshgudimetla ravisantoshgudimetla Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, there are 2 ways to distinguish master nodes from worker nodes:

  • role label
  • Master taints

So, are you suggesting, we cannot rely on role label which the defaultNodeSelector is based on? IIRC, we used to have something like type=Compute as defaultNodeSelector in 3.11.

This is not to say that we cannot have a validation against pods have master tolerations

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton Can you explain what you mean by "Node roles are not an api and no business logic should based on node roles."? It has been common practice with OpenShift in the field to add role labels to nodes and separate workload based on these labels by using node selectors and the openshift.io/node-selector annotation. This has been a very effective design pattern, what is inappropriate about this? Should we encourage use of a different label?

@deads2k
Copy link
Contributor

deads2k commented Jul 15, 2019

I'm a -1 on resolving the issue this way. The mechanism for steering workloads away from masters is taints and restricting tolerations. I thought this is what PodTolerationRestrictions did, but that plugin is very confused. Mapping tolerations to rbac permissions for serviceaccounts seems more practical.

Doing it like this will restrict our future choices of node topologies

@sfowl
Copy link

sfowl commented Jul 16, 2019

CVE-2019-10200 was assigned to this issue.

@jkupferer
Copy link

Can we get an update on where we are at on this issue?

We need this resolved both for the potential security impact and also to clarify how users should isolate workload when there is a security concern.

In my experience as an openshift consultant and instructor, taints and tolerations are not well suited for security isolation because most users find these difficult to understand and implement. While it may be possible to resolve this with podTolerationRestrictions, this is likely going to be hopelessly complicated for many OpenShift administrators. Node selectors are much easier for our users and administrators to understand and node selectors on namespaces are easy to audit with automation.

imho, we should label all masters as node-role.kubernetes.io/master everything else as workers, node-role.kubernetes.io/worker, and set the default for the defaultNodeSeloctor to run on workers as this PR was aiming to do. In the case where we want workload to run on the masters, we simply add the worker label to the masters, so it really shouldn't limit node topologies.

@soltysh
Copy link

soltysh commented Dec 12, 2019

I think https://bugzilla.redhat.com/show_bug.cgi?id=1730165#c8 has the answer. In short version 2 of https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html should be able to solve the issue.
/close

@openshift-ci-robot
Copy link

@soltysh: Closed this PR.

In response to this:

I think https://bugzilla.redhat.com/show_bug.cgi?id=1730165#c8 has the answer. In short version 2 of https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html should be able to solve the issue.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants