Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: Add optional auth token #736

Closed

Conversation

cgwalters
Copy link
Member

Since the bootstrap node serves Ignition to master, and today we
don't support scaling up/down masters, just disable access to
the master Ignition config in the in-cluster MCS by default.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 10, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 10, 2019
@cgwalters cgwalters changed the title server: Disable access to the master Ignition by default server: Disable access to the master Ignition by default, add optional auth token May 10, 2019
@cgwalters cgwalters force-pushed the mcs-master-worker-separation branch from 1af70b9 to 3bf7431 Compare May 10, 2019 20:49
@runcom
Copy link
Member

runcom commented May 10, 2019

Code lgtm, guess this can't hurt BYO either now since we just support workers there. Looks sane

@cgwalters cgwalters force-pushed the mcs-master-worker-separation branch from 3bf7431 to cd24821 Compare May 10, 2019 21:01
@cgwalters
Copy link
Member Author

Just for general info, I just discovered Ignition will do a retry loop infinitely on http status >= 500 but if you give it a 403 it gives up, so you need to de-provision the machine (e.g. oc -n openshift-machine-api delete machine/$x) if you were testing auth.

aws ec2 --region=us-east-2 get-console-output --instance-id i-0bc13fd82d10aee7a is handy to debug things like this.

@cgwalters
Copy link
Member Author

Just writing up some more steps for testing/using this. After you do something like this:
oc -n openshift-machine-config-operator create secret generic --from-literal=worker=$(pwgen 32 1) ignition-auth

Your next step is to oc -n openshift-machine-api edit secret/worker-user-data - base64 decode it to get the JSON, then ensure that the Ignition request has the same auth key you generated in the secret, e.g.:

{"ignition":{"config":{"append":[{"source":"https://api-int.mycluster.example.com:22623/config/worker?auth=pxzbz5n2zgfcd4kff8krv78z8wwdndvj","verification":{}}]}, ...

@@ -31,10 +31,30 @@ type kubeconfigFunc func() (kubeconfigData []byte, rootCAData []byte, err error)
// appenderFunc appends Config.
type appenderFunc func(*ignv2_2types.Config) error

// Error is returned by the GetConfig API
type Error struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to call it GetConfigError or something like that (doesn't need uppercase to be exported also, does it?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@runcom I thought upper to be an exported type)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this naming suggestion btw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I realized only after commenting that I uppercased my suggestion 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or possibly configError to be explicit.

@kikisdeliveryservice
Copy link
Contributor

aws route53 issues

/retest

@cgwalters
Copy link
Member Author

cgwalters commented May 10, 2019

Now a much stronger elaboration of this PR would be for the MCO to support one-time-use tokens.

Rather than having a secret, an admin (or automation) would do e.g.:

oc -n openshift-machine-config-operator create secret generic --from-literal=auth=$(pwgen 32 1) ignition-auth-worker-us-east-2a-sks8v

The secret name is the concatenation of ignition-auth- with the machineAPI object name.

Then the HTTP request would have e.g. ?n=worker-us-east-2a-sks8v&auth=<secret>.

The MCS would check for this secret's existence, and if it existed, accept the request and then delete the secret. This would also mean access to the EC2 metadata API from a pod couldn't gain a useful token.

The machine API would need to learn how to generate a per-instance userdata.

@cgwalters
Copy link
Member Author

/retest

@cgwalters
Copy link
Member Author

OK, all tests passed on this one, confirming my manual testing before submitting the PR that nothing depends on access to the /config/master endpoint. And we have the ability to gate access to /worker, but turning it on by default will require an installer PR.

I'd like to consider landing this as we're pretty sure it's not going to break anything, and gives us the mechanism to gate access to /config/worker.

cgwalters added a commit to cgwalters/installer that referenced this pull request May 12, 2019
This is an optional hardening for access to Ignition; the installer
generates a random key (separately for master/worker pool) and installs
it into the `openshift-machine-config-operator` namespace.  If the MCS
finds an `ignition-auth` secret with the `master/worker` keys, it will use it:
openshift/machine-config-operator#736

This PR just generates those secrets, so we can land it before the
MCO PR as well.
@cgwalters
Copy link
Member Author

Installer PR: openshift/installer#1740

@runcom
Copy link
Member

runcom commented May 12, 2019

/approve

This lgtm, can this land separately from the installer PR? do we need a go-on from Auth team?

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits but those are nits 😄

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashcrow, cgwalters, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ashcrow,cgwalters,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

func (cs *clusterServer) GetConfig(cr poolRequest) (*ignv2_2types.Config, error) {
mp, err := cs.machineClient.MachineConfigPools().Get(cr.machineConfigPool, metav1.GetOptions{})
func (cs *clusterServer) GetConfig(cr poolRequest, auth string) (*ignv2_2types.Config, error) {
authSecretObj, err := cs.kubeClient.CoreV1().Secrets(componentNamespace).Get(ignitionAuth, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ability to configure multiple secrets would make later rotation much easier.


// If there's a secret, require that it was passed as a query parameter.
if authEnabled {
authSecret := string(authSecretObj.Data[name])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an administrator forgets to configure an authsecret for a new machine pool, this will fail open.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional for backcompat reasons.

@crawford
Copy link
Contributor

/hold

This doesn't but us much in terms of security and it makes disaster recovery more difficult.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2019
@cgwalters
Copy link
Member Author

This doesn't but us much in terms of security

If access to the EC2 metadata endpoint is shut off (which it needs to be anyways), or in cases where the bootstrap config isn't accessible at all, I think this is fairly strong security. It entirely shuts down the problem.

How is this not buying much?

and it makes disaster recovery more difficult.

Slightly...and if desired we can switch this to just landing code to enable auth tokens for both master/worker without actually enabling it by default. In other words, drop the change to disable the master pool.

Further, keep in mind that this code is a step towards further integration with machineAPI.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Apr 22, 2020
Ignition may contain secret data; pods running on the cluster
shouldn't have access.

This adds opt-in support for denying serving that data.  It is
disabled by default so we can check whether this would happen
in any CI scenarios to start.  Run
`oc -n openshift-machine-config-operator create configmap machine-config-server provision-check=yes`
to switch to enforcing mode.

First, we deny any request that appears to come from the pod overlay
network.  This closes off a lot of avenues without any risk.

However, we can't guarantee all in-cluster requests appear to originate from
the pod network; in some cases according to the SDN team, particularly
for machines that have multiple NICs.

Hence, this PR also closes off access to any IP that responds on
port 22, as that is a port that is:

 - Known to be active by default
 - Not firewalled

A previous attempt at this was to have an [auth token](openshift#736);
but this fix doesn't require changing the installer and people's PXE setups.

In the future we may reserve a port in the 9xxx range and have the
MCD respond on it so that admins who disable/firewall SSH don't
have indirectly reduced security.
@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws 5e6a17b link /test e2e-aws
ci/prow/e2e-aws-op 5e6a17b link /test e2e-aws-op
ci/prow/e2e-aws-disruptive 5e6a17b link /test e2e-aws-disruptive
ci/prow/e2e-gcp-op 5e6a17b link /test e2e-gcp-op
ci/prow/e2e-aws-upgrade 5e6a17b link /test e2e-aws-upgrade
ci/prow/e2e-gcp-upgrade 5e6a17b link /test e2e-gcp-upgrade
ci/prow/e2e-vsphere 5e6a17b link /test e2e-vsphere
ci/prow/build-rpms-from-tar 5e6a17b link /test build-rpms-from-tar
ci/prow/e2e-ovn-step-registry 5e6a17b link /test e2e-ovn-step-registry
ci/prow/e2e-aws-proxy 5e6a17b link /test e2e-aws-proxy
ci/prow/e2e-upgrade 5e6a17b link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-upgrade 5e6a17b link /test e2e-agnostic-upgrade
ci/prow/e2e-aws-serial 5e6a17b link /test e2e-aws-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters
Copy link
Member Author

Closing in favor of #2223

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. team-mco
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants