Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove public IPs from masters #1045

Merged
merged 5 commits into from
Feb 14, 2019

Conversation

eparis
Copy link
Member

@eparis eparis commented Jan 10, 2019

This renames our subnets from master/worker to be public/private. It then moves the masters to the 'private' subnet. It also removes the public IP from all of the masters.

It leaves the bootstrap node in the public subnet and leaves it with a public ip.

This better follows AWS best security practices by not exposing our machines on the open internet. It does mean that a customer will need to provide their own bastion server for access.

Fixes: #747

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 10, 2019
@wking
Copy link
Member

wking commented Jan 10, 2019

Don't we still need these until we get a better fix than openshift/origin#21700 for CI?

@eparis
Copy link
Member Author

eparis commented Jan 10, 2019

I'm hacking on a simple tool for dev and CI (but not for customers) to get an ssh pod running which can act as an easy to deploy bastion. Maybe I'll be able to show that PoC tomorrow. I think that should be enough to merge this...

@wking
Copy link
Member

wking commented Jan 10, 2019

Linking the bastion config support so I can find it later: openshift/release#2469 ;).

@abhinavdahiya
Copy link
Contributor

/cc @blrm

@eparis eparis force-pushed the master-private branch 2 times, most recently from f107702 to c00463a Compare January 10, 2019 22:01
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2019
@DanyC97
Copy link
Contributor

DanyC97 commented Jan 16, 2019

@wking on a slightly different topic, shouldn't the installer deploy a bastion too for operators to be able to ssh into masters/ workers ?

While is all nice the work has been already done for CI, for prod deployment we leave the operator in the dark ? ... my 0.02$ though

@crawford
Copy link
Contributor

I'm in favor of pushing people away from jumping to SSH as their go-to. Providing a bastion by default goes against that goal. Creating a bastion host should be trivial for administrators and if it isn't, there are plenty of strategies that leverage SSH pods (like @eparis mentioned).

@wking wking added platform/aws and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 18, 2019
@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 26, 2019
@eparis
Copy link
Member Author

eparis commented Jan 26, 2019

https://github.com/eparis/ssh-bastion provides an easy to deploy bastion pod.

@eparis eparis changed the title WIP: Remove public IPs from masters Remove public IPs from masters Jan 26, 2019
@eparis
Copy link
Member Author

eparis commented Jan 26, 2019

now to watch what is still using ssh in CI. hopefully most things have moved to oc adm logs.

@crawford
Copy link
Contributor

Holding while I wait for a little more buy-in.
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2019
@wking
Copy link
Member

wking commented Jan 30, 2019

Holding while I wait for a little more buy-in.

I'm sold :).

/hold cancel

e2e-aws:

Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs  should adhere to Three Laws of Controllers [Conformance] [Suite:openshift/conformance/parallel/minimal]
[Feature:DeploymentConfig] deploymentconfigs rolled back [Conformance] should rollback to an older deployment [Suite:openshift/conformance/parallel/minimal]
[Feature:DeploymentConfig] deploymentconfigs when run iteratively [Conformance] should immediately start a new deployment [Suite:openshift/conformance/parallel/minimal]
[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

Failing tests:

[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Inline-volume (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: vSphere] [Testpattern: Inline-volume (xfs)] volumes should be mountable [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2019
@crawford
Copy link
Contributor

crawford commented Jan 30, 2019

/hold

@wking not from this team specifically ;) I'll remove the hold once I'm satisfied with the responses I get (offline).

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2019
@deads2k
Copy link
Contributor

deads2k commented Jan 31, 2019

I'm hacking on a simple tool for dev and CI (but not for customers) to get an ssh pod running which can act as an easy to deploy bastion. Maybe I'll be able to show that PoC tomorrow. I think that should be enough to merge this...

Does this exist yet? Being able to have an oc adm node rsh nodes/foo would be really helpful.

@cgwalters
Copy link
Member

Does this exist yet? Being able to have an oc adm node rsh nodes/foo would be really helpful.

FWIW I sometimes rsh into the MCD, which is running on each node chrooted into the host's /.

@deads2k
Copy link
Contributor

deads2k commented Jan 31, 2019

I understand why this change is a good idea. I think some help in the description breaking down the debugging scenarios would help everyone.

  1. A mostly working cluster, mostly working kubelet - in this case, I think we can create a privileged pod on a particular node in kube-system.
  2. A mostly working cluster, node without a kubelet - in this case, I think we create the pod from 1 on a master and then ssh from there to node. Perhaps we even have a variant of a command that creates a hostnetwork pod, sshs to an IP, then attaches?
  3. A mostly broken cluster that is long lived - in this case we should be able to create a bastion, connect there and fan out.
  4. A mostly broken cluster that is short lived - this is the CI case and one that's causing me problems now on workers. I'd like CI to attempt to gather journals off of nodes in state 2.

@wking
Copy link
Member

wking commented Jan 31, 2019

  1. A mostly broken cluster that is short lived...

If we get far enough for bootstrap teardown, we were at least beyond this point at some point. So use kubelet pods on a master, or, lacking that, fall back to the bootstrap node.

@crawford
Copy link
Contributor

Now that openshift/origin#21997 has merged, I think cases 1, 2, and 4 are reasonably covered.

/hold cancel
/retest

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 13, 2019
@cuppett
Copy link
Member

cuppett commented Feb 13, 2019

@eparis Can we add a "Fixes #747" here?

@eparis
Copy link
Member Author

eparis commented Feb 13, 2019

/retest

@wking
Copy link
Member

wking commented Feb 14, 2019

e2e-aws:

Failing tests:

[Area:Networking] NetworkPolicy when using a plugin that implements NetworkPolicy should enforce policy based on NamespaceSelector and PodSelector [Feature:OSNetworkPolicy] [Suite:openshift/conformance/parallel]
[Area:Networking] multicast when using one of the plugins 'redhat/openshift-ovs-multitenant, redhat/openshift-ovs-networkpolicy' should block multicast traffic in namespaces where it is disabled [Suite:openshift/conformance/parallel]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should respond with 503 to unrecognized hosts [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should serve a route that points to two services and respect weights [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should serve routes that were created from an ingress [Suite:openshift/conformance/parallel/minimal]
[Conformance][templates] templateinstance object kinds test should create and delete objects from varying API groups [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds] build with empty source  started build should build even with an empty source in build config [Suite:openshift/conformance/parallel]
[Feature:Builds][Conformance] oc new-app  should succeed with a --name of 58 characters [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds][Conformance] remove all builds when build configuration is removed  oc delete buildconfig should start builds and delete the buildconfig [Suite:openshift/conformance/parallel/minimal]
[Feature:DeploymentConfig] deploymentconfigs with env in params referencing the configmap [Conformance] should expand the config map key to a value [Suite:openshift/conformance/parallel/minimal]
[k8s.io] Pods should allow activeDeadlineSeconds to be updated [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-cli] Kubectl Port forwarding [k8s.io] With a server listening on localhost [k8s.io] that expects a client request should support a client that connects, sends DATA, and disconnects [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-cli] Kubectl client [k8s.io] Proxy server should support --unix-socket=/path  [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Downward API volume should provide node allocatable (memory) as default memory limit if the limit is not set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (default fs)] subPath should fail for new directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: azure] [Testpattern: Dynamic PV (default fs)] subPath should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Inline-volume (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Inline-volume (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Inline-volume (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Projected should be consumable from pods in volume with mappings and Item Mode set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Projected should project all components that make up the projection API [Projection][NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

/retest

@wking
Copy link
Member

wking commented Feb 14, 2019

e2e-aws had a number of flakes and then:

Failing tests:

[sig-cli] Kubectl alpha client [k8s.io] Kubectl run CronJob should create a CronJob [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: hostPathSymlink] [Testpattern: Dynamic PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@wking
Copy link
Member

wking commented Feb 14, 2019

e2e-aws:


Failing tests:

[Conformance][Area:Networking][Feature:Router] The HAProxy router converges when multiple routers are writing conflicting status [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The HAProxy router should enable openshift-monitoring to pull metrics [Suite:openshift/conformance/parallel/minimal]
[Conformance][Area:Networking][Feature:Router] The default ClusterIngress should support default wildcard reencrypt routes through external DNS [Suite:openshift/conformance/parallel/minimal]
[Conformance][templates] templateinstance impersonation tests should pass impersonation update tests [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds] Multi-stage image builds  should succeed [Conformance] [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds] Optimized image builds  should succeed [Conformance] [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds] build have source revision metadata  started build should contain source revision information [Suite:openshift/conformance/parallel]
[Feature:Builds][Conformance] build without output image  building from templates should create an image from a S2i template without an output image reference defined [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds][Conformance] imagechangetriggers  imagechangetriggers should trigger builds of all types [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds][pullsecret][Conformance] docker build using a pull secret  Building from a template should create a docker build that pulls using a secret run it [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds][timing] capture build stages and durations  should record build stages and durations for docker [Suite:openshift/conformance/parallel]
[Feature:DeploymentConfig] deploymentconfigs with failing hook [Conformance] should get all logs from retried hooks [Suite:openshift/conformance/parallel/minimal]
[image_ecosystem][mongodb] openshift mongodb image  creating from a template should instantiate the template [Suite:openshift/conformance/parallel]
[k8s.io] Container Runtime blackbox test when starting a container that exits should report termination message from log output if TerminationMessagePolicy FallbackToLogOnError is set [NodeConformance] [Suite:openshift/conformance/parallel] [Suite:k8s]
[k8s.io] Docker Containers should be able to override the image's default command and arguments [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-api-machinery] Downward API should provide pod name, namespace and IP address as env vars [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-auth] ServiceAccounts should allow opting out of API token automount  [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-cli] Kubectl client [k8s.io] Kubectl describe should check if kubectl describe prints relevant information for rc and pods  [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-cli] Kubectl client [k8s.io] Kubectl rolling-update should support rolling-update to same image  [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-cli] Kubectl client [k8s.io] Simple pod should support exec through an HTTP proxy [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-network] Networking Granular Checks: Pods should function for intra-pod communication: udp [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-network] Services should have session affinity work for NodePort service [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-scheduling] Multi-AZ Clusters should spread the pods of a service across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Downward API volume should provide container's cpu limit [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Downward API volume should provide podname as non-root with fsgroup and defaultMode [NodeFeature:FSGroup] [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: emptydir] [Testpattern: Dynamic PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: emptydir] [Testpattern: Inline-volume (default fs)] subPath should support file as subpath [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: emptydir] [Testpattern: Inline-volume (default fs)] subPath should support readOnly file specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] Projected should be consumable from pods in volume with mappings and Item mode set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] Projected should set mode on item file [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

/retest

@openshift-merge-robot openshift-merge-robot merged commit 5e9e87b into openshift:master Feb 14, 2019
@eparis eparis deleted the master-private branch February 18, 2019 20:03
wking added a commit to wking/openshift-installer that referenced this pull request Feb 27, 2019
In 6c10827 (Removing unused/deprecated security groups and ports,
2019-02-23, openshift#1306), we restricted master SSH access to the cluster,
catching up with 6add0ab (data/aws: move the masters to the private
subnets, 2019-01-10, openshift#1045).  But the bootstrap node is a useful SSH
bastion for debugging hung installs (until we get far enough along to
tear down the bootstrap resources).  This commit restores global SSH
access to the bootstrap node, now that it is no longer provided by the
master security group.
wking added a commit to wking/openshift-release that referenced this pull request Apr 24, 2019
These are from b7cc916 (Set KUBE_SSH_USER for new installer for AWS
tests, openshift#2274) and 43dde9e (Set KUBE_SSH_BASTION and
KUBE_SSH_KEY_PATH in installer tests, 2018-12-23, openshift#2469).  But moving
forward, reliable SSH access direct to nodes will be hard, with things
like openshift/installer@6add0ab447 (Remove public IPs from masters,
2019-01-10, openshift/installer#1045) making a SSH bastion a
requirement for that sort of thing (at least on AWS).  Going forward,
ideally e2e tests can be ported to use privileged pods within the
cluster to check what they need to check.  But however that works out,
stop carrying local dead code that is not affecting test results.  We
can always drag it back out of version control later if it turns out
we actually want to go down the KUBE_SSH_* route.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. platform/aws size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.