Remove kube-dns from installer #548

Merged

Conversation

pravisankar

  • Now openshift-cluster-dns-operator will provide the needed internal domain resolution functionality

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 26, 2018
@pravisankar
Author

/retest

@ironcladlou
Contributor

/retest

@ironcladlou
Contributor

/retest

@pravisankar
Author

/retest

@ironcladlou
Contributor

/retest

@pravisankar
Author

/retest

@pravisankar pravisankar reopened this Oct 30, 2018
@ironcladlou
Contributor

/retest

@pravisankar
Author

/retest

@pravisankar
Author

/retest

@ironcladlou
Contributor

The latest failure may not be a flake... better double check this branch against AWS and manually try the commands the CI job template is using to find the router

@pravisankar pravisankar force-pushed the remove-kube-dns branch 2 times, most recently from 4fc908b to 0c981d3 Compare November 1, 2018 21:32
@ramr
Contributor

ramr commented Nov 2, 2018

@pravisankar it seems like these changes are causing the cluster to not come up. (I finally managed to get past the NAT gateway limit!)

Here's what I saw on one of the master nodes:

[root@ip-10-0-20-251 core]# podman ps
CONTAINER ID   IMAGE                                                          COMMAND                  CREATED         STATUS             PORTS   NAMES
87c26e314870   docker.io/abhinavdahiya/origin-setup-etcd-environment:latest   /bin/setup-etcd-env...   3 minutes ago   Up 3 minutes ago           naughty_golick
[root@ip-10-0-20-251 core]# podman logs 87c26e314870
I1102 00:06:09.036710       1 main.go:62] ip addr is 10.0.20.251
E1102 00:06:09.040767       1 main.go:68] error looking up self: lookup _etcd-server-ssl._tcp.devcluster.openshift.com on 10.0.0.2:53: no such host
E1102 00:07:09.046228       1 main.go:68] error looking up self: lookup _etcd-server-ssl._tcp.devcluster.openshift.com on 10.0.0.2:53: no such host
E1102 00:08:09.046187       1 main.go:68] error looking up self: lookup _etcd-server-ssl._tcp.devcluster.openshift.com on 10.0.0.2:53: no such host
E1102 00:09:09.046474       1 main.go:68] error looking up self: lookup _etcd-server-ssl._tcp.devcluster.openshift.com on 10.0.0.2:53: no such host
E1102 00:10:09.046067       1 main.go:68] error looking up self: lookup _etcd-server-ssl._tcp.devcluster.openshift.com on 10.0.0.2:53: no such host

So it looks like, because we disabled the DNS server, the etcd setup fails and nothing starts up.
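
For readers following the log above: the failing query is an SRV lookup for `_etcd-server-ssl._tcp.devcluster.openshift.com`. Below is a minimal Python sketch of what a "look up self" step like this might do. It is an assumption about the logic, not the actual `setup-etcd-environment` source (which is not shown in this thread). The names `srv_query_name` and `find_self`, and the injected resolver callables, are hypothetical and exist only so the sketch runs without a DNS server.

```python
from typing import Callable, Iterable, Optional


def srv_query_name(service: str, proto: str, domain: str) -> str:
    """Build the DNS name that an SRV query for service/proto on domain uses."""
    return f"_{service}._{proto}.{domain}"


def find_self(
    local_ip: str,
    domain: str,
    lookup_srv: Callable[[str], Iterable[str]],  # hypothetical: SRV name -> target hostnames
    lookup_ips: Callable[[str], Iterable[str]],  # hypothetical: hostname -> IP addresses
    service: str = "etcd-server-ssl",
    proto: str = "tcp",
) -> Optional[str]:
    """Return the SRV target whose address matches local_ip, or None.

    If the SRV record does not exist, lookup_srv is expected to raise,
    which corresponds to the "no such host" errors in the log.
    """
    name = srv_query_name(service, proto, domain)
    for target in lookup_srv(name):
        if local_ip in lookup_ips(target):
            # SRV targets are usually fully qualified; drop the trailing dot.
            return target.rstrip(".")
    return None
```

Under this reading, a node can only discover its own etcd name if something is serving the SRV record for the cluster domain. With no record published, the lookup fails on every attempt, which is consistent with the once-a-minute errors in the log.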

@pravisankar
Author

@ramr I thought pr #526 addressed this specific issue.

@ramr
Contributor

ramr commented Nov 2, 2018

@pravisankar Ah, OK. I tested directly with this branch, and it's rebased up to sha f7080b1469da1e258b6aa83c1a242607604773e7, which is a couple of commits away from that merge. It also looks like you do need to rebase this PR; there are a couple of conflicts. Thanks.

@pravisankar pravisankar force-pushed the remove-kube-dns branch 2 times, most recently from f06ac29 to c9daf5c Compare November 2, 2018 17:58
@abhinavdahiya
Contributor

openshift/cluster-dns-operator#44 should allow the dns operator to schedule on masters.

Otherwise, this is the error from CI:

sudo oc --config /opt/tectonic/auth/kubeconfig -n openshift-cluster-dns-operator get pods cluster-dns-operator-7cdc75466d-nksx4 -oyaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-11-02T18:26:40Z
  generateName: cluster-dns-operator-7cdc75466d-
  labels:
    name: cluster-dns-operator
    pod-template-hash: "3787310228"
  name: cluster-dns-operator-7cdc75466d-nksx4
  namespace: openshift-cluster-dns-operator
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-dns-operator-7cdc75466d
    uid: d738a17f-decc-11e8-9871-124667a4a6a6
  resourceVersion: "1618"
  selfLink: /api/v1/namespaces/openshift-cluster-dns-operator/pods/cluster-dns-operator-7cdc75466d-nksx4
  uid: d739f920-decc-11e8-9871-124667a4a6a6
spec:
  containers:
  - command:
    - cluster-dns-operator
    env:
    - name: WATCH_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: OPERATOR_NAME
      value: cluster-dns-operator
    image: registry.svc.ci.openshift.org/ci-op-m3n2pzv1/stable@sha256:462a4ae61446e9c98f82c7d63d20d9c060a00678ec6957ddcfe83ddd31af9270
    imagePullPolicy: IfNotPresent
    name: cluster-dns-operator
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: cluster-dns-operator-token-zqr9j
      readOnly: true
  dnsPolicy: Default
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cluster-dns-operator
  serviceAccountName: cluster-dns-operator
  terminationGracePeriodSeconds: 30
  volumes:
  - name: cluster-dns-operator-token-zqr9j
    secret:
      defaultMode: 420
      secretName: cluster-dns-operator-token-zqr9j
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-02T18:26:40Z
    message: '0/3 nodes are available: 3 node(s) had taints that the pod didn''t tolerate.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
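
The pod dump above has no `tolerations` field, and the scheduler reports that all three nodes carry taints the pod does not tolerate. Here is a simplified Python sketch of how taint/toleration matching decides schedulability. It is an illustration of the general Kubernetes rule, not the scheduler's code. The taint key `node-role.kubernetes.io/master` is an assumption (the dump does not show which taints the nodes have), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""             # empty key with operator "Exists" matches any key
    operator: str = "Equal"   # "Equal" or "Exists"
    value: str = ""
    effect: str = ""          # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    # operator "Equal": key and value must both match
    return tol.key == taint.key and tol.value == taint.value


def schedulable(taints: Iterable[Taint], tolerations: Sequence[Toleration]) -> bool:
    """A node is usable only if every hard taint is tolerated.

    PreferNoSchedule is a soft preference, so it is ignored here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

With an empty toleration list, any `NoSchedule` taint makes `schedulable` return false, which matches the `Unschedulable` condition in the dump. Adding a toleration for the master taint to the operator's pod template would presumably be the kind of change openshift/cluster-dns-operator#44 makes, though that PR's contents are not shown here.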

@pravisankar
Author

/retest

@ironcladlou
Contributor

@abhinavdahiya looks like this is ready to go, PTAL

@ironcladlou
Contributor

@pravisankar remove the WIP tag?

@pravisankar
Author

@ironcladlou @abhinavdahiya this will break the extended tests, so I'd prefer this to be merged after openshift/origin#21406 (updating the PR as per the feedback).

@ironcladlou
Contributor

@pravisankar I think you have it backwards: e2e-aws passes today with this PR (wildcard extended tests aren't enabled yet in the job). This one should merge ASAP so we can start detecting and dealing with any fallout.

@pravisankar pravisankar changed the title [WIP] Remove kube-dns from installer Remove kube-dns from installer Nov 5, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2018
@pravisankar
Author

@abhinavdahiya can you please review/merge this pr?

@abhinavdahiya
Contributor

@ramr can you rebase on master? #605 and #549 were merged, and you might have to resync this file and update it in both variables... sorry about this 😇

@abhinavdahiya
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2018
@abhinavdahiya
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, pravisankar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ironcladlou
Contributor

/retest

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.