Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: Make libvirt support completely conditional (behind TAGS=libvirt) #955

Merged
merged 1 commit into from
Dec 22, 2018

Conversation

wking
Copy link
Member

@wking wking commented Dec 20, 2018

Previously, destroy support was behind TAGS=libvirt_destroy and create support was always built in. But since 3fb4400 (#919), the bundled libvirt Terraform provider has also been behind libvirt_destroy. That leads to cluster creation failing with:

$ openshift-install create cluster
...
ERROR Missing required providers.
ERROR
ERROR The following provider constraints are not met by the currently-installed
ERROR provider plugins:
ERROR
ERROR * libvirt (any version)
ERROR
ERROR Terraform can automatically download and install plugins to meet the given
ERROR constraints, but this step was skipped due to the use of -get-plugins=false
ERROR and/or -plugin-dir on the command line.
...

With this commit, folks trying to create cluster without libvirt compiled in will get:

FATAL failed to fetch Common Manifests: failed to load asset "Install Config": invalid "install-config.yaml" file: platform: Invalid value: types.Platform{AWS:(*aws.Platform)(nil), Libvirt:(*libvirt.Platform)(0xc4209511f0), OpenStack:(*openstack.Platform)(nil)}: platform must be one of: aws, openstack

before we get to Terraform.

Now that the build tag guards both creation and deletion, I've renamed it from libvirt_destroy to the unqualified libvirt.

CC @abhinavdahiya

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 20, 2018
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 20, 2018
Copy link
Contributor

@michaelgugino michaelgugino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BYO needs libvirt support by default. We should always bundle in libvirt, this patch set should be going the otherway.

@crawford
Copy link
Contributor

We discussed this earlier and decided to add a vanilla platform. This can be used by BYOR.

Previously, destroy support was behind TAGS=libvirt_destroy and create
support was always built in.  But since 3fb4400 (terraform/plugins:
add `libvirt`, `aws`, `ignition`, `openstack` to KnownPlugins,
2018-12-14, openshift#919), the bundled libvirt Terraform provider has also
been behind libvirt_destroy.  That leads to cluster creation failing
with:

  $ openshift-install create cluster
  ...
  ERROR Missing required providers.
  ERROR
  ERROR The following provider constraints are not met by the currently-installed
  ERROR provider plugins:
  ERROR
  ERROR * libvirt (any version)
  ERROR
  ERROR Terraform can automatically download and install plugins to meet the given
  ERROR constraints, but this step was skipped due to the use of -get-plugins=false
  ERROR and/or -plugin-dir on the command line.
  ...

With this commit, folks trying to 'create cluster' without libvirt
compiled in will get:

  FATAL failed to fetch Common Manifests: failed to load asset "Install Config": invalid "install-config.yaml" file: platform: Invalid value: types.Platform{AWS:(*aws.Platform)(nil), Libvirt:(*libvirt.Platform)(0xc4209511f0), OpenStack:(*openstack.Platform)(nil)}: platform must be one of: aws, openstack

before we get to Terraform.

Now that the build tag guards both creation and deletion, I've renamed
it from 'libvirt_destroy' to the unqualified 'libvirt'.

I've also adjusted the install-config validation testing to use
regular expressions so we can distinguish between failures because
libvirt was not compiled in as a valid platform and failures because
some portion of the libvirt configuration was broken.  In order to get
stable error messages for comparison, I've added some strings.Sort
calls for various allowed-value string-slice computations.
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 21, 2018
@wking
Copy link
Member Author

wking commented Dec 22, 2018

e2e-aws included:

fail [github.com/openshift/origin/test/extended/operators/cluster.go:109]: Expected <[]string | len:1, cap:1>: [ "Pod openshift-cluster-machine-approver/machine-approver-649d8ffd59-t5fjm is not healthy: container machine-approver-controller has restarted more than 5 times", ] to be empty

Must be unrelated.

/retest

@wking
Copy link
Member Author

wking commented Dec 22, 2018

Digging into the machine-approver restarts:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/955/pull-ci-openshift-installer-master-e2e-aws/2493/artifacts/e2e-aws/events.json | jq '.items[] | select(.involvedObject.namespace == "openshift-cluster-machine-approver" and .message == "Back-off restarting failed container")'
{
  "apiVersion": "v1",
  "count": 17,
  "eventTime": null,
  "firstTimestamp": "2018-12-21T21:35:41Z",
  "involvedObject": {
    "apiVersion": "v1",
    "fieldPath": "spec.containers{machine-approver-controller}",
    "kind": "Pod",
    "name": "machine-approver-649d8ffd59-t5fjm",
    "namespace": "openshift-cluster-machine-approver",
    "resourceVersion": "1599",
    "uid": "11c9551a-0568-11e9-a8a1-122c3fd60c36"
  },
  "kind": "Event",
  "lastTimestamp": "2018-12-21T21:42:26Z",
  "message": "Back-off restarting failed container",
  "metadata": {
    "creationTimestamp": "2018-12-21T21:35:41Z",
    "name": "machine-approver-649d8ffd59-t5fjm.157276b502925b15",
    "namespace": "openshift-cluster-machine-approver",
    "resourceVersion": "10209",
    "selfLink": "/api/v1/namespaces/openshift-cluster-machine-approver/events/machine-approver-649d8ffd59-t5fjm.157276b502925b15",
    "uid": "5dc520a3-0568-11e9-a8a1-122c3fd60c36"
  },
  "reason": "BackOff",
  "reportingComponent": "",
  "reportingInstance": "",
  "source": {
    "component": "kubelet",
    "host": "ip-10-0-23-128.ec2.internal"
  },
  "type": "Warning"
}

The pod logs suggest the issue may have been a missing Kubernetes API server:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/955/pull-ci-openshift-installer-master-e2e-aws/2493/artifacts/e2e-aws/pods/openshift-cluster-machine-approver_machine-approver-649d8ffd59-t5fjm_machine-approver-controller_previous.log.gz | gunzip | tail
E1221 21:42:24.392324       1 leaderelection.go:234] error retrieving resource lock openshift-cluster-machine-approver/cluster-machine-approver-lock: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-machine-approver/configmaps/cluster-machine-approver-lock: dial tcp 127.0.0.1:6443: connect: connection refused
I1221 21:42:25.279413       1 reflector.go:240] Listing and watching *v1beta1.CertificateSigningRequest from github.com/openshift/cluster-machine-approver/vendor/k8s.io/client-go/informers/factory.go:130
E1221 21:42:25.279885       1 reflector.go:205] github.com/openshift/cluster-machine-approver/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta1.CertificateSigningRequest: Get https://127.0.0.1:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6443: connect: connection refused
I1221 21:42:26.280099       1 reflector.go:240] Listing and watching *v1beta1.CertificateSigningRequest from github.com/openshift/cluster-machine-approver/vendor/k8s.io/client-go/informers/factory.go:130
E1221 21:42:26.280507       1 reflector.go:205] github.com/openshift/cluster-machine-approver/vendor/k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta1.CertificateSigningRequest: Get https://127.0.0.1:6443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests?limit=500&resourceVersion=0: dial tcp 127.0.0.1:6443: connect: connection refused
E1221 21:42:26.392153       1 leaderelection.go:234] error retrieving resource lock openshift-cluster-machine-approver/cluster-machine-approver-lock: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-machine-approver/configmaps/cluster-machine-approver-lock: dial tcp 127.0.0.1:6443: connect: connection refused
E1221 21:42:26.392401       1 leaderelection.go:234] error retrieving resource lock openshift-cluster-machine-approver/cluster-machine-approver-lock: Get https://127.0.0.1:6443/api/v1/namespaces/openshift-cluster-machine-approver/configmaps/cluster-machine-approver-lock: dial tcp 127.0.0.1:6443: connect: connection refused
E1221 21:42:26.392463       1 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'd6a1f746-0568-11e9-b47d-0a9f534285c0 stopped leading'
I1221 21:42:26.392532       1 leaderelection.go:213] failed to renew lease openshift-cluster-machine-approver/cluster-machine-approver-lock: timed out waiting for the condition
F1221 21:42:26.392554       1 leaderelection.go:65] leaderelection lost

which matches the event's lastTimestamp. But it's hard to imaging the Kubernetes API server being down for the full seven minutes we were seeing that event without it affecting other parts of the e2e tests.

@wking
Copy link
Member Author

wking commented Dec 22, 2018

All green :)

@crawford
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 22, 2018
@openshift-merge-robot openshift-merge-robot merged commit a29ed8f into openshift:master Dec 22, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants