
hack/build: Pin to RHCOS 47.249 and quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 #1068

Closed
wants to merge 2 commits

Conversation

@wking (Member) commented Jan 15, 2019

DO NOT MERGE! This PR is just for CI coverage.

Recycling the RHCOS build from 76f91bd (hack/build: Bump RHCOS from 47.245 to 47.249, 2019-01-07, #1009, v0.9.1).

Clayton just pushed 4.0-art-latest-2019-01-15-010905 to quay.io/openshift-release-dev/ocp-release:4.0.0-0.1, although we might update that tag with a later hot fix. We're cutting this release on 4.0.0-0.1 so folks can use a (mostly) pinned installer with an (almost) released update payload ;).
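
If you want to double-check what that tag points at after any later hot-fix push, here's a quick sketch (assuming a reasonably recent oc with the adm release subcommands on your PATH):

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-0.1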

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing of the pinned release despite openshift/release@60007df2 (Use RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03, openshift/release#1793).

Also comment out regions which this particular RHCOS build wasn't pushed to, leaving only:

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.246/meta.json | jq -r '.amis[] | .name'
ap-northeast-1
ap-northeast-2
ap-south-1
ap-southeast-1
ap-southeast-2
ca-central-1
eu-central-1
eu-west-1
eu-west-2
eu-west-3
sa-east-1
us-east-1
us-east-2
us-west-1
us-west-2
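
If you need to check whether some other region got this build, the same meta.json can be queried per region. A sketch (assuming each amis entry also carries the AMI ID under an hvm key, which is how these RHCOS meta.json files were laid out at the time):

$ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.246/meta.json | jq -r '.amis[] | select(.name == "us-east-1") | .hvm'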

I'd initially expected to export the pinning environment variables in release.sh, but I've put them in build.sh here because our continuous integration tests use build.sh directly and don't go through release.sh.
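
As a rough sketch of the build.sh side (the renamed variable below is illustrative, not necessarily the name in the actual diff):

# hack/build.sh (sketch)
# Pin the update payload for this release.  Renamed from
# OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE so the CI override of that
# name (openshift/release#1793) doesn't clobber the pin.
export OPENSHIFT_INSTALL_RELEASE_IMAGE="${OPENSHIFT_INSTALL_RELEASE_IMAGE:-quay.io/openshift-release-dev/ocp-release:4.0.0-0.1}"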

See #1002 for the very similar commit behind v0.9.0.

Through 63bdb7f (Merge pull request openshift#1050 from ironcladlou/trouble-doc, 2019-01-14).
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 15, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2019
@wking (Member, Author) commented Jan 15, 2019

@tkatarki, this is the commit that will become 0.10.0.

@crawford (Contributor)

/hold

Looks good to me.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 15, 2019
@wking (Member, Author) commented Jan 15, 2019

images:

info: Included 56 referenced images into the payload
Uploading ... failed
error: received unexpected HTTP status: 500 Internal Server Error
2019/01/15 17:59:52 Container release in pod release-latest failed, exit code 1, reason Error

/retest

@wking (Member, Author) commented Jan 15, 2019

e2e-aws:

level=warning msg="Failed to connect events watcher: Get https://ci-op-c9nti40h-1d3f3-api.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/kube-system/events?resourceVersion=149&watch=true: dial tcp 100.26.0.166:6443: connect: connection refused"
level=fatal msg="waiting for bootstrap-complete: timed out waiting for the condition"
2019/01/15 18:57:15 Container setup in pod e2e-aws failed, exit code 1, reason Error

I'm not entirely clear on what happened. Checking from my dev box while the installer was waiting (after copying the admin kubeconfig out of ci.openshift.org) showed these events:

$ oc get events --all-namespaces
NAMESPACE                                    LAST SEEN   FIRST SEEN   COUNT     NAME                                                                              KIND         SUBOBJECT                                                  TYPE      REASON                       SOURCE                                                      MESSAGE
kube-system                                  31m         31m          1         kube-scheduler.157a18ee94b33467                                                   Endpoints                                                               Normal    LeaderElection               default-scheduler                                           ip-10-0-15-172_28613320-18f3-11e9-bc14-126bc65547cc became leader
kube-system                                  30m         30m          1         kube-controller-manager.157a18ef984b44d1                                          ConfigMap                                                               Normal    LeaderElection               kube-controller-manager                                     ip-10-0-15-172_286df4e8-18f3-11e9-8fff-126bc65547cc became leader
openshift-cluster-version                    30m         30m          12        cluster-version-operator.157a18f24287242f                                         Deployment                                       
...
openshift-apiserver-operator                 4m          4m           1         openshift-apiserver-operator.157a1a5d1dd13f11                                     Deployment                                                              Normal    OperatorStatusChanged        openshift-cluster-openshift-apiserver-operator              Status for operator openshift-apiserver changed: Available message changed from "apiservice/v1.apps.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.authorization.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\napiservice/v1.build.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.image.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.oauth.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.project.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\napiservice/v1.quota.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.route.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\napiservice/v1.security.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.template.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.user.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" to "apiservice/v1.apps.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.authorization.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.build.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.image.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.oauth.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.project.openshift.io: not 
available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.quota.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.route.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.security.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.template.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host\napiservice/v1.user.openshift.io: not available: no response from https://172.30.150.77:443: Get https://172.30.150.77:443: dial tcp 172.30.150.77:443: connect: no route to host"
default                                      4m          29m          6         ip-10-0-25-19.ec2.internal.157a1902779684e5                                       Node                                                                    Warning   ImageGCFailed                kubelet, ip-10-0-25-19.ec2.internal                         failed to get imageFs info: non-existent label "crio-images"

So maybe this is our master stability issue? I dunno what's going on with crio-images either.

/retest

@openshift-ci-robot (Contributor)

@wking: The following test failed; say /retest to rerun it:

Test name        Commit   Details  Rerun command
ci/prow/e2e-aws  9ccba22  link     /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking (Member, Author) commented Jan 15, 2019

e2e-aws:

level=info msg="Waiting up to 10m0s for the openshift-console route to be created..."
level=fatal msg="waiting for openshift-console URL: context deadline exceeded"

but at least the non-console parts of the cluster came up, and we cut 0.10.0 based on this commit.

/close

@openshift-ci-robot (Contributor)

@wking: Closed this PR.

In response to this:

/close

