ci-operator/config/openshift/installer/master: Move to openshift-install #1677
Conversation
Force-pushed ae3bfb4 to 5073bea.
/approve
Force-pushed 5073bea to 606af40.
I've pushed 5073bea -> 606af40 updating the kubeconfig path, now that the installer is writing straight to

/hold cancel
Force-pushed 606af40 to 37c6646.
Rebased around #1690.
/hold

Waiting for one more fix in the installer.
```
cp terraform.tfvars /tmp/artifacts/installer/
echo "Invoking installer ..."
OPENSHIFT_INSTALL_CLUSTER_NAME="${INSTANCE_PREFIX}" \
OPENSHIFT_INSTALL_BASE_DOMAIN=origin-ci-int-aws.dev.rhcloud.com \
```
Be aware that at some point in the next month or so we'll likely need to move this back out of the job definition and make it parametrizable for different teams and content streams. But it's fine for now.
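That future direction could look something like this, using the template-parameter mechanism these templates already rely on (a sketch only; the parameter name is hypothetical):

```yaml
# Sketch: lift the hard-coded base domain into a template parameter
# so different teams and content streams can override it.
parameters:
- name: OPENSHIFT_INSTALL_BASE_DOMAIN
  value: origin-ci-int-aws.dev.rhcloud.com  # current default from the job
```

The job's env entry would then reference `${OPENSHIFT_INSTALL_BASE_DOMAIN}` instead of hard-coding the domain.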
```diff
@@ -181,7 +180,7 @@ objects:
         - name: AWS_SHARED_CREDENTIALS_FILE
           value: /etc/openshift-installer/.awscred
         - name: KUBECONFIG
-          value: /tmp/cluster/generated/auth/kubeconfig
+          value: /tmp/artifacts/installer/auth/kubeconfig
```
Putting kubeconfig in artifacts isn't a great idea if we accidentally put a secret in there.
(i.e. if something shows up in there that is longer lived than the cluster)
> Putting kubeconfig in artifacts isn't a great idea if we accidentally put a secret in there.

I'd guessed that we were generating the secret for this CI run on the fly. Is that not accurate? Or is this just a "watch out for long-lived secrets in the future"?
```
from: installer-bazel
to: installer-smoke
source_path: /go/src/github.com/openshift/installer/bin/terraform
to: installer
```
Is the replacement Dockerfile CentOS-based and using the right tooling?
> Is the replacement Dockerfile CentOS-based and using the right tooling?

There is no replacement Dockerfile. openshift/release:golang-1.10 has what we need for `hack/build.sh && hack/get-terraform.sh`, and I was hoping to build the `:installer` image with just that here. But I don't know what I'm doing with ci-operator; maybe that's not how it works?
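For what it's worth, a guess at what that could look like, reusing the `images` stanza shape from the lines under review (whether ci-operator can publish a pipeline stage as an image without a Dockerfile is exactly the open question, so treat this as a sketch, not working config):

```yaml
images:
- from: bin        # the pipeline image built from openshift/release:golang-1.10
  to: installer    # publish that build output as the :installer image
```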
> There is no replacement Dockerfile.

This was a dead end. I've pushed a replacement Dockerfile with openshift/installer#343, but it's based on `scratch`, not CentOS. Do we need CentOS for something?
The ART team recommends this; it may be outright required, I'm not sure. (https://mojo.redhat.com/docs/DOC-1179058#jive_content_id_Images)

I also quickly noticed I can't exec in with /bin/bash, which is awfully handy for debugging. And given the way we're probably going to use this in Hive, we'll probably need a more substantial base layer.
> I also quickly noticed I can't exec in with /bin/bash, which is awfully handy for debugging.

The Dockerfile is stuffing the binaries into `scratch` (for folks who just need the binaries), but I couldn't figure out the right phrasing for that here (lucky for you ;), so we're currently using the base image (defined here) as the base layer that gets the binaries added. I chat about this a bit in the 3f2f01c commit message if you want more details.
Ah, so your published CI output is quite different from your scratch Dockerfile.ci if you built it yourself? As long as the published image is CentOS/RHEL based I think we'll have what we need.

You could also consider multiple Dockerfiles if need be. I can hack my own for now until this is getting published.
(none of my comments are blocking)

/approve
/hold cancel

openshift/installer#340 landed, so we should be ok here, although I still haven't gotten this tested locally (@sallyom's been helping me with her notes, but I've been distracted).
Force-pushed 37c6646 to e4ae796.
Rebased around #1688 and updated with a @sallyom suggestion to avoid:

in local testing. e4ae796 -> 3144205.
Force-pushed e4ae796 to 3144205.
Ok, in local testing with this slightly-tweaked template, I'm getting:

```console
$ ci-operator -template templates/cluster-launch-installer-e2e-new.yaml -config ~/src/openshift/release/ci-operator/config/openshift/installer/master.yaml -git-ref=openshift/installer@master -namespace wking-next-gen-installer
2018/09/26 13:57:05 Resolved openshift/installer@master to commit 60991d84b2a266187c3a46f4bb91662cd15afccc
2018/09/26 13:57:05 Resolved source https://github.com/openshift/installer to master@60991d84
2018/09/26 13:57:06 Resolved openshift/release:golang-1.10 to sha256:43ad4740f25e4cbc6de8a964e2ce65ce50fbc24fc4e5e268e0a7370ca3b09bd1
2018/09/26 13:57:07 Resolved openshift/origin-v4.0:base to sha256:8026efab865dcb2d776b47ce05d639c6cb4113225df75c1fbce785bd6dfc88b7
2018/09/26 13:57:07 Using namespace wking-next-gen-installer
2018/09/26 13:57:07 Running [input:root], [input:base], [release-inputs], src, bin, installer, unit, gofmt, govet, golint, [output:stable:installer], [images], cluster-launch-installer-e2e-new, [release:latest]
2018/09/26 13:57:07 Creating namespace wking-next-gen-installer
2018/09/26 13:57:07 Setting a soft TTL of 1h0m0s for the namespace
2018/09/26 13:57:07 Setting a hard TTL of 12h0m0s for the namespace
2018/09/26 13:57:08 warning: Could not mark the namespace to be deleted later because you do not have permission to update the namespace (details: namespaces "wking-next-gen-installer" is forbidden: User "wking" cannot update namespaces in the namespace "wking-next-gen-installer": no RBAC policy matched)
2018/09/26 13:57:08 Setting up pipeline imagestream for the test
2018/09/26 13:57:09 Populating secrets for test
2018/09/26 13:57:09 Tagging openshift/release:golang-1.10 into pipeline:root
2018/09/26 13:57:09 Tagging openshift/origin-v4.0:base into pipeline:base
2018/09/26 13:57:09 Tagged release images from openshift/origin-v4.0:${component}, images will be pullable from registry.svc.ci.openshift.org/wking-next-gen-installer/stable:${component}
2018/09/26 13:57:09 Building src
2018/09/26 13:57:12 Build src already succeeded in 27s
2018/09/26 13:57:12 Executing test golint
2018/09/26 13:57:12 Executing test gofmt
2018/09/26 13:57:12 Building bin
2018/09/26 13:57:12 Executing test unit
2018/09/26 13:57:12 Executing test govet
2018/09/26 13:57:13 Build bin already succeeded in 35s
2018/09/26 13:57:13 Building installer
2018/09/26 13:57:14 Build installer failed, printing logs:
Pulling image "docker-registry.default.svc:5000/wking-next-gen-installer/pipeline@sha256:d913d470e79f89cf1797725d2aa0884fd9d3424fb902f41a697df3e4f1d708d4" ...
Pulling image "docker-registry.default.svc:5000/wking-next-gen-installer/pipeline@sha256:e3eb8f4e82cfffda0568c0853ddbe11c695b877365be67ed1b77393bea68df43" ...
error: open /tmp/build/inputs/Dockerfile: no such file or directory
2018/09/26 13:57:19 Container test in pod gofmt completed successfully
2018/09/26 13:57:19 Pod gofmt succeeded after 4s
2018/09/26 13:57:21 Container test in pod golint completed successfully
2018/09/26 13:57:21 Pod golint succeeded after 5s
2018/09/26 13:57:33 Container test in pod unit completed successfully
2018/09/26 13:57:33 Pod unit succeeded after 19s
2018/09/26 14:00:24 Container test in pod govet completed successfully
2018/09/26 14:00:24 Pod govet succeeded after 3m10s
2018/09/26 14:00:25 Ran for 3m19s
error: could not run steps: could not wait for build: the build installer failed with reason GenericBuildFailed: Generic Build failure - check logs for details.
```

I dunno where the
Probably something in here is assuming the presence of a
Regarding the missing region, see openshift/installer#335. I'm fairly confident that Terraform is not respecting the same methods and ordering for loading AWS credentials as the installer.
Ah, thanks for the reproduction instructions :). Does openshift/installer#346 fix it for you?
A bunch more installer work later, and now I'm running:

```diff
diff --git a/ci-operator/config/openshift/installer/master.yaml b/ci-operator/config/openshift/installer/master.yaml
index 9f451bb..f030c07 100644
--- a/ci-operator/config/openshift/installer/master.yaml
+++ b/ci-operator/config/openshift/installer/master.yaml
@@ -44,12 +44,3 @@ tests:
 - as: unit
   commands: go test ./pkg/...
   from: src
-- as: gofmt
-  commands: IS_CONTAINER=TRUE ./hack/go-fmt.sh .
-  from: src
-- as: govet
-  commands: IS_CONTAINER=TRUE ./hack/go-vet.sh ./...
-  from: src
-- as: golint
-  commands: IS_CONTAINER=TRUE ./hack/go-lint.sh -min_confidence 0.3 $(go list -f '{{ .ImportPath }}' ./...)
-  from: src
diff --git a/ci-operator/templates/cluster-launch-installer-e2e.yaml b/ci-operator/templates/cluster-launch-installer-e2e.yaml
index 898b69f..23b2c88 100644
--- a/ci-operator/templates/cluster-launch-installer-e2e.yaml
+++ b/ci-operator/templates/cluster-launch-installer-e2e.yaml
@@ -8,6 +8,7 @@ parameters:
   required: true
 - name: NAMESPACE
   required: true
+  value: "wking-next-gen-installer"
 - name: IMAGE_FORMAT
   required: true
 - name: LOCAL_IMAGE_INSTALLER
@@ -16,6 +17,7 @@ parameters:
   required: true
 - name: CLUSTER_TYPE
   required: true
+  value: "aws"
 - name: TEST_FOCUS
 - name: TEST_SKIP
 - name: TEST_FOCUS_SERIAL
@@ -60,7 +62,7 @@ objects:
         emptyDir: {}
       - name: cluster-profile
         secret:
-          secretName: ${JOB_NAME_SAFE}-cluster-profile
+          secretName: cluster-profile
       containers:
@@ -212,7 +214,7 @@ objects:
        - name: OPENSHIFT_INSTALL_CLUSTER_NAME
          value: ${NAMESPACE}-${JOB_NAME_HASH}
        - name: OPENSHIFT_INSTALL_BASE_DOMAIN
-         value: origin-ci-int-aws.dev.rhcloud.com
+         value: coreservices.team.coreos.systems
        - name: OPENSHIFT_INSTALL_EMAIL_ADDRESS
          value: test@ci.openshift.io
        - name: OPENSHIFT_INSTALL_PASSWORD
```

(with the base domain change because I'm using teamcoreservices credentials to launch the cluster). Launched with:

```console
$ oc delete is pipeline stable -n wking
$ ci-operator -template ci-operator/templates/cluster-launch-installer-e2e.yaml -config ci-operator/config/openshift/installer/master.yaml -git-ref=openshift/installer@master -namespace wking
```

the setup container runs to a successful completion:
The test container is still waiting:
but the API returns something when I hit it directly:

```console
$ curl -ks https://wking-ef260-api.coreservices.team.coreos.systems:6443/ | head
{
  "paths": [
    "/api",
    "/api/v1",
    "/apis",
    "/apis/",
    "/apis/admissionregistration.k8s.io",
    "/apis/admissionregistration.k8s.io/v1beta1",
    "/apis/apiextensions.k8s.io",
    "/apis/apiextensions.k8s.io/v1beta1",
```

So it's possible that I just need to be patient. Or it's possible that I broke the wait-for-the-API logic.
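For the "broke the wait-for-the-API logic" possibility, a minimal sketch of such a polling loop (not the actual template logic; the retry count and interval are illustrative):

```shell
#!/bin/sh
# Poll a URL until it answers, up to a bounded number of tries.
# Usage: wait_for_api URL [TRIES] [SLEEP_SECONDS]
wait_for_api() {
    url="$1"
    tries="${2:-30}"
    pause="${3:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -k because CI clusters use self-signed certs; -sf to stay
        # quiet and treat HTTP errors as failures.
        if curl -ksf --max-time 5 "$url" >/dev/null 2>&1; then
            echo "API is up"
            return 0
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "timed out waiting for ${url}" >&2
    return 1
}
```

For example, `wait_for_api https://wking-ef260-api.coreservices.team.coreos.systems:6443/ 60 10` would bound the wait above at ten minutes.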
Anyhow, I think this is close enough that I'd like to land it, and if there are any remaining bugs we can sort them out on the fly. @smarterclayton, can you take another look and drop a
The cluster continues:

so better and better :).
I Ctrl-C'ed
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, smarterclayton, wking

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
@wking: Updated the following 5 configmaps:

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The final installer-origin-release and tectonic-installer consumers were removed in openshift/release@3f2f01c4 (ci-operator/config/openshift/installer/master: Move to openshift-install, 2018-09-26, openshift/release#1677). The final reference to kubernetes-e2e was removed locally in 40eb6f0 (Update README.md, 2018-06-22, coreos/tectonic-installer#3310).
Like d44e106 (remove ref to openshift.yaml aws installer config, 2018-09-27, openshift#1755), this is cleaning up after 3f2f01c (ci-operator/config/openshift/installer/master: Move to openshift-install, 2018-09-26, openshift#1677).
We'd dropped this in 3f2f01c (ci-operator/config/openshift/installer/master: Move to openshift-install, 2018-09-26, openshift#1677). But since openshift/installer@98a35316 (pkg/asset/installconfig: Add _CI_ONLY_STAY_AWAY_OPENSHIFT_INSTALL_AWS_USER_TAGS, 2018-09-28, openshift/installer#364), the installer supports setting it again, so we can add it back. As the name suggests, this variable is not a stable API. Soon the new installer will be able to load the configuration from YAML (again), but we can keep adjusting as the installer evolves.
I'd removed these in 3f2f01c (ci-operator/config/openshift/installer/master: Move to openshift-install, 2018-09-26, openshift#1677) while removing the shell from the setup container. But we got a shell back in b20d571 (cluster-launch-installer-e2e: Start setting expirationDate again, 2018-09-29, openshift#1761), and restoring these handlers will help:

* Trigger early pod wrap-up when the setup container dies. Both the test and teardown containers are monitoring the 'exit' file so they can bail early.
* Forward TERM to the installer, so we can exit gracefully when someone TERMs the shell script.

For the signal handling to work, we need to keep the shell process around, so I've dropped the 'exec' from the openshift-install invocation and moved it to the background instead (so 'jobs -p' will list it).
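The pattern that commit message describes can be sketched like this (a toy stand-in, not the actual template: a short `sleep` plays the installer's role, and the exit-file path is illustrative):

```shell
#!/bin/sh
# Keep the shell alive so it can forward TERM to its background job
# and signal the other containers via a shared 'exit' file.
EXIT_FILE="${EXIT_FILE:-/tmp/shared/exit}"

cleanup() {
    # Forward TERM to any background job (the "installer") and reap it,
    # then drop the marker file that the test/teardown watchers poll.
    kill -TERM $(jobs -p) 2>/dev/null || true
    wait
    mkdir -p "$(dirname "${EXIT_FILE}")"
    : > "${EXIT_FILE}"
}
trap cleanup TERM

# Background instead of exec, so this shell stays around to handle TERM
# and 'jobs -p' can list the child.
sleep 2 &  # stand-in for the long-running openshift-install invocation

wait $!
```

The key design point is dropping `exec`: an exec'ed installer replaces the shell, so nothing is left to run the trap when the pod is TERMed.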
The teardown container requires oc (for retrieving cluster logs), gzip (for compressing those logs), and openshift-install (for destroying the cluster). We were already copying in oc from another container; with this commit we copy in the statically-linked openshift-install too. That leaves gzip, which will be built into IMAGE_TESTS as "a command needed in the CI-test context". Using IMAGE_TESTS for teardown means we no longer need to copy oc into the shared volume for the teardown container (although we may need to copy it in for other containers, e.g. the src template's test container). I've also dropped an unnecessary /bin/ prefix from the 'create cluster' invocations, which we'd been dragging around from the pre-shell days of 3f2f01c (ci-operator/config/openshift/installer/master: Move to openshift-install, 2018-09-26, openshift#1677).
This is the next-gen installer. With the new installer, we only need the installer binary and terraform to launch and destroy clusters. I've removed the smoke tests for now, because I haven't looked into building them without Bazel. Hopefully we'll get expanded e2e testing and other OpenShift projects using our installer soon to keep us honest.
Also missing from the new installer is a way to set `expirationDate`. But we'll just not leak until we regain the ability to set that, right? ;)

The new config dumps less cruft into the output directory (most of the generated output goes into a temporary directory), so I've adjusted the openshift-install calls to just use an artifacts subdir for their state storage. A bonus of this approach is that if the installer hangs up, we'll capture anything it wrote to disk without needing an explicit `cp` call.

I've also dropped the 3.11 config, since the installer is 4.0-only.
We'll need openshift/installer#333 to land before this PR will work.
/hold
And even once that PR lands, I dunno if this one will work. @sallyom, can you run this through its paces or give me some pointers?
CC @crawford; once this lands, we can delete the old installer without breaking CI.