Kube 1.25 fixes #1764

Merged 1 commit on Sep 30, 2022

Contributor

@sjenning sjenning commented Sep 28, 2022

Kube 1.25 rebase has landed as of https://amd64.ocp.releases.ci.openshift.org/releasestream/4.12.0-0.ci/release/4.12.0-0.ci-2022-09-28-193725

CI is failing because the kube-apiserver (KAS) does not start. It exits with: cannot set feature gate CSIMigrationAWS to false, feature is locked to true

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/1746/pull-ci-openshift-hypershift-main-e2e-aws/1575163249484107776/artifacts/e2e-aws/run-e2e/artifacts/TestCreateCluster_PreTeardownClusterDump/namespaces/e2e-clusters-nb7x7-example-nqbhw/core/pods/logs/kube-apiserver-5587dd5c7f-p8js7-kube-apiserver-previous.log
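
The error string reflects how Kubernetes treats locked feature gates: once a gate is locked to its default, any attempt to set it to a different value is rejected at startup. A minimal Python sketch of that rule follows. It is a simplified model, not the actual Kubernetes (Go) implementation; FeatureSpec, set_gates, and KNOWN are hypothetical names, and the lock on CSIMigrationAWS is taken from the error message above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    default: bool
    locked: bool = False  # locked gates cannot be moved off their default


def set_gates(known, requested):
    """Return the effective gate values, rejecting overrides of locked gates."""
    effective = {name: spec.default for name, spec in known.items()}
    for name, value in requested.items():
        if name not in known:
            raise ValueError(f"unrecognized feature gate: {name}")
        spec = known[name]
        if spec.locked and value != spec.default:
            raise ValueError(
                f"cannot set feature gate {name} to {str(value).lower()}, "
                f"feature is locked to {str(spec.default).lower()}"
            )
        effective[name] = value
    return effective


# Assumption for illustration: CSIMigrationAWS is locked to true,
# as the KAS error reports.
KNOWN = {"CSIMigrationAWS": FeatureSpec(default=True, locked=True)}
```

Under this model, passing CSIMigrationAWS=false fails, while omitting the gate or setting it to true is accepted, which is why dropping the explicit override is enough to let the process start.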

  • bump github.com/openshift/api to get feature gate changes
  • bump k8s.io/client-go as a secondary dependency
  • KCM flag --experimental-cluster-signing-duration was removed; it is replaced with --cluster-signing-duration
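
The KCM flag change in the last bullet amounts to a rename when the controller-manager arguments are built. A sketch in Python, assuming the arguments are held as a list of --name=value strings; rename_flag is a hypothetical helper and not code from this PR:

```python
def rename_flag(args, old, new):
    """Return a copy of args with flag `old` renamed to `new`, keeping any value."""
    out = []
    for arg in args:
        # Split "--flag=value" into name and value; bare flags have no "=".
        name, sep, value = arg.partition("=")
        if name == old:
            out.append(new + sep + value)
        else:
            out.append(arg)
    return out
```

Only exact flag-name matches are rewritten, so other flags that merely share a prefix are left untouched.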

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story):
Fixes #

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci
Contributor

openshift-ci bot commented Sep 28, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 28, 2022
@sjenning
Contributor Author

There is still something out there that is causing nodes not to join
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 29, 2022
@enxebre
Member

enxebre commented Sep 29, 2022

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/1764/pull-ci-openshift-hypershift-main-e2e-aws/1575229696872812544/artifacts/e2e-aws/run-e2e/artifacts/TestCreateCluster_PreTeardownClusterDump/machine-journals/journal_10.0.133.214.log

Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: I0928 22:06:31.184140 2077 daemon.go:239] Booted osImageURL: (412.86.202208101039-0) a4cd5482264860d07e6830708ba5952df8ec2596feff273f8ccddf2d4331bdb6
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: I0928 22:06:31.184269 2077 rpm-ostree.go:447] Running captured: rpm-ostree --version
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: I0928 22:06:31.204374 2077 update.go:2068] rpm-ostree is not new enough for new-format image; forcing an update via container and queuing immediate reboot
Sep 28 22:06:31 ip-10-0-133-214 root[2646]: machine-config-daemon[2077]: rpm-ostree is not new enough for new-format image; forcing an update via container and queuing immediate reboot
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: I0928 22:06:31.206737 2077 update.go:2053] Running: systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host rpm-ostree ex deploy-from-self /run/host
Sep 28 22:06:31 ip-10-0-133-214 systemd[1]: Started /usr/bin/podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host rpm-ostree ex deploy-from-self /run/host.
Sep 28 22:06:31 ip-10-0-133-214 podman[2648]: Error: repository name must have at least one component
Sep 28 22:06:31 ip-10-0-133-214 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Main process exited, code=exited, status=125/n/a
Sep 28 22:06:31 ip-10-0-133-214 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Failed with result 'exit-code'.
Sep 28 22:06:31 ip-10-0-133-214 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Consumed 52ms CPU time
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: W0928 22:06:31.269036 2077 firstboot_complete_machineconfig.go:46] error: error running systemd-run --unit machine-config-daemon-update-rpmostree-via-container --collect --wait -- podman run --authfile /var/lib/kubelet/config.json --privileged --pid=host --net=host --rm -v /:/run/host rpm-ostree ex deploy-from-self /run/host: Running as unit: machine-config-daemon-update-rpmostree-via-container.service
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: Finished with result: exit-code
Sep 28 22:06:31 ip-10-0-133-214 machine-config-daemon[2077]: Main processes terminated with: code=exited/status=125

@cgwalters @yuqi-zhang does this ring a bell?

@csrwng
Contributor

csrwng commented Sep 29, 2022

/retest-required

@cgwalters
Member

Looks like fallout from openshift/machine-config-operator#3317 - we probably should have tested hypershift with that PR before merging. It seems we're not getting the new-format rendered image in the hypershift flow.
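
One reading of the journal above is that the image argument passed to podman run was empty, which would be consistent with the "repository name must have at least one component" error and with the rendered image not arriving. A defensive Python sketch that refuses to build the command without an image reference is shown here. build_deploy_command is a hypothetical helper, not the machine-config-daemon's code; the argument list is copied from the logged command, with the image position assumed to follow the volume mount.

```python
def build_deploy_command(image, authfile="/var/lib/kubelet/config.json"):
    """Build the podman invocation seen in the journal, requiring a non-empty image."""
    if not image or not image.strip():
        # Fail early with a clear message instead of letting podman
        # reject an empty repository name.
        raise ValueError("no OS image reference available; refusing to run podman")
    return [
        "podman", "run",
        "--authfile", authfile,
        "--privileged", "--pid=host", "--net=host", "--rm",
        "-v", "/:/run/host",
        image,  # assumed position of the image argument
        "rpm-ostree", "ex", "deploy-from-self", "/run/host",
    ]
```

Failing before the systemd-run call would surface the missing image directly rather than as exit status 125 from podman.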

@cgwalters
Member

Filed #1767 to track this since it's not related to this PR

@openshift-ci
Contributor

openshift-ci bot commented Sep 29, 2022

@sjenning: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/capi-provider-agent-sanity | ed71fb1 | link | false | /test capi-provider-agent-sanity
ci/prow/kubevirt-e2e-kubevirt-gcp-ovn | ed71fb1 | link | false | /test kubevirt-e2e-kubevirt-gcp-ovn
ci/prow/kubevirt-e2e-kubevirt-azure-ovn | ed71fb1 | link | false | /test kubevirt-e2e-kubevirt-azure-ovn
ci/prow/e2e-aws | ed71fb1 | link | true | /test e2e-aws


@yuqi-zhang
Contributor

yuqi-zhang commented Sep 29, 2022

Ah yeah, we did create https://issues.redhat.com/browse/MCO-373 to make sure this doesn't happen but we haven't gotten there yet.

In this particular case it gets caught first on "rpm-ostree is not new enough for new-format image", which is odd: according to the log, we are running 412.86.202208101039-0, which has rpm-ostree 2022.10.94.g89f58028.

Hmm, is that rpm-ostree really not new enough? Do we think bumping the bootimage here would help temporarily in getting past the first error? (Presumably it could still fail, but it may attempt to use a backup legacy path instead?)

@cgwalters
Member

> In this particular case it does get caught on rpm-ostree is not new enough for new-format image first, which is odd, and according to the log, we are running 412.86.202208101039-0. Which has rpm-ostree 2022.10.94.g89f58028

That will be fixed by https://issues.redhat.com/browse/OCPBUGS-1429, which would only work around the firstboot problem; we'd then still have the problem of missing rhel-coreos-8 for day 2 changes. Let's discuss that in #1767, where I debugged this further.

@csrwng csrwng mentioned this pull request Sep 30, 2022
1 task
@openshift-merge-robot openshift-merge-robot merged commit ed71fb1 into openshift:main Sep 30, 2022