CI Failures: machine-config-server is not ready #414
@dgoodwin I believe those errors will be seen a few times and aren't what caused your failure. I'm not familiar with Hive, but I looked through the MCS logs and see:
Are you using port 49500 elsewhere? Because of issue #166, I recently put in a fix (#368) to move from port 49500 to 22623. Might that solve the problem? You'd have to pull master to get it, though, since it just merged this afternoon. See also openshift/installer#1180.
Also, @dgoodwin, can you add a bit of info about what you are running (installer version, any modifications, etc.)? I'm not familiar with Hive.
The MCC logs seem reasonably normal; the machineconfigs were generated as expected:
MCO logs seem ok as well:
I'm also not seeing any errors in the MCD logs, but I'll spare everyone more pasting :)
Summary: after looking through all of the MCx logs, I can confirm that the only error seems to be the port error in the MCS. The installer change only went in a few hours ago, but master should be using 22623 now instead of 49500.
This was a CI job flake, so it's basically just running the installer from master and its floating release image, and it would have occurred around the timestamp on the files in the links. There's nothing we're doing with that port that I can think of. It's possible I missed the merge; I do know we got a PR through later in the day.
Thanks for the update. I'll leave this open for a bit to see if it recurs, though AWS was having a lot of issues yesterday.
Closing for now; feel free to reopen if this comes up again.
Reopening this because I just saw this in CI today:
I saw the exact same error as the author above, and saw this in the MCS logs:
Is the installer running in e2e-aws built from master (i.e., does it include the PR that no longer uses 49500)? Will #423 take care of this once and for all, since it removes 49500 completely, or is the problem coming from some other issue, like the API server breakdowns? The PR in question passed e2e-aws (with no code changes) before it failed with this, so I'm not sure why this is happening.
Yeah, it should be fixed now that #423 is merged.
Fixed by #423.
In Hive we're seeing failures launching a cluster that appear related to machine config:
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_hive/221/pull-ci-openshift-hive-master-e2e/40/
Installer log (https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_hive/221/pull-ci-openshift-hive-master-e2e/40/artifacts/e2e/installer/.openshift_install.log) shows:
time="2019-02-12T18:39:57Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator machine-config is reporting a failure: Failed when progressing towards 3.11.0-605-g56569135-dirty because: error syncing: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-server is not ready. status: (desired: 3, updated: 3, ready: 2, unavailable: 2)"
time="2019-02-12T18:59:07Z" level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"
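To make the status line above concrete: the rollout never completed because only 2 of the 3 desired machine-config-server pods were ready. The sketch below is a simplified illustration, not the MCO's actual `waitForDaemonsetRollout` code. `daemonSetStatus` is a hypothetical local struct whose field names mirror the count fields of the Kubernetes apps/v1 `DaemonSetStatus`, and the completion rule (all desired pods updated and ready, none unavailable) is an assumption that may differ from what the operator checks.

```go
package main

import "fmt"

// daemonSetStatus is a local stand-in for a subset of the Kubernetes
// apps/v1 DaemonSetStatus count fields, so the sketch has no client-go
// dependency.
type daemonSetStatus struct {
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberReady            int32
	NumberUnavailable      int32
}

// rolloutComplete is an assumed, simplified readiness rule: every desired
// pod must be updated and ready, and none may be unavailable.
func rolloutComplete(s daemonSetStatus) bool {
	return s.UpdatedNumberScheduled == s.DesiredNumberScheduled &&
		s.NumberReady == s.DesiredNumberScheduled &&
		s.NumberUnavailable == 0
}

func main() {
	// Counts taken from the installer log above.
	s := daemonSetStatus{
		DesiredNumberScheduled: 3,
		UpdatedNumberScheduled: 3,
		NumberReady:            2,
		NumberUnavailable:      2,
	}
	fmt.Println("rollout complete:", rolloutComplete(s)) // prints "rollout complete: false"
}
```

A caller polling this check with a deadline would eventually give up with a timeout, which matches the "timed out waiting for the condition" message in the log.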
In the pod logs I found:
I0212 18:29:07.506788 1 render_controller.go:456] Generated machineconfig master-701b1a947a6b7a021d9adb12cbfbe2ab from 3 configs: [{MachineConfig 00-master machineconfiguration.openshift.io/v1 } {MachineConfig 00-master-ssh machineconfiguration.openshift.io/v1 } {MachineConfig 01-master-kubelet machineconfiguration.openshift.io/v1 }]
I0212 18:29:07.606512 1 node_controller.go:345] Error syncing machineconfigpool worker: Empty Current MachineConfig
I0212 18:29:07.606548 1 node_controller.go:345] Error syncing machineconfigpool master: Empty Current MachineConfig
More info available in the artifacts linked above.