MCO-1877: MCO-1879: MCO-1882: MCO-1884: Implement boot image skew enforcement MVP #5428
base: main
|
@djoshy: This pull request references MCO-1877 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
dc9203e to
7b578ab
Compare
|
/retest-required |
This change adds logic to populate the BootImageSkewEnforcementStatus field in the MachineConfiguration status based on spec configuration, platform support, and cluster version information.
This adds new unit tests for TestSyncMachineConfiguration to test the BootImageSkewEnforcementStatus sync logic added in the previous commit.
This commit updates the machine-set-boot-image controller to track and update the BootImageSkewEnforcementStatus when in Automatic mode.
This commit implements upgrade blocking when boot image version skew exceeds acceptable limits, via the ClusterOperator Upgradeable condition.
This commit adds unit tests for the new Upgradeable guards added in the previous commit.
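The upgrade guard described in these commits comes down to comparing a recorded boot image version against fixed thresholds (RHEL 9.2 and OCP 4.13.0, per the PR description). Below is a minimal sketch in Go, assuming plain dotted numeric versions. The constant names, `withinSkew`, and the hand-rolled parser are hypothetical stand-ins for illustration, not the MCO's actual code or the semver library it uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Thresholds quoted in the PR description; the constant names are hypothetical.
const (
	minOCPVersion   = "4.13.0"
	minRHCOSVersion = "9.2"
)

// parseVersion turns "4.13.0" into [4 13 0]. Any pre-release or build
// suffix (after '-' or '+') is ignored in this simplified sketch.
func parseVersion(v string) ([]int, error) {
	core := v
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	fields := strings.Split(core, ".")
	parts := make([]int, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		parts = append(parts, n)
	}
	return parts, nil
}

// compareParts returns -1, 0 or 1; missing trailing parts count as 0,
// so "9.2" and "9.2.0" compare equal.
func compareParts(a, b []int) int {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// withinSkew reports whether bootImageVersion is at or above threshold.
// A false result is the case where Upgradeable=False would be set.
func withinSkew(bootImageVersion, threshold string) (bool, error) {
	got, err := parseVersion(bootImageVersion)
	if err != nil {
		return false, err
	}
	limit, err := parseVersion(threshold)
	if err != nil {
		return false, err
	}
	return compareParts(got, limit) >= 0, nil
}

func main() {
	for _, v := range []string{"4.12.0", "4.13.0", "4.18.3"} {
		ok, err := withinSkew(v, minOCPVersion)
		fmt.Println(v, ok, err)
	}
}
```

Returning an error for unparseable input, rather than treating it as out of skew, keeps a malformed version distinguishable from a genuine skew violation. Which of the two the operator actually does is not stated in the PR.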
7b578ab to
dddd5c7
Compare
|
@djoshy: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This PR integrates the boot image skew enforcement API introduced in openshift/api#2357. This involves the following changes:
- Populates the bootImageSkewEnforcementStatus field in the MachineConfiguration object based on spec.bootImageSkewEnforcement, platform defaults and cluster version.
- Updates bootImageSkewEnforcementStatus on a successful boot image update. Note that this requires skew enforcement to be set to Automatic mode and all machinesets to be opted in for boot image updates.
- Sets Upgradeable=False if the cluster is detected to be out of skew. This is done by comparing the boot image values referenced in the bootImageSkewEnforcementStatus field against the MCO's hardcoded skew limits.
- Adds unit tests in sync_test.go and status_test.go to verify the above mechanisms.

Verifying API behavior
This verification will have to be done based on the platform. If the platform:
- enables boot image updates by default: status.managedBootImagesStatus is set to All if spec.managedBootImages is empty. The skew enforcement status will then be set to Automatic, with a boot image version estimated from the cluster version. Next, the boot image controller will perform a sync that updates the boot image (if required); after all resources have been successfully updated, it will update the boot image value stored in the skew enforcement status. The value set will be the OCP releaseVersion described by the coreos-bootimages configmap. Here's an example:
- supports boot image updates on an opt-in basis: status.managedBootImagesStatus is set to None if spec.managedBootImages is empty. The skew enforcement status will then be set to Manual, with a boot image version estimated from the cluster version. The object would now look like this:
  The admin can choose to opt in to boot image updates in this case (set spec.managedBootImages to All), and the operator should automatically switch the skew enforcement status to Automatic, with the appropriate boot image version. The object would then finally look like this:
- does not support boot image updates: status.managedBootImagesStatus is empty and spec.managedBootImages cannot be set by the admin. The skew enforcement status will then be set to Manual, with a boot image version estimated from the cluster version. The object would now look like this:
  In this case, the admin is expected to manually perform boot image updates and then add a spec field like so:
The operator should then update the status to include this:
The above snippet applies if the admin chose to record the OCPVersion. In Manual mode, the admin can also choose to store the RHCOSVersion instead, like so:
Note that only one of RHCOSVersion or OCPVersion is permitted in Manual mode.
The admin can also choose to disable skew enforcement altogether by setting it to None mode in the spec.

Verifying upgrade block
Upgrades will be blocked when the cluster is determined to be out of skew. This mechanism works the same way in Manual and Automatic mode, although it is likely easier to verify in Manual mode. The current thresholds for a skew violation are set to the point when OCP first moved to RHEL 9, which corresponds to RHEL version 9.2 and OCP version 4.13.0. The operator will perform semver comparisons of these thresholds against the boot image versions stored in bootImageSkewEnforcementStatus and set Upgradeable=False if necessary. To verify this, first set the mode to Manual with an out-of-skew boot image version like so:
Now, examine the conditions field of the machine-config CO object; it should indicate an issue preventing upgrades like so:
Next, set the boot image to one within the skew limits:
Then, the Upgradeable condition should be restored to True.
This set of steps can be repeated with the OCPVersion specified too. This comparison should only take place in Automatic and Manual mode; however, as Automatic is only permitted on the status side, I don't think there is an easy way to test that (other than the units I've included).
In None mode, this version check should not take place.

Some caveats to note about Automatic mode:
- The admin cannot set Automatic mode within the spec. This is an intentional choice because only the MCO will always be able to self-determine if a platform is eligible for automatic skew enforcement.
- In Automatic mode, API validations will prevent changing the boot image configuration to a setting other than All. To change the boot image configuration, the admin is first expected to switch to Manual skew enforcement mode and then attempt to change the boot image configuration of the cluster.
- In Automatic mode, if any machinesets are skipped for boot image updates (for example, a marketplace or an unknown boot image was detected in any of the machinesets), the boot image controller will not update the boot image value stored in bootImageSkewEnforcementStatus. This is because the cluster cannot be considered up to date on boot images if even one of the machine resources is out of skew.
- In Automatic mode, the operator will only populate the OCPVersion. This is because platforms may not all have the same RHCOS version of the boot image (for example, across marketplace streams) in a given release, and it would involve a lot of per-platform piping to correctly track the RHCOS version per machineset within the boot image controller. I did not deem this to be worth the effort, but am open to implementing that later if the need arises.
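The mode selection and the Automatic-mode update rule described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the operator's implementation: `SkewSpec`, `SkewStatus`, `deriveStatus` and `recordAfterSync` are hypothetical simplifications and not the openshift/api types, and placing the estimated version in the OCPVersion field for the default Manual case is an assumption.

```go
package main

import "fmt"

type Mode string

const (
	ModeAutomatic Mode = "Automatic"
	ModeManual    Mode = "Manual"
	ModeNone      Mode = "None"
)

// SkewSpec is a hypothetical stand-in for spec.bootImageSkewEnforcement.
// An empty Mode means the admin has not set the field.
type SkewSpec struct {
	Mode         Mode
	OCPVersion   string
	RHCOSVersion string
}

// SkewStatus is a hypothetical stand-in for bootImageSkewEnforcementStatus.
type SkewStatus struct {
	Mode         Mode
	OCPVersion   string
	RHCOSVersion string
}

// deriveStatus picks the status mode. managedBootImages is the effective
// status.managedBootImagesStatus ("All", "None" or ""), and estimated is
// the boot image version estimated from the cluster version.
func deriveStatus(spec SkewSpec, managedBootImages, estimated string) SkewStatus {
	switch spec.Mode {
	case ModeNone:
		// Enforcement disabled: no version is tracked or compared.
		return SkewStatus{Mode: ModeNone}
	case ModeManual:
		// The admin records exactly one of OCPVersion or RHCOSVersion.
		return SkewStatus{Mode: ModeManual, OCPVersion: spec.OCPVersion, RHCOSVersion: spec.RHCOSVersion}
	}
	// Spec unset: Automatic is chosen by the operator only, and only when
	// every machineset is opted in for boot image updates.
	if managedBootImages == "All" {
		return SkewStatus{Mode: ModeAutomatic, OCPVersion: estimated}
	}
	return SkewStatus{Mode: ModeManual, OCPVersion: estimated} // assumption: OCPVersion holds the estimate
}

// recordAfterSync applies the Automatic-mode rule: the stored version only
// advances when no machineset was skipped during the boot image update.
func recordAfterSync(st SkewStatus, skippedMachineSets int, releaseVersion string) SkewStatus {
	if st.Mode == ModeAutomatic && skippedMachineSets == 0 {
		st.OCPVersion = releaseVersion
	}
	return st
}

func main() {
	st := deriveStatus(SkewSpec{}, "All", "4.20.0")
	fmt.Printf("%+v\n", st)
	fmt.Printf("%+v\n", recordAfterSync(st, 1, "4.21.0")) // skipped: version unchanged
	fmt.Printf("%+v\n", recordAfterSync(st, 0, "4.21.0")) // clean sync: version recorded
}
```

Keeping the "no skipped machinesets" check in one small function mirrors the caveat that a single out-of-date machine resource is enough to hold the recorded version back.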