-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement nodegroup updates #1469
Implement nodegroup updates #1469
Conversation
I would love to add a |
yes please implement |
The AWS API calls it EDIT: I'll get back to this on monday. I need to check what happens if an update fails due to blocking PDBS. Of course, there is the force option, but do we need to tell the controller to stop updating if the update failed? The last thing we want to do is keep a NodeGroup permanently updating. I think. |
yes then use force - not checked the API |
Creating a pull request after reviewing an existing pull request that inplements the same functionality is pretty bad etiquette and sets a bad example for open source projects. I have closed my PR in favour of yours as it is better and more comprehensive but doing things in this manner is pretty discouraging to others who want to contribute, take time to do so and then are basically told their time means nothing. |
@adamjohnson01 I am sorry that my PR made you feel that way and it was in no way meant to devalue your work. Thank you for teaching me about the etiquette. I appreciate that, since I am quite new to this process. |
@baldorch, no problem. Thank you for your acknowledgement and reply! |
When we are reconciling a NodeGroup and encounter an empty LaunchTemplate Version we can think of two reasons why that version is empty: It was removed or it has never been there. We cannot (easily) differentiate between the two causes. Since leaving it empty and doing nothing is unexpected behaviour that would need an explanation, especially when the version was removed from the spec, this commit will add code that makes sure that the LaunchTemplate Version is always being reconciled into the spec. Like this, installation with an empty LaunchTemplate Version to select the default Version of that LaunchTemplate is possible and there are no ambiguities in state later on. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
When changing the releaseVersion or the version of the LaunchTemplate, the NodeGroup is no longer up to date. This Commit extends the `eks.IsNodeGroupUpToDate` method to recognize changes in these properties and adds tests for the new functionality. Two test cases defined a releaseVersion within the `ekstypes.Nodegroup` object but not the `manualv1alpha1.NodeGroupParameters` object. Since this difference is now important these test cases needed adjustment as well. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
When the Version, ReleaseVersion or LaunchTemplate version of a NodeGroup changes an `UpdateNodegroupVersion` request needs to be sent to AWS to trigger an update of the NodeGroup. This commit adds a method that detects what changes have been made and constructs an `UpdateNodegroupVersionInput` object which can be used to request the inteded update. If no changes have been made, this method will indicate that, so that it can be used to test if a VersionUpdate should be performed. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
e9ba9ed
to
bba8c59
Compare
After thinking about the There is a NodegroupUpdateConfig AWS EKS API Object where we can specify |
After thinking a bit more about this, I came to the conclusion that keeping the node Updating is exactly what we want. That is the purpose of a reconciler: Trying again and again to move Spec to Status. We still have no information about why or if an update failed, that is why I've created #1475. |
This commit will add the NodeGroupUpdateConfig type that allows configuration of how many nodes are unavailable during update and if an update should be forced (i.e. if PodEvictionFailures should be ignored) or not. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
if a NodeGroupUpdateConfig is specified it will now be passed on to AWS on creation Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
If a NodeGroup is created without an update config, AWS will create an update config for us. This commit will make sure that the update config is late initialized to our resource. Furthermore, we do not want to force updates by default. This commit will make sure that if no Force property is set the default of "false" will be added to the resource. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit adds functionality to detect changes in the NodeGroupUpdateConfig and to apply these changes to the external resource. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit will apply the force flag to the UpdateNodegroupVersionInput if it is configured in the UpdateConfig. Like this, PodEvictionFailure during Updates can be mitigated. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
This commit will add the current state of the update config to the NodeGroup status. Signed-off-by: Robert Jabs <robert.jabs@deutschebahn.com>
Since this was not merged an I based my work on this branch I've added the changes for #1474 and the implementation of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks we tested this function the last day in our infrastructure is working as expected - thanks for implementation ;)
Description of your changes
This PR implements almost full
NodeGroupUpdateVersion
behaviour. It allows for updating thereleaseVersion
and thelaunchTemplate .version
properties of a NodeGroup. It also late initialises thelaunchTemplate.version
property after creation, so that reconciliation can be performed without ambiguities.Fixes #1073 by triggering an Update on
releaseVersion
changeFixes #1455 by triggering an Update on
launchTemplate.version
changeFixes #1474 by implementing NodegroupUpdateConfig and forcing if configured
I have:
make reviewable test
to ensure this PR is ready for review.make reviewable test
is running as I write this. I will tick the box as soon as it finished without errors.How has this code been tested
Testing LaunchTemplate and ReleaseVersion Updates
I created a NodeGroup for an existing EKS Cluster (1.22) where I configured
launchTemplate.ID
version
1.22releaseVersion
1.22.9-20220802I made sure that the
launchTemplate.version
was added by the controller.I created a new LaunchTemplate version and changed the NodeGroup to use
releaseVersion
1.22.12-20220811launchTemplate.version
2The update was constructed and performed as expected.
I edited the NodeGroup to again use
launchTemplate.version
1 and verified that the update was performed as expected.I updated the EKS Cluster to version 1.23 and edited the NodeGroup to use:
version
1.23releaseVersion
1.23.7-20220802launchTemplate.version
2I verified that the update was performed as expected.
I then just changed the
releaseVersion
to 1.23.9-20220811 and again verified that the update was performed as expected.Testing NodeGroupUpdateConfig
I created a NodeGroup without configuring an
updateConfig
and made sure that:NodeGroupUpdateConfig
got late initialised with the default valuefalse
forupdateConfig.force
and1
forupdateConfig.maxUnavailable
NodeGroupUpdateConfigStatus
was populated correctlyThen I created a Deployment and an aggressive PDB (i.e.
minAvailable: "100%"
), created a new LaunchTemplateVersion, updated the NodeGroup to use the new Version and waited for the update to fail - which it did withPodEvictionFailure
.I then changed the LaunchTemplateVersion back and set
updateConfig.force
totrue
and verified that the update was successful.I also made sure that I can change the value of
updateConfig.maxUnavailable
to2
and that this change is reconciled to AWS and also visible in the status of the NodeGroup.