-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support rolling upgrade on openstack #9927
Support rolling upgrade on openstack #9927
Conversation
Skipping CI for Draft Pull Request. |
e466b49
to
683de5f
Compare
683de5f
to
b9b75d0
Compare
b9b75d0
to
65b3899
Compare
@zetaab maybe worth having a look at the changes here together with the detach instance work? |
pkg/instancegroups/rollingupdate.go
Outdated
@@ -72,10 +74,16 @@ type RollingUpdateCluster struct { | |||
|
|||
// ValidateCount is the amount of time that a cluster needs to be validated after single node update | |||
ValidateCount int | |||
|
|||
Ctx context.Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually there is no need to embed the context within this struct. We can propagate it at every step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got tired of passing on this one all the time :p what is the benefit of keep propagating it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, the doc states not to embed a context.Context within a struct. However, I found golang/go#22602 which points out that if it is a parameter struct, i.e. to not get a parameter explosion within the function then it would be fine. In our case it is not really only a parameter struct so I'd prefer to keep the context as a first parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not strictly a parameter struct since we define functions on it, but none of the functions mutate the struct, so for all intents and purposes it is. The concerns in that issue is not applicable to this struct either as far as I can tell.
/cc @zetaab |
65b3899
to
6d773d0
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like mentioned in last comment: this will panic. The problem is that c.Clientset is NIL in reconcileInstanceGroup
function
You need pass clientset to variables in cmd/kops/rollingupdatecluster.go
row 330
6d773d0
to
badf57a
Compare
badf57a
to
aa66c4f
Compare
Looks like a bit went missing when I factored out some of the BuildCloud stuff. Could you try again now, @zetaab ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this is not perfect, but it is already much better than before this PR.
Two issues:
- if something happens between single instance delete & apply cluster run command. You need run kops update cluster once to get machines up and running.
- running whole kops cluster update is not the perfect way. We should have possibility to run only some parts of it
However, this is good start!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: olemarkus, zetaab The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Will this make it into 1.18.2? |
No this will be not be backported. The change is pretty big. |
This is a rather brutish approach running close to full
update cluster
between rolls.In order to do a smaller parts of update cluster, I think we first need to split
ApplyClusterCmd.Run()
into smaller parts, perhaps with cloud-specific functions instead of all the switch statements.There are lots of good suggestions in #9635 on how to make the update process smaller, such as writing bootstrapscript to VFS. But there are too many other interdependencies between tasks (such as instance -> port -> subnet -> network) that needs splitting first. Constructing tasks for resources we know exists when inside a rolling-pudate should be easy once ApplyClusterCmd has been split out in the various "phases".
Fixes #9635