
WIP: Add proposal for API to automatically spread MachineSets across AZs. #483

Closed
wants to merge 1 commit into from

Conversation

dgoodwin
Contributor

No description provided.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 21, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2020
@dgoodwin
Contributor Author

This is a rough start; it needs a lot of collaboration with the cloud team to fill it out. Only the first few sections are worked on so far, just to get the discussion going.

Contributor

@JoelSpeed JoelSpeed left a comment


Left some initial thoughts. I think we should investigate whether keeping the AZ spread within MachineSets is viable before we start work on a new CRD and controller. I will bring the topic up in the cloud team architecture call next week to discuss.


### Implementation Details/Notes/Constraints [optional]

Implementation would require a new CRD and controller, hopefully aligned with a pre-existing operator in OCP.
Contributor


There is upstream discussion about adding this first part into MachineSets (kubernetes-sigs/cluster-api#3358). Do we want to explore this as an alternative before we make a decision on implementing a new CRD?

Contributor Author


I would 100% prefer this; it's dramatically simpler. If there's a chance of that, it would be great.

Contributor Author


I need to walk back my comment above. After discussing with my team, this would not really help us with either use case, as MachineSet still requires substantial, non-trivial, and delicate settings that vary from OpenShift release to release. That makes it difficult for customers to just add their own, and for Hive to add them for customers and stay in sync with what the product expects.

Examples: the AMI ID; many fields with the infra ID embedded in the name (label selectors, iamInstanceProfile, security group filters, subnet, tags); and userDataSecret (the name changed twice during the 4.6 cycle, once to a new value and then back, which nearly caused a 4.6 blocker this week because OSD could not install on GCP as a result).

We still think the best approach here is an API that lets customers easily add new pools of machines.
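
For illustration only, here is a rough sketch (not the actual installer or Hive code) of how many of those MachineSet fields end up derived from the cluster's infra ID; the specific names and field paths below are assumptions:

```go
package main

import "fmt"

// workerMachineSetName roughly mirrors the installer-style convention of
// naming worker MachineSets "<infraID>-worker-<zone>".
func workerMachineSetName(infraID, zone string) string {
	return fmt.Sprintf("%s-worker-%s", infraID, zone)
}

// infraDerivedFields sketches the kinds of values that are all coupled to
// the cluster's infra ID and so must be regenerated per cluster and kept
// in sync with what each OpenShift release expects. The field paths here
// are illustrative assumptions, not the real provider-spec schema.
func infraDerivedFields(infraID, zone string) map[string]string {
	return map[string]string{
		"metadata.name":                   workerMachineSetName(infraID, zone),
		"spec.selector cluster label":     infraID,
		"providerSpec.iamInstanceProfile": infraID + "-worker-profile",
		"providerSpec.securityGroups":     infraID + "-worker-sg",
		"providerSpec.subnet":             fmt.Sprintf("%s-private-%s", infraID, zone),
		"providerSpec.tags (cluster tag)": "kubernetes.io/cluster/" + infraID,
	}
}

func main() {
	for path, value := range infraDerivedFields("mycluster-x7k2p", "us-east-1a") {
		fmt.Printf("%-35s %s\n", path, value)
	}
}
```

Every one of these has to be stamped out per cluster (and per zone), which is the coupling that makes hand-maintained MachineSets fragile across releases.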

Contributor


Not sure if it helps, but we added a number of default options for MachineSets in the 4.6 cycle, which has drastically reduced the number of options that have to be specified (the defaulting webhook will fill them in on create anyway).

My concern with creating an extra CRD that creates MachineSets is that it will still need to have these kinds of options, won't it? E.g. what if a customer wants to change something such as the subnet? Or will this just be a convenience with a limited set of options?
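
For context on the defaulting behavior mentioned here, a minimal sketch of what create-time defaulting does conceptually; the struct and field names are assumptions for illustration, not the actual machine-api types or webhook:

```go
package main

import "fmt"

// awsProviderSpec is a cut-down, illustrative stand-in for an AWS provider
// spec; the fields are assumptions for this sketch only.
type awsProviderSpec struct {
	AMI                string
	InstanceType       string
	IAMInstanceProfile string
	UserDataSecret     string
}

// applyDefaults sketches the idea of create-time defaulting: any field the
// user leaves empty is filled in from cluster/release defaults, so a
// hand-written MachineSet can omit it.
func applyDefaults(spec *awsProviderSpec, infraID, releaseAMI string) {
	if spec.AMI == "" {
		spec.AMI = releaseAMI // per-release RHCOS image
	}
	if spec.InstanceType == "" {
		spec.InstanceType = "m5.large"
	}
	if spec.IAMInstanceProfile == "" {
		spec.IAMInstanceProfile = infraID + "-worker-profile"
	}
	if spec.UserDataSecret == "" {
		spec.UserDataSecret = "worker-user-data"
	}
}

func main() {
	spec := awsProviderSpec{InstanceType: "m5.xlarge"} // user only sets the size
	applyDefaults(&spec, "mycluster-x7k2p", "ami-0123456789abcdef0")
	fmt.Printf("%+v\n", spec)
}
```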

Contributor Author


Could the default hook cover everything I mentioned above? In conjunction with spreading over AZs, that might be workable. I will run this past the team.

No, the machine pool API would not expose those settings; it's meant to be a dead-simple interface, as it is today in Hive and the InstallConfig. If you need control over subnets, etc., you can go straight to MachineSet.
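
As a hypothetical sketch of the kind of "dead simple" interface being described, loosely modeled on the Hive/InstallConfig MachinePool (these type and field names are illustrative, not an actual proposed API):

```go
package main

import "fmt"

// machinePoolSpec is a hypothetical, cut-down sketch of a simple pool
// definition; it deliberately exposes only the high-level knobs.
type machinePoolSpec struct {
	Name         string   // pool name, e.g. "worker" or "infra"
	Replicas     int      // total replicas, spread across zones
	InstanceType string   // cloud instance size, e.g. "m5.xlarge"
	Zones        []string // optional; empty could mean "all discovered AZs"
}

func main() {
	// A controller would expand this into one MachineSet per zone, filling
	// in the infra-ID- and release-specific details itself.
	pool := machinePoolSpec{Name: "infra", Replicas: 3, InstanceType: "m5.xlarge"}
	fmt.Printf("%+v\n", pool)
}
```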


The code for spreading MachineSets across AZs [already exists today](https://github.com/openshift/installer/blob/master/pkg/asset/machines/aws/machinesets.go) and could be removed from the installer, which would then just pass through an instance of the new API.

The [Hive MachinePool API](https://github.com/openshift/hive/blob/master/pkg/apis/hive/v1/machinepool_types.go) also includes support for enabling auto-scaling on the resulting MachineSets, and this functionality should be carried over to the new API even if not initially supported by the Installer.
Contributor


This seems like a tangential issue to the main aim of the proposal. Do we want to include it here, or can we drop it for now and narrow the scope to just the multi-AZ support?

Contributor Author


It somewhat depends on the outcome here. If we end up with upstream cluster-api MachineSets supporting multiple AZs, Hive likely then just has to sync one stable MachineSet and one stable autoscaler config, which seems OK.

However, if that is not the case and we do end up with a new MachinePool API of some form, then Hive would be syncing a single MachinePool but would need to introspect what was generated in order to sync a per-MachineSet autoscaling config. If we've got a higher-level construct in-cluster, IMO it needs to encompass the autoscaling config to apply to what it generates.
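
To make the generation step concrete, a minimal sketch (an assumption-laden illustration, not the installer's actual code) of spreading a pool's total replicas across zones; a controller doing this could also emit one MachineAutoscaler per generated MachineSet when autoscaling is enabled:

```go
package main

import "fmt"

// spreadReplicas splits total replicas across zones as evenly as possible,
// giving the remainder to the earliest zones (the same general approach the
// installer takes when stamping out one MachineSet per AZ).
func spreadReplicas(total int, zones []string) map[string]int {
	out := make(map[string]int, len(zones))
	if len(zones) == 0 {
		return out
	}
	base, extra := total/len(zones), total%len(zones)
	for i, zone := range zones {
		n := base
		if i < extra {
			n++
		}
		out[zone] = n
	}
	return out
}

func main() {
	infraID := "mycluster-x7k2p" // hypothetical infra ID
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	for zone, replicas := range spreadReplicas(4, zones) {
		// One MachineSet per zone; with autoscaling enabled, one
		// MachineAutoscaler per MachineSet could split min/max similarly.
		fmt.Printf("%s-worker-%s: replicas=%d\n", infraID, zone, replicas)
	}
}
```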


This is where to call out areas of the design that require closure before deciding
to implement the design. For instance,
> 1. Upstream cluster-api project has begun using the MachinePool naming for an API that is backed by native cloud provider auto-scaling groups. (ASGs, MIGs) We however are already using this term in both the Installer and Hive. Is it acceptable to push this naming in cluster even if it may one day overlap with a cluster-api type in a different apigroup?
Contributor


I would prefer we come up with a new name for this, as I would expect that at some point we will adopt MachinePools from the cluster-api project, and I think having two resources in different API groups that do rather different things would be confusing for users.

Contributor Author


Yeah, this is really unfortunate. MachinePool is a thing in the installer's InstallConfig and in Hive.

We can work with a new API name, though it's going to be confusing for a long time to come.

to implement the design. For instance,
> 1. Upstream cluster-api project has begun using the MachinePool naming for an API that is backed by native cloud provider auto-scaling groups. (ASGs, MIGs) We however are already using this term in both the Installer and Hive. Is it acceptable to push this naming in cluster even if it may one day overlap with a cluster-api type in a different apigroup?
> 1. How does this feature relate to MachineDeployments?
> 1. Do we have an existing operator which would be a good home for this controller?
Contributor


I would suggest the MAO; however, it's getting a bit bloated lately 🤔

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 6, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
