Add cluster-api machine/machineset controllers. #217
b96ad46 to aee4b78
Most of the comments that I made on the machine controller are applicable to the machineset controller as well (although I don't think we want the machineset controller at all).
go machinesetapi.NewController(
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().Clusters(),
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().MachineSets(),
    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-cluster-controller"),
Change the name of the user agent here and in `startMachineAPIController` to reflect the name of the controller.
spec:
  replicas: 1
  selector:
    matchLabels:
      machineset: master
      machineset: ${CLUSTER_NAME}-master
Do these label keys come from upstream? If not, then we should use `clusteroperator.openshift.io/machineset` and `clusteroperator.openshift.io/cluster` instead.
They do not, but IMO they should; no movement on the issues opened, though.
pkg/apis/clusteroperator/types.go
Outdated
@@ -159,6 +159,7 @@ type ClusterProviderConfigSpec struct {

// ClusterSpec is the specification of a cluster's hardware and configuration
type ClusterSpec struct {
Is this an accidental newline?
pkg/apis/clusteroperator/types.go
Outdated
@@ -413,19 +430,36 @@ type MachineSetConfig struct {

// NodeLabels specifies the labels that will be applied to nodes in this
// MachineSet
// TODO: may be obsolete as well via Cluster API selector
I don't see anything in the upstream API that would replace this. This is meant to apply labels to the Nodes created. The `Selector` in `MachineSetSpec` is for labels on the Machines.
// VMImage contains a specified single image to use for a supported cloud provider.
type VMImage struct {
    // +optional
    AWSImage *string `json:"awsImage"`
Why is this a `*string` instead of a `string`? Do we need to differentiate between a nil and an empty string?
No, it's just optional per API conventions; as we expand to other clouds, this may or may not be set.
Let me know if this is not correct; otherwise I'll assume we leave it in.
I suppose that `*string` is fine for now. I would expect one of two things to happen as we add more clouds: either (1) the `*string` will be replaced by a pointer to an AWS-specific struct, which may just contain a `string`, or (2) we forgo the union idea, since the VM image information is the same for every cloud and there is a one-to-one relationship between machineset and cloud.
}

// Run runs c; will not return until stopCh is closed. workers determines how
// many clusters will be handled in parallel.
s/clusters/machines/
}

// enqueueAfter will enqueue a cluster after the provided amount of time.
func (c *Controller) enqueueAfter(cluster *clustopv1.Cluster, after time.Duration) {
Not used. And enqueueing the wrong thing if it were.
    return true
}

// syncCluster will sync the cluster with the given key.
s/syncCluster/syncMachineSet/
// TODO: once cluster controller is setting resolved cluster version refs on the
// ClusterStatus, replace this with the resolved ref not a lookup
clusterVersion, err := c.clustopClient.ClusteroperatorV1alpha1().ClusterVersions(clusterSpec.ClusterVersionRef.Namespace).Get(clusterSpec.ClusterVersionRef.Name, metav1.GetOptions{})
Why do we use a lister to fetch the Cluster but the client directly to fetch the ClusterVersion?
Which should be used when?
I don't know the official stance on this. My rule of thumb is to prefer a Lister for any type that is used regularly and of which we need a large subset of the total population. ClusterVersion fits that description. A Lister saves us from having to go back to the API Server to service the query, at the cost of storing the objects locally and watching the API Server for changes.
    return err
}

machineSpec, err := controller.PopulateMachineSpec(ms.Spec.Template.Spec, clusterSpec, clusterVersion, msLog)
I don't understand why we have this controller at all. Presumably, this is running in the root cluster, where the user is creating the MachineSet objects. We should not have a controller that is modifying user-created objects. The whole point of the capi-machine controller is to update the template for the Machine objects, that cluster-operator (via cluster-api) owns.
If we stop trying to write the Cluster and ClusterVersion information into the MachineSet, then we won't have a problem with trying to store user-specified and cluster-derived defaults in the same field, as detailed in my comment in `PopulateMachineSpec`.
The controller exists to make sure the spec template is updated before we send it down to the target cluster (soon) and it starts to be acted on, as we don't intend to have the capi-machine controller running remotely and there are no cluster versions defined there. I expected we would do similarly for MachineDeployment.
We need some place to pass that data, and ideally do some calculation logic that remains under the root cluster's control. If all of the MachineTemplateSpec providerConfig is off limits, I'm not sure how we do that.
I'm not sure why we wouldn't fill out the template at the time that we are sending the data to the remote cluster. I don't see why we need to mutate our root cluster MachineSet objects to satisfy the information being in the remote Machine templates.
We could, but then we've got different representations of the same thing in the root and the target, which strikes me as quite unusual; and if we get into syncing, or ever supporting changes made remotely, we'll have some problems to deal with. It's possible that can be reconciled, but it's something we should watch out for. I can back this controller up and pull it out of this PR, though.
Updated and dropping WIP; ready for merge or more review. I do want to revisit what is generated vs. set by the user on that template in the future, though.
/test e2e
@dgoodwin You did not drop the WIP. Is that an oversight? Or is it still WIP?
    return c
}

// Controller manages clusters.
Comment is not accurate.
@@ -338,7 +339,9 @@ func NewControllerInitializers() map[string]InitFunc {
    controllers["nodeconfig"] = startNodeConfigController
    controllers["deployclusterapi"] = startDeployClusterAPIController

    // Controllers ported to upstream cluster API
What does it mean for the controllers to be "ported to upstream cluster API"? At best this comment is confusing. At worst it is inaccurate.
PR started out as a port of cluster controller to use the new API, and has been through many iterations since. I will update it.
Please keep tone of PR comments in mind.
@@ -536,6 +539,21 @@ func startClusterController(ctx ControllerContext) (bool, error) {
    return true, nil
}

func startMachineAPIController(ctx ControllerContext) (bool, error) {
    if !resourcesAvailable(ctx) {
This always returns true since it is not checking for any resources.
Updating to check for both API resources being ready as this uses both.
    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClusterAPIClientOrDie("clusteroperator-capi-machine-controller"),
).Run(int(ctx.Options.ConcurrentClusterSyncs), ctx.Stop)
It is not appropriate to use `ConcurrentClusterSyncs` here.
"time" | ||
|
||
"k8s.io/apimachinery/pkg/api/errors" | ||
//metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
Remove.
if !apiequality.Semantic.DeepEqual(machine, newMachine) {
    // TODO: updating the machine here, should this be a patch operation on the template spec?
    // NOTE: updating here immediately triggers another sync...
I don't think that this note is necessary. That is overwhelmingly (if not universally) true in controllers.
pkg/clusterapi/aws/actuator.go
Outdated
rawClusterVersion, ok := coMachineSet.Annotations[clusterVersionAnnotation]
if !ok {
    return nil, nil, fmt.Errorf("Missing ClusterVersion resource annotation in MachineSet %#v", coMachineSet)
func (a *Actuator) clusterOperatorMachineSetSpec(m *clusterv1.Machine) (*cov1.MachineSetSpec, error) {
Can we use `MachineSetSpecFromClusterAPIMachineSpec` from `pkg/controller/controller_utils.go` instead of defining this function?
Almost surely, this is quite old and was written in a separate repo IIRC.
pkg/controller/controller_utils.go
Outdated
}
spec, ok := obj.(*clusteroperator.MachineSetProviderConfigSpec)
if !ok {
    return nil, fmt.Errorf("Unexpected object: %#v", obj)
As in `clusterOperatorMachineSetSpec` in `pkg/clusterapi/aws/actuator.go`, let's use the `GroupVersionKind` instead of dumping the object.
Updating this here and for a couple other similar methods in this package.
pkg/controller/controller_utils.go
Outdated
for _, regionAMI := range clusterVersion.Spec.VMImages.AWSImages.RegionAMIs {
    if regionAMI.Region == clusterSpec.Hardware.AWS.Region {
        return &clusteroperator.VMImage{
            AWSImage: &regionAMI.AMI,
Copy `regionAMI.AMI` to a new string and use a pointer to that. It would be really hard to track down if somewhere the value stored in this pointer were modified and it mutated a ClusterVersion stored in the lister.
pkg/logging/logging.go
Outdated
@@ -33,3 +35,18 @@ func WithMachineSet(logger log.FieldLogger, machineSet *clusteroperator.MachineS
func WithCluster(logger log.FieldLogger, cluster *clusteroperator.Cluster) log.FieldLogger {
    return logger.WithField("cluster", fmt.Sprintf("%s/%s", cluster.Namespace, cluster.Name))
}

// WithClusterAPI expands a logger's context to include info about the given cluster.
func WithClusterAPI(logger log.FieldLogger, cluster *clusterv1.Cluster) log.FieldLogger {
s/WithClusterAPI/WithClusterAPICluster/
PR updates.
These are responsible for populating some portions of the machineSpec providerConfig on machines.

- calculate the VMImage to use (uses cluster version, and we expect additional logic in future, which will only need to exist in the root cluster)
- copy the cluster hardware spec for use in deletion (when the cluster may no longer exist)
- apply hardware defaults if applicable

These are only used in the root cluster; when we decide to send a machineset down to the target cluster, we will update the spec beforehand.

AWS actuator is updated to use the correct data going forward. (Followup PR will solve credentials so the actuator can actually create the master machineset in the root cluster; presently this is still broken.)
Updated again.
/test unit
/lgtm