Add cluster-api machine/machineset controllers. #217
b96ad46 to aee4b78
Most of the comments that I made on the machine controller are applicable to the machineset controller as well (although I don't think we want the machineset controller at all).
go machinesetapi.NewController(
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().Clusters(),
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().MachineSets(),
    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-cluster-controller"),
Change the name of the user agent here and in `startMachineAPIController` to reflect the name of the controller.
spec:
  replicas: 1
  selector:
    matchLabels:
      machineset: master
      machineset: ${CLUSTER_NAME}-master
Do these label keys come from upstream? If not, then we should use `clusteroperator.openshift.io/machineset` and `clusteroperator.openshift.io/cluster` instead.
They do not, but IMO they should; no movement on the issues opened, though.
pkg/apis/clusteroperator/types.go
Outdated
@@ -159,6 +159,7 @@ type ClusterProviderConfigSpec struct {

// ClusterSpec is the specification of a cluster's hardware and configuration
type ClusterSpec struct {
Is this an accidental newline?
pkg/apis/clusteroperator/types.go
Outdated
@@ -413,19 +430,36 @@ type MachineSetConfig struct {

// NodeLabels specifies the labels that will be applied to nodes in this
// MachineSet
// TODO: may be obsolete as well via Cluster API selector
I don't see anything in the upstream API that would replace this. This is meant to apply labels to the Nodes created. The `Selector` in `MachineSetSpec` is for labels on the Machines.
// VMImage contains a specified single image to use for a supported cloud provider.
type VMImage struct {
    // +optional
    AWSImage *string `json:"awsImage"`
Why is this a `*string` instead of a `string`? Do we need to differentiate between a nil and an empty string?
No, it's just optional per API conventions; as we expand to other clouds, this may or may not be set.
Let me know if this is not correct; otherwise I'll assume we leave it in.
I suppose that `*string` is fine for now. I would expect one of two things to happen as we add more clouds: either (1) the `*string` will be replaced by a pointer to an AWS-specific struct, which may just contain a `string`, or (2) we forgo the union idea, since the VM image information is the same for every cloud and there is a one-to-one relationship between machineset and cloud.
}

// Run runs c; will not return until stopCh is closed. workers determines how
// many clusters will be handled in parallel.
s/clusters/machines/
}

// enqueueAfter will enqueue a cluster after the provided amount of time.
func (c *Controller) enqueueAfter(cluster *clustopv1.Cluster, after time.Duration) {
Not used. And enqueueing the wrong thing if it were.
    return true
}

// syncCluster will sync the cluster with the given key.
s/syncCluster/syncMachineSet/
// TODO: once cluster controller is setting resolved cluster version refs on the
// ClusterStatus, replace this with the resolved ref not a lookup
clusterVersion, err := c.clustopClient.ClusteroperatorV1alpha1().ClusterVersions(clusterSpec.ClusterVersionRef.Namespace).Get(clusterSpec.ClusterVersionRef.Name, metav1.GetOptions{})
Why do we use a lister to fetch the Cluster but the client directly to fetch the ClusterVersion?
Which should be used when?
I don't know the official stance on this. My rule of thumb is to prefer a Lister for any type that is used regularly and of which we need a large subset of the total population. ClusterVersion fits that description. A Lister saves us from having to go back to the API Server to service the query, at the cost of storing the objects locally and watching the API Server for changes.
    return err
}

machineSpec, err := controller.PopulateMachineSpec(ms.Spec.Template.Spec, clusterSpec, clusterVersion, msLog)
I don't understand why we have this controller at all. Presumably, this is running in the root cluster, where the user is creating the MachineSet objects. We should not have a controller that is modifying user-created objects. The whole point of the capi-machine controller is to update the template for the Machine objects, that cluster-operator (via cluster-api) owns.
If we stop trying to write the Cluster and ClusterVersion information into the MachineSet, then we won't have a problem with trying to store user-specified and cluster-derived defaults in the same field, as detailed in my comment in `PopulateMachineSpec`.
The controller exists to make sure the spec template is updated before we send it down to the target cluster (soon) and it starts to be acted on, as we don't intend to have the capi-machine controller running remotely and there are no cluster versions defined there. I expected we would do similarly for MachineDeployment.
We need some place to pass that data, and ideally do some calculation logic that remains under the root cluster's control. If all of the MachineTemplateSpec providerConfig is off limits, I'm not sure how we do that.
I'm not sure why we wouldn't fill out the template at the time that we are sending the data to the remote cluster. I don't see why we need to mutate our root cluster MachineSet objects to satisfy the information being in the remote Machine templates.
We could, but then we've got different representations of the same thing in the root and the target, which strikes me as quite unusual; and if we get into syncing, or ever supporting changes made remotely, we'll have some problems to deal with. It's possible that can be reconciled, but it's something we should watch out for. I can back this controller up and pull it out of this PR, though.
Updated and dropping WIP; ready for merge or more review. I do want to revisit what is generated vs. set by the user on that template in the future, though.
/test e2e
@dgoodwin You did not drop the WIP. Is that an oversight? Or is it still WIP?
    return c
}

// Controller manages clusters.
Comment is not accurate.
@@ -338,7 +339,9 @@ func NewControllerInitializers() map[string]InitFunc {
    controllers["nodeconfig"] = startNodeConfigController
    controllers["deployclusterapi"] = startDeployClusterAPIController

    // Controllers ported to upstream cluster API
What does it mean for the controllers to be "ported to upstream cluster API"? At best this comment is confusing. At worst it is inaccurate.
PR started out as a port of cluster controller to use the new API, and has been through many iterations since. I will update it.
Please keep tone of PR comments in mind.
@@ -536,6 +539,21 @@ func startClusterController(ctx ControllerContext) (bool, error) {
    return true, nil
}

func startMachineAPIController(ctx ControllerContext) (bool, error) {
    if !resourcesAvailable(ctx) {
This always returns true since it is not checking for any resources.
Updating to check for both API resources being ready as this uses both.
    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClusterAPIClientOrDie("clusteroperator-capi-machine-controller"),
).Run(int(ctx.Options.ConcurrentClusterSyncs), ctx.Stop)
It is not appropriate to use `ConcurrentClusterSyncs` here.
"time" | ||
|
||
"k8s.io/apimachinery/pkg/api/errors" | ||
//metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" |
Remove.
if !apiequality.Semantic.DeepEqual(machine, newMachine) {
    // TODO: updating the machine here, should this be a patch operation on the template spec?
    // NOTE: updating here immediately triggers another sync...
I don't think that this note is necessary. That is overwhelmingly (if not universally) true in controllers.
pkg/clusterapi/aws/actuator.go
Outdated
rawClusterVersion, ok := coMachineSet.Annotations[clusterVersionAnnotation]
if !ok {
    return nil, nil, fmt.Errorf("Missing ClusterVersion resource annotation in MachineSet %#v", coMachineSet)
func (a *Actuator) clusterOperatorMachineSetSpec(m *clusterv1.Machine) (*cov1.MachineSetSpec, error) {
Can we use `MachineSetSpecFromClusterAPIMachineSpec` from `pkg/controller/controller_utils.go` instead of defining this function?
Almost surely, this is quite old and was written in a separate repo IIRC.
pkg/controller/controller_utils.go
Outdated
}
spec, ok := obj.(*clusteroperator.MachineSetProviderConfigSpec)
if !ok {
    return nil, fmt.Errorf("Unexpected object: %#v", obj)
As in `clusterOperatorMachineSetSpec` in `pkg/clusterapi/aws/actuator.go`, let's use the `GroupVersionKind` instead of dumping the object.
Updating this here and for a couple other similar methods in this package.
pkg/controller/controller_utils.go
Outdated
for _, regionAMI := range clusterVersion.Spec.VMImages.AWSImages.RegionAMIs {
    if regionAMI.Region == clusterSpec.Hardware.AWS.Region {
        return &clusteroperator.VMImage{
            AWSImage: &regionAMI.AMI,
Copy `regionAMI.AMI` to a new string and use a pointer to that. It would be really hard to track down if somewhere the value stored in this pointer were modified and it mutated a ClusterVersion stored in the lister.
pkg/logging/logging.go
Outdated
@@ -33,3 +35,18 @@ func WithMachineSet(logger log.FieldLogger, machineSet *clusteroperator.MachineS
func WithCluster(logger log.FieldLogger, cluster *clusteroperator.Cluster) log.FieldLogger {
    return logger.WithField("cluster", fmt.Sprintf("%s/%s", cluster.Namespace, cluster.Name))
}

// WithClusterAPI expands a logger's context to include info about the given cluster.
func WithClusterAPI(logger log.FieldLogger, cluster *clusterv1.Cluster) log.FieldLogger {
s/WithClusterAPI/WithClusterAPICluster/
PR updates.
These are responsible for populating some portions of the machineSpec providerConfig on machines.

- calculate the VMImage to use (uses cluster version, and we expect additional logic in future, which will only need to exist in the root cluster)
- copy the cluster hardware spec for use in deletion (when the cluster may no longer exist)
- apply hardware defaults if applicable

These are only used in the root cluster; when we decide to send a machineset down to the target cluster, we will update the spec beforehand.

AWS actuator is updated to use the correct data going forward. (Followup PR will solve credentials so the actuator can actually create the master machineset in the root cluster; presently this is still broken.)
Updated again.
/test unit
/lgtm