This repository has been archived by the owner on Jan 28, 2020. It is now read-only.

Add cluster-api machine/machineset controllers. #217

Merged
merged 1 commit into openshift:master from cluster-api on May 17, 2018

Conversation

dgoodwin (Contributor):

These are responsible for populating some portions of the machineSpec
providerConfig on both machines and machinesets.

  • calculate the VMImage to use (uses cluster version and we expect
    additional logic in future, which will only need to exist in the root
    cluster)
  • copy the cluster hardware spec for use in deletion (when cluster may
    no longer exist)
  • apply hardware defaults if applicable

These are only used in the root cluster; when we decide to send a
machineset down to the target cluster, we will update the spec
beforehand.

AWS actuator is updated to use the correct data going forward. (A
follow-up PR will solve credentials so the actuator can actually create
the master machineset in the root cluster; presently this is still broken.)
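
For illustration, a rough sketch of the kind of population logic these controllers perform; the function and field names below are hypothetical, not the exact code in this PR:

// populateMachineSpec sketches the defaulting described above (hypothetical names).
func populateMachineSpec(spec *cov1.MachineSetSpec, cluster cov1.ClusterSpec, cv *cov1.ClusterVersion) error {
    // Calculate the VMImage to use from the cluster version and the cluster's region.
    img, err := imageForRegion(cv, cluster.Hardware)
    if err != nil {
        return err
    }
    spec.VMImage = *img

    // Copy the cluster hardware spec so machines can still be deleted after
    // the cluster object itself is gone.
    spec.ClusterHardware = cluster.Hardware

    // Apply cluster-level hardware defaults if the machineset did not
    // specify its own.
    if spec.Hardware == nil {
        spec.Hardware = cluster.DefaultHardwareSpec
    }
    return nil
}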

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 14, 2018
@dgoodwin dgoodwin force-pushed the cluster-api branch 2 times, most recently from b96ad46 to aee4b78 Compare May 14, 2018 17:32

staebler (Contributor) left a comment:

Most of the comments that I made on the machine controller are applicable to the machineset controller as well (although I don't think we want the machineset controller at all).

go machinesetapi.NewController(
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().Clusters(),
    ctx.ClusterAPIInformerFactory.Cluster().V1alpha1().MachineSets(),
    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-cluster-controller"),

staebler (Contributor):

Change name of user agent here and in startMachineAPIController to reflect the name of the controller.

spec:
  replicas: 1
  selector:
    matchLabels:
      machineset: master
      machineset: ${CLUSTER_NAME}-master

staebler (Contributor):

Do these label keys come from upstream? If not, then we should use clusteroperator.openshift.io/machineset and clusteroperator.openshift.io/cluster instead.

dgoodwin (Contributor, Author):

They do not, but IMO they should; no movement on the issues opened though.

@@ -159,6 +159,7 @@ type ClusterProviderConfigSpec struct {

// ClusterSpec is the specification of a cluster's hardware and configuration
type ClusterSpec struct {

staebler (Contributor):

Is this an accidental newline?

@@ -413,19 +430,36 @@ type MachineSetConfig struct {

// NodeLabels specifies the labels that will be applied to nodes in this
// MachineSet
// TODO: may be obsolete as well via Cluster API selector

staebler (Contributor):

I don't see anything in the upstream API that would replace this. This is meant to apply labels to the Nodes created. The Selector in MachineSetSpec is for labels on the Machines.
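
For reference, the two shapes being contrasted, abbreviated: the upstream Selector matches labels on Machines, while NodeLabels here targets the resulting Nodes.

// Upstream cluster-api (abbreviated): Selector matches Machines, not Nodes.
type MachineSetSpec struct {
    Selector metav1.LabelSelector `json:"selector"`
    // ...
}

// cluster-operator (abbreviated): labels applied to the Nodes themselves.
type MachineSetConfig struct {
    NodeLabels map[string]string `json:"nodeLabels"`
    // ...
}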

// VMImage contains a specified single image to use for a supported cloud provider.
type VMImage struct {
    // +optional
    AWSImage *string `json:"awsImage"`

staebler (Contributor):

Why is this a *string instead of a string? Do we need to differentiate between a nil and an empty string?

dgoodwin (Contributor, Author):

No, it's just optional per API conventions; as we expand to other clouds this may or may not be set.

dgoodwin (Contributor, Author):

Let me know if this is not correct otherwise I'll assume to leave it in.

staebler (Contributor):

I suppose that *string is fine for now. I would expect that one of two things happens as we add more clouds. Either (1) the *string will be replaced by a pointer to an AWS-specific struct that may just contain a string, or (2) we forgo the union idea, since the VM image information is the same for every cloud and there is a one-to-one relationship between machineset and cloud.
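
For context, the union-style shape under discussion, following the Kubernetes API convention of optional pointer members (a sketch; the GCP field is hypothetical):

// VMImage as a per-cloud union; exactly one member is expected to be set.
type VMImage struct {
    // +optional
    AWSImage *string `json:"awsImage,omitempty"`

    // A future cloud would add its own optional member, e.g.:
    // +optional
    // GCPImage *string `json:"gcpImage,omitempty"`
}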

}

// Run runs c; will not return until stopCh is closed. workers determines how
// many clusters will be handled in parallel.

staebler (Contributor):

s/clusters/machines/

}

// enqueueAfter will enqueue a cluster after the provided amount of time.
func (c *Controller) enqueueAfter(cluster *clustopv1.Cluster, after time.Duration) {

staebler (Contributor):

Not used. And enqueueing the wrong thing if it were.

return true
}

// syncCluster will sync the cluster with the given key.

staebler (Contributor):

s/syncCluster/syncMachineSet/


// TODO: once cluster controller is setting resolved cluster version refs on the
// ClusterStatus, replace this with the resolved ref not a lookup
clusterVersion, err := c.clustopClient.ClusteroperatorV1alpha1().ClusterVersions(clusterSpec.ClusterVersionRef.Namespace).Get(clusterSpec.ClusterVersionRef.Name, metav1.GetOptions{})

staebler (Contributor):

Why do we use a lister to fetch the Cluster but the client directly to fetch the ClusterVersion?

dgoodwin (Contributor, Author):

Which should be used when?

staebler (Contributor):

I don't know the official stance on this. My rule of thumb is to prefer a Lister for any type that is used regularly and of which we need a large subset of the total population. ClusterVersion fits that description. A Lister saves us from having to go back to the API server to service the query, at the cost of having to store the objects locally and watch the API server for changes.
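
To illustrate the trade-off (the lister field name below is an assumption, not the exact one in this PR):

// Client: a live GET against the API server on every sync.
cv, err := c.clustopClient.ClusteroperatorV1alpha1().
    ClusterVersions(namespace).Get(name, metav1.GetOptions{})

// Lister: served from the shared informer's local cache; no API round trip,
// but the informer must be running and its cache synced.
cv, err := c.clusterVersionsLister.ClusterVersions(namespace).Get(name)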

return err
}

machineSpec, err := controller.PopulateMachineSpec(ms.Spec.Template.Spec, clusterSpec, clusterVersion, msLog)

staebler (Contributor):

I don't understand why we have this controller at all. Presumably, this is running in the root cluster, where the user is creating the MachineSet objects. We should not have a controller that is modifying user-created objects. The whole point of the capi-machine controller is to update the template for the Machine objects, that cluster-operator (via cluster-api) owns.

If we stop trying to write the Cluster and ClusterVersion information into the MachineSet, then we won't have a problem with trying to store user-specified and cluster-derived defaults in the same field as detailed in my comment in PopulateMachineSpec.

dgoodwin (Contributor, Author):

The controller exists to make sure the spec template is updated before we send it down to the target cluster (soon), where it starts to be acted on; we don't intend to have the capi-machine controller running remotely, and there are no cluster versions defined there. I expected we would do similar for MachineDeployment.

We need some place to pass that data, and ideally do some calculation logic that remains under the root cluster's control. If all of the MachineTemplateSpec providerConfig is off limits, I'm not sure how we do that.

staebler (Contributor):

I'm not sure why we wouldn't fill out the template at the time that we are sending the data to the remote cluster. I don't see why we need to mutate our root cluster MachineSet objects to satisfy the information being in the remote Machine templates.

dgoodwin (Contributor, Author):

We could, but then we've got different representations of the same thing in the root and the target, which strikes me as quite unusual, and if we get into syncing or ever supporting changes made remotely, we'll have some problems to deal with. Possibly that can be reconciled, but it's something we should watch out for. I can back this controller up and pull it out of this PR though.

dgoodwin (Contributor, Author):

Updated and dropping WIP; ready for merge or more review. I do want to revisit what is generated vs set by the user on that template in future though.

dgoodwin (Contributor, Author):

/test e2e

staebler (Contributor):

Updated and dropping WIP

@dgoodwin You did not drop the WIP. Is that an oversight? Or is it still WIP?

return c
}

// Controller manages clusters.

staebler (Contributor):

Comment is not accurate.

@@ -338,7 +339,9 @@ func NewControllerInitializers() map[string]InitFunc {
controllers["nodeconfig"] = startNodeConfigController
controllers["deployclusterapi"] = startDeployClusterAPIController

// Controllers ported to upstream cluster API

staebler (Contributor):

What does it mean for the controllers to be "ported to upstream cluster API"? At best this comment is confusing. At worst it is inaccurate.

dgoodwin (Contributor, Author):

PR started out as a port of cluster controller to use the new API, and has been through many iterations since. I will update it.

Please keep tone of PR comments in mind.

@@ -536,6 +539,21 @@ func startClusterController(ctx ControllerContext) (bool, error) {
    return true, nil
}

func startMachineAPIController(ctx ControllerContext) (bool, error) {
    if !resourcesAvailable(ctx) {

staebler (Contributor):

This always returns true since it is not checking for any resources.

dgoodwin (Contributor, Author):

Updating to check for both API resources being ready as this uses both.
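
Roughly along these lines (a sketch; the actual availability-check helper in the repo may differ):

// Gate controller startup on both API groups this controller consumes.
if !ctx.AvailableResources[schema.GroupVersionResource{
    Group: "clusteroperator.openshift.io", Version: "v1alpha1", Resource: "clusterversions",
}] || !ctx.AvailableResources[schema.GroupVersionResource{
    Group: "cluster.k8s.io", Version: "v1alpha1", Resource: "machinesets",
}] {
    return false, nil
}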

    ctx.ClientBuilder.KubeClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClientOrDie("clusteroperator-capi-machine-controller"),
    ctx.ClientBuilder.ClusterAPIClientOrDie("clusteroperator-capi-machine-controller"),
).Run(int(ctx.Options.ConcurrentClusterSyncs), ctx.Stop)

staebler (Contributor):

It is not appropriate to use ConcurrentClusterSyncs here.

"time"

"k8s.io/apimachinery/pkg/api/errors"
//metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

staebler (Contributor):

Remove.


if !apiequality.Semantic.DeepEqual(machine, newMachine) {
    // TODO: updating the machine here, should this be a patch operation on the template spec?
    // NOTE: updating here immediately triggers another sync...

staebler (Contributor):

I don't think that this note is necessary. That is overwhelmingly (if not universally) true in controllers.

rawClusterVersion, ok := coMachineSet.Annotations[clusterVersionAnnotation]
if !ok {
    return nil, nil, fmt.Errorf("Missing ClusterVersion resource annotation in MachineSet %#v", coMachineSet)

func (a *Actuator) clusterOperatorMachineSetSpec(m *clusterv1.Machine) (*cov1.MachineSetSpec, error) {

staebler (Contributor):

Can we use MachineSetSpecFromClusterAPIMachineSpec from pkg/controller/controller_utils.go instead of defining this function?

dgoodwin (Contributor, Author):

Almost surely, this is quite old and was written in a separate repo IIRC.

}
spec, ok := obj.(*clusteroperator.MachineSetProviderConfigSpec)
if !ok {
    return nil, fmt.Errorf("Unexpected object: %#v", obj)

staebler (Contributor):

As in clusterOperatorMachineSetSpec in pkg/clusterapi/aws/actuator.go, let's use the GroupVersionKind instead of dumping the object.

dgoodwin (Contributor, Author):

Updating this here and for a couple other similar methods in this package.
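
i.e., roughly (a sketch, assuming obj is a runtime.Object):

spec, ok := obj.(*clusteroperator.MachineSetProviderConfigSpec)
if !ok {
    // Report the object's kind instead of dumping the whole object.
    return nil, fmt.Errorf("unexpected object type: %s",
        obj.GetObjectKind().GroupVersionKind())
}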

for _, regionAMI := range clusterVersion.Spec.VMImages.AWSImages.RegionAMIs {
    if regionAMI.Region == clusterSpec.Hardware.AWS.Region {
        return &clusteroperator.VMImage{
            AWSImage: &regionAMI.AMI,

staebler (Contributor):

Copy regionAMI.AMI to a new string and use a pointer to that. It would be really hard to track down if somewhere the value stored in this pointer were modified and it mutated a ClusterVersion stored in the lister.
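
That is, something like:

// Copy the AMI into a local string so the returned pointer cannot be used
// to mutate the ClusterVersion object cached in the lister.
ami := regionAMI.AMI
return &clusteroperator.VMImage{
    AWSImage: &ami,
}, nil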

@@ -33,3 +35,18 @@ func WithMachineSet(logger log.FieldLogger, machineSet *clusteroperator.MachineS
func WithCluster(logger log.FieldLogger, cluster *clusteroperator.Cluster) log.FieldLogger {
return logger.WithField("cluster", fmt.Sprintf("%s/%s", cluster.Namespace, cluster.Name))
}

// WithClusterAPI expands a logger's context to include info about the given cluster.
func WithClusterAPI(logger log.FieldLogger, cluster *clusterv1.Cluster) log.FieldLogger {

staebler (Contributor):

s/WithClusterAPI/WithClusterAPICluster/

dgoodwin (Contributor, Author):

PR updates.

These are responsible for populating some portions of the machineSpec
providerConfig on machines.

- calculate the VMImage to use (uses cluster version and we expect
  additional logic in future, which will only need to exist in the root
  cluster)
- copy the cluster hardware spec for use in deletion (when cluster may
  no longer exist)
- apply hardware defaults if applicable

These are only used in the root cluster; when we decide to send a
machineset down to the target cluster, we will update the spec
beforehand.

AWS actuator is updated to use the correct data going forward. (A
follow-up PR will solve credentials so the actuator can actually create
the master machineset in the root cluster; presently this is still broken.)

dgoodwin (Contributor, Author):

Updated again.

/test unit

staebler (Contributor):

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 17, 2018
@dgoodwin dgoodwin changed the title WIP: Add cluster-api machine/machineset controllers. Add cluster-api machine/machineset controllers. May 17, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 17, 2018
@openshift-merge-robot openshift-merge-robot merged commit 1d86123 into openshift:master May 17, 2018
@dgoodwin dgoodwin deleted the cluster-api branch May 17, 2018 18:24
Labels
lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants