
Write Controllers on Top of Metal-Stack by Using Kubebuilder

While implementing cluster-api on top of metal-stack we learned quite a few things about Kubebuilder, which lets us write reconciliation logic easily, and we want to share that knowledge with you. So we built the project xcluster, an extremely simplified version of a cluster that contains metal-stack resources. We assume you have already gone through the Kubebuilder book and are looking for more hands-on examples. By referencing the code in this project, you will be able to create a CustomResourceDefinition (CRD), write its reconciliation logic, and deploy it.

Architecture

We created two CRDs, XCluster and XFirewall, as shown in the following figure. XCluster represents a cluster that contains a metal-stack network and an XFirewall. XFirewall corresponds to a metal-stack firewall. The circular arrows represent the reconciliation loops of the corresponding controllers, which reconcile the state of these resources.

architecture

metal-api

metal-api manages all metal-stack resources, including machines, firewalls, switches, OS images, IPs, networks and more. These are the constructs that enable you to turn your data center into elastic cloud infrastructure. You can try it out in the mini-lab, a local development platform where you can play with metal-stack resources and where we built this project. In this project, metal-api does the real job: it allocates the network and creates the firewall, fulfilling what you declare in xcluster.yaml.

Demo

Clone the repos of mini-lab and xcluster into the same folder.

├── mini-lab
└── xcluster

Download the prerequisites of mini-lab. Then,

cd mini-lab
make

It's going to take some time to finish. Behind the scenes, a kind cluster is created, the metal-api-related Kubernetes resources are deployed, and several KVM-based virtual machines are created to serve as metal-stack switches and machines.

From time to time, do

docker-compose run metalctl machine ls

until you see Waiting under LAST EVENT, as follows:

ID                                          LAST EVENT   WHEN     AGE  HOSTNAME  PROJECT  SIZE          IMAGE  PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258        Waiting      8s                               v1-small-x86         vagrant
2294c949-88f6-5390-8154-fa53d93a3313        Waiting      8s                               v1-small-x86         vagrant

Then, in another terminal, while still in the folder mini-lab (this is required), do

eval $(make dev-env) # for talking to metal-api in this shell
cd ../xcluster

Now you should be in the folder xcluster. Then,

make

Behind the scenes, all related Kubernetes resources are deployed:

  • CRD of XCluster and XFirewall
  • Deployment xcluster-controller-manager which manages two controllers with the reconciliation logic of XCluster and XFirewall respectively
  • ClusterRole and ClusterRoleBinding which entitle your manager to manage the resources XCluster and XFirewall.

Then, check out your xcluster-controller-manager running alongside other metal-stack deployments.

kubectl get deployment -A

Then, deploy your xcluster.

kubectl apply -f config/samples/xcluster.yaml

Check out your brand new custom resources.

kubectl get xcluster,xfirewall -A

The results should read:

NAME                                           READY
xcluster.cluster.www.x-cellent.com/x-cellent   true

NAME                                            READY
xfirewall.cluster.www.x-cellent.com/x-cellent   true

Then go back to the previous terminal where you did

docker-compose run metalctl machine ls

Repeat the command and you should see a metal-stack firewall running.

ID                                                      LAST EVENT      WHEN    AGE     HOSTNAME                PROJECT                                 SIZE            IMAGE                          PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258                    Waiting         41s                                                                             v1-small-x86                                   vagrant
2294c949-88f6-5390-8154-fa53d93a3313                    Phoned Home     21s     14m 19s x-cellent-firewall      00000000-0000-0000-0000-000000000000    v1-small-x86    Firewall 2 Ubuntu 20201126     vagrant

The reconciliation logic in the reconcilers did the job of delivering what is described in the sample manifest, which is the only thing the user has to worry about.

kubebuilder markers for CRD

kubebuilder provides lots of handy markers. Here are some examples; a combined sketch follows the list:

  1. API Resource Type

    // +kubebuilder:object:root=true

    The Go struct under this marker becomes an API resource type in the URL. For example, the URL path to the XCluster instance myxcluster would be

    /apis/cluster.www.x-cellent.com/v1/namespaces/myns/xclusters/myxcluster
  2. API Subresource

    // +kubebuilder:subresource:status

    The Go struct under this marker gets the API subresource status. Continuing the previous example, the URL path to the status of the instance would be:

    /apis/cluster.www.x-cellent.com/v1/namespaces/myns/xclusters/myxcluster/status
  3. Terminal Output

    // +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.ready`

    This specifies an extra column in the terminal output when you run kubectl get.
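
Putting these markers together, a trimmed sketch of what the XCluster API type could look like is shown below. The spec fields are taken from the snippets in this post (cl.Spec.ProjectID, cl.Spec.Partition, cl.Spec.PrivateNetworkID); the JSON tags and everything else are illustrative assumptions, not copied from the repository.

package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// XClusterSpec lists the fields referenced elsewhere in this post;
// the JSON tags are assumptions.
type XClusterSpec struct {
	ProjectID        string `json:"projectID,omitempty"`
	Partition        string `json:"partition,omitempty"`
	PrivateNetworkID string `json:"privateNetworkID,omitempty"`
}

// XClusterStatus feeds the printcolumn marker below via `.status.ready`.
type XClusterStatus struct {
	Ready bool `json:"ready,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.ready`

// XCluster is the Schema for the xclusters API.
type XCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   XClusterSpec   `json:"spec,omitempty"`
	Status XClusterStatus `json:"status,omitempty"`
}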

Wire up metal-api client metalgo.Driver

metalgo.Driver is the client in Go code for talking to metal-api. To enable both the XCluster and XFirewall controllers to do that, we created a metalgo.Driver named metalClient and set the Driver field of both controllers, as shown in the following snippet from main.go.

	if err = (&controllers.XClusterReconciler{
		Client: mgr.GetClient(),
		Driver: metalClient,
		Log:    ctrl.Log.WithName("controllers").WithName("XCluster"),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "XCluster")
		os.Exit(1)
	}
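
For reference, the XClusterReconciler holding that Driver is roughly shaped like this. This is a sketch inferred from the fields set above; the exact types in the repository may differ.

package controllers

import (
	"github.com/go-logr/logr"
	metalgo "github.com/metal-stack/metal-go"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// XClusterReconciler reconciles XCluster objects against metal-api.
type XClusterReconciler struct {
	client.Client                 // Kubernetes API client obtained from the manager
	Driver        *metalgo.Driver // client for talking to metal-api
	Log           logr.Logger
	Scheme        *runtime.Scheme
}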

Role-based access control (RBAC)

With the following lines in xcluster_controller.go and the equivalent lines in xfirewall_controller.go (which in our case overlap), kubebuilder generates role.yaml and wires up everything for your xcluster-controller-manager pod when you run make deploy. The verbs are the actions your pod is allowed to perform on the resources, which are xclusters and xfirewalls in our case.

// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xclusters,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xclusters/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xfirewalls,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xfirewalls/status,verbs=get;update;patch

Finalizer

When you want to do some clean-up before the Kubernetes api-server deletes your resource (which otherwise happens almost immediately upon kubectl delete), finalizers come in handy. A finalizer is simply a string stored in the finalizers field of a resource instance's metadata. For example, the finalizer of XCluster in xcluster_types.go:

const XClusterFinalizer = "xcluster.finalizers.cluster.www.x-cellent.com"

The api-server will not delete the instance until all of its finalizers have been removed. For example, in xcluster_controller.go we add the above finalizer to the XCluster instance, so when the instance is later marked for deletion, the api-server cannot delete it until we have freed the metal-stack network and removed the finalizer. The following listing shows this in action: we use the Driver mentioned earlier to ask metal-api whether the metal-stack network we allocated still exists. If so, we free it with the Driver and then remove the finalizer from the XCluster instance.

	resp, err := r.Driver.NetworkFind(&metalgo.NetworkFindRequest{
		ID:        &cl.Spec.PrivateNetworkID,
		Name:      &cl.Spec.Partition,
		ProjectID: &cl.Spec.ProjectID,
	})

	if err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to list metal-stack networks: %w", err)
	}

	if n := len(resp.Networks); n > 1 {
		return ctrl.Result{}, fmt.Errorf("more than one network found for ID %q", cl.Spec.PrivateNetworkID)
	} else if n == 1 {
		if _, err := r.Driver.NetworkFree(cl.Spec.PrivateNetworkID); err != nil {
			return ctrl.Result{Requeue: true}, nil
		}
	}
	log.Info("metal-stack network freed")

	cl.RemoveFinalizer(clusterv1.XClusterFinalizer)
	if err := r.Update(ctx, cl); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to remove xcluster finalizer: %w", err)
	}
	r.Log.Info("finalizer removed")
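
For completeness, registering the finalizer on the normal reconcile path can look like the following sketch. It uses the AddFinalizer/ContainsFinalizer helpers from sigs.k8s.io/controller-runtime/pkg/controller/controllerutil rather than the project's own helper methods (the repository defines methods such as RemoveFinalizer on its types), so the exact calls differ from the code in this project.

	// Sketch: make sure the finalizer is set before allocating anything
	// that will need clean-up on deletion.
	if !controllerutil.ContainsFinalizer(cl, clusterv1.XClusterFinalizer) {
		controllerutil.AddFinalizer(cl, clusterv1.XClusterFinalizer)
		if err := r.Update(ctx, cl); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to add XCluster finalizer: %w", err)
		}
	}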

Likewise, in xfirewall_controller.go we add the finalizer to the XFirewall instance. The api-server cannot delete the instance until we have cleaned up the underlying metal-stack firewall (r.Driver.MachineDelete(fw.Spec.MachineID) in the following listing) and removed the finalizer from the instance:

func (r *XFirewallReconciler) DeleteFirewall(ctx context.Context, fw *clusterv1.XFirewall, log logr.Logger) (ctrl.Result, error) {
	if _, err := r.Driver.MachineDelete(fw.Spec.MachineID); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to delete firewall: %w", err)
	}
	log.Info("states of the machine managed by XFirewall reset")

	fw.RemoveFinalizer(clusterv1.XFirewallFinalizer)
	if err := r.Update(ctx, fw); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to remove XFirewall finalizer: %w", err)
	}
	r.Log.Info("finalizer removed")

	return ctrl.Result{}, nil
}

func errors.IsNotFound and client.IgnoreNotFound

When you need to handle a NotFound error differently from other errors, consider using errors.IsNotFound(err), as in this snippet from xcluster_controller.go:

	fw := &clusterv1.XFirewall{}
	if err := r.Get(ctx, req.NamespacedName, fw); err != nil {
		// errors other than `NotFound`
		if !errors.IsNotFound(err) {
			return ctrl.Result{}, fmt.Errorf("failed to fetch XFirewall instance: %w", err)
		}

		// Create XFirewall instance
		fw = cl.ToXFirewall()

If there is nothing we can do about a NotFound error, we can simply stop the reconciliation without requeueing the request:

	cl := &clusterv1.XCluster{}
	if err := r.Get(ctx, req.NamespacedName, cl); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

Exponential Back-Off

As far as requeueing is concerned, returning ctrl.Result{}, err and returning ctrl.Result{Requeue: true}, nil behave the same, as the if clause and the else-if clause in the controller-runtime source code show. Moreover, both are subject to exponential back-off, which you can see in the source code where the controller's dependencies are set and where func workqueue.DefaultControllerRateLimiter is defined.
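
To illustrate the difference, here is a hypothetical helper (not code from this project; ctrl is sigs.k8s.io/controller-runtime) showing the common return patterns of a Reconcile function and how each one requeues:

package controllers

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// resultFor is a hypothetical helper showing how the usual return values
// of Reconcile interact with requeueing.
func resultFor(err error, retryLater bool) (ctrl.Result, error) {
	if err != nil {
		// Requeued with exponential back-off; equivalent in effect to
		// returning ctrl.Result{Requeue: true}, nil.
		return ctrl.Result{}, err
	}
	if retryLater {
		// Requeued after a fixed delay instead of the back-off curve.
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	// Success: the request is dropped from the queue and not requeued.
	return ctrl.Result{}, nil
}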

ControllerReference

ControllerReference is a kind of OwnerReference that enables garbage collection of the owned instance (XFirewall) when the owner instance (XCluster) is deleted. We demonstrate this in xcluster_controller.go by using the function SetControllerReference.

		if err := controllerutil.SetControllerReference(cl, fw, r.Scheme); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to set the owner reference of the XFirewall: %w", err)
		}

Since XCluster owns the XFirewall instance, we have to inform the manager that it should reconcile XCluster upon any change of an XFirewall instance:

func (r *XClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&clusterv1.XCluster{}).
		Owns(&clusterv1.XFirewall{}).
		Complete(r)
}

Wrap-up

Check out the code in this project for more details. If you want a fully-fledged implementation, stay tuned! Our cluster-api-provider-metalstack is on the way. If you want more blog posts about metal-stack and kubebuilder, let us know! Special thanks go to Grigoriy Mikhalkin.