Along the way of implementing cluster-api on top of metal-stack, we learned quite a few things about kubebuilder, which makes writing reconciliation logic easy. We want to share that knowledge with you, so we built xcluster, an extremely simplified version of a cluster that is made of metal-stack resources. We assume you have already gone through the kubebuilder book and are looking for more hands-on examples. By following the code in this project, you will be able to create a CustomResourceDefinition (CRD), write its reconciliation logic, and deploy it.
We created two CRDs, XCluster and XFirewall, as shown in the following figure. XCluster represents a cluster that contains a metal-stack network and an XFirewall; XFirewall corresponds to a metal-stack firewall. The circular arrows represent the reconciliation loops of the corresponding controllers, which reconcile the states of these resources.
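To give an idea of what these resources carry, here is a minimal sketch of the two spec types. The fields shown (ProjectID, Partition, PrivateNetworkID, MachineID) are the ones referenced by the controller snippets later in this post; the actual definitions in xcluster_types.go and xfirewall_types.go may contain more.

// Sketch only: the real types in the project may carry additional fields and markers.
type XClusterSpec struct {
    // ProjectID, Partition and PrivateNetworkID identify the metal-stack
    // project, partition and private network the cluster lives in.
    ProjectID        string `json:"projectID,omitempty"`
    Partition        string `json:"partition,omitempty"`
    PrivateNetworkID string `json:"privateNetworkID,omitempty"`
}

type XFirewallSpec struct {
    // MachineID identifies the metal-stack machine acting as the firewall.
    MachineID string `json:"machineID,omitempty"`
}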
metal-api manages all metal-stack resources, including machines, firewalls, switches, OS images, IPs, networks and more. These are the constructs that enable you to turn your data center into elastic cloud infrastructure. You can try it out in the mini-lab, a local development platform where you can play with metal-stack resources and where we built this project. In this project, metal-api does the real job: it allocates the network and creates the firewall, fulfilling what you wish for in xcluster.yaml.
Clone the repo of mini-lab and xcluster in the same folder.
├── mini-lab
└── xcluster
Download the prerequisites of mini-lab. Then,
cd mini-lab
make
It's going to take some time to finish. Behind the scenes, a kind cluster is created, metal-api-related Kubernetes resources are deployed, and multiple Linux kernel-based virtual machines are created for metal-stack switches and machines.
From time to time, run
docker-compose run metalctl machine ls
until you see Waiting under LAST EVENT, as follows:
ID LAST EVENT WHEN AGE HOSTNAME PROJECT SIZE IMAGE PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258 Waiting 8s v1-small-x86 vagrant
2294c949-88f6-5390-8154-fa53d93a3313 Waiting 8s v1-small-x86 vagrant
Then, in another terminal, but still in the folder mini-lab (this is a must!), run
eval $(make dev-env) # for talking to metal-api in this shell
cd ../xcluster
Now you should be in the folder xcluster. Then,
make
Behind the scenes, all related Kubernetes resources are deployed:
- the CRDs of XCluster and XFirewall
- the Deployment xcluster-controller-manager, which manages the two controllers with the reconciliation logic of XCluster and XFirewall respectively
- the ClusterRole and ClusterRoleBinding, which entitle your manager to manage the resources XCluster and XFirewall.
Then, check out your xcluster-controller-manager running alongside other metal-stack deployments.
kubectl get deployment -A
Then, deploy your xcluster.
kubectl apply -f config/samples/xcluster.yaml
Check out your brand new custom resources.
kubectl get xcluster,xfirewall -A
The results should read:
NAME READY
xcluster.cluster.www.x-cellent.com/x-cellent true
NAME READY
xfirewall.cluster.www.x-cellent.com/x-cellent true
Then go back to the previous terminal where you did
docker-compose run metalctl machine ls
Repeat the command and you should see a metal-stack firewall running.
ID LAST EVENT WHEN AGE HOSTNAME PROJECT SIZE IMAGE PARTITION
e0ab02d2-27cd-5a5e-8efc-080ba80cf258 Waiting 41s v1-small-x86 vagrant
2294c949-88f6-5390-8154-fa53d93a3313 Phoned Home 21s 14m 19s x-cellent-firewall 00000000-0000-0000-0000-000000000000 v1-small-x86 Firewall 2 Ubuntu 20201126 vagrant
The reconciliation logic in the reconcilers did the job of delivering what is described in the sample manifest. This manifest is the only thing the user has to worry about.
kubebuilder provides lots of handy markers. Here are some examples:
- API Resource Type

// +kubebuilder:object:root=true

The Go struct under this marker will be an API resource type in the URL. For example, the URL path to the XCluster instance myxcluster would be /apis/cluster.www.x-cellent.com/v1/namespaces/myns/xclusters/myxcluster.

- API Subresource

// +kubebuilder:subresource:status

The Go struct under this marker contains the API subresource status. Continuing the last example, the URL path to the status of the instance would be /apis/cluster.www.x-cellent.com/v1/namespaces/myns/xclusters/myxcluster/status.

- Terminal Output

// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.ready`

This specifies an extra column of output on the terminal when you run kubectl get.
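Put together, these markers sit directly above the API type in xcluster_types.go. The following is a rough sketch; the actual file contains more markers and fields than shown here:

// Rough sketch of how the markers are attached to the type; the actual
// xcluster_types.go may differ in details.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Ready",type=string,JSONPath=`.status.ready`
type XCluster struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   XClusterSpec   `json:"spec,omitempty"`
    Status XClusterStatus `json:"status,omitempty"`
}

// XClusterStatus carries the Ready flag printed by the column defined above.
type XClusterStatus struct {
    Ready bool `json:"ready,omitempty"`
}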
metalgo.Driver is the Go client for talking to metal-api. To enable both the XCluster and the XFirewall controller to do that, we created a metalgo.Driver named metalClient and set the Driver field of both controllers, as shown in the following snippet from main.go.
if err = (&controllers.XClusterReconciler{
Client: mgr.GetClient(),
Driver: metalClient,
Log: ctrl.Log.WithName("controllers").WithName("XCluster"),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "XCluster")
os.Exit(1)
}
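For completeness, metalClient itself is created further up in main.go, roughly like this. Note that the exact metalgo.NewDriver signature differs between metal-go versions, and the environment variable names below are only an assumption; in the mini-lab setup, eval $(make dev-env) is what puts the metal-api credentials into your environment.

// Rough sketch, assuming a metal-go version whose constructor takes the
// metal-api endpoint, a token and an HMAC; the environment variable names
// are placeholders, not necessarily the ones used by the project.
metalClient, err := metalgo.NewDriver(
    os.Getenv("METAL_API_URL"),
    os.Getenv("METAL_API_TOKEN"),
    os.Getenv("METAL_API_HMAC"),
)
if err != nil {
    setupLog.Error(err, "unable to create metal-api client")
    os.Exit(1)
}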
With the following lines in xcluster_controller.go and the equivalent lines in xfirewall_controller.go (in our case they overlap), kubebuilder generates role.yaml and wires everything up for your xcluster-controller-manager pod when you run make deploy. The verbs are the actions your pod is allowed to perform on the resources, which are xclusters and xfirewalls in our case.
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xclusters,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xclusters/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xfirewalls,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cluster.www.x-cellent.com,resources=xfirewalls/status,verbs=get;update;patch
When you want to do some clean-up before the Kubernetes api-server deletes your resource, which otherwise happens in no time after kubectl delete, finalizers come in handy. A finalizer is simply a string constant stored in the finalizers field of a Kubernetes resource instance's metadata. For example, the finalizer of XCluster in xcluster_types.go:
const XClusterFinalizer = "xcluster.finalizers.cluster.www.x-cellent.com"
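Adding the finalizer happens in the reconcile loop while the instance is not yet being deleted. A minimal sketch, assuming an AddFinalizer helper on XCluster analogous to the RemoveFinalizer method used further below (controller-runtime's controllerutil package offers equivalent helpers):

// Sketch: register the finalizer as long as the instance is not being deleted.
// AddFinalizer is assumed to be idempotent, mirroring the RemoveFinalizer
// method shown later in this post.
if cl.ObjectMeta.DeletionTimestamp.IsZero() {
    cl.AddFinalizer(clusterv1.XClusterFinalizer)
    if err := r.Update(ctx, cl); err != nil {
        return ctrl.Result{}, fmt.Errorf("failed to add XCluster finalizer: %w", err)
    }
}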
The api-server will not delete an instance before all of its finalizers have been removed. For example, in xcluster_controller.go we add the above finalizer to the XCluster instance, so later, when the instance is about to be deleted, the api-server cannot delete it before we have freed the metal-stack network and removed the finalizer from the instance. We can see that in action in the following listing: we use the Driver mentioned earlier to ask metal-api whether the metal-stack network we allocated is still there; if so, we free it with the Driver and then remove the finalizer of XCluster.
resp, err := r.Driver.NetworkFind(&metalgo.NetworkFindRequest{
ID: &cl.Spec.PrivateNetworkID,
Name: &cl.Spec.Partition,
ProjectID: &cl.Spec.ProjectID,
})
if err != nil {
return ctrl.Result{}, fmt.Errorf("failed to list metal-stack networks: %w", err)
}
if n := len(resp.Networks); n > 1 {
return ctrl.Result{}, fmt.Errorf("more than one network listed: %d", n)
} else if n == 1 {
if _, err := r.Driver.NetworkFree(cl.Spec.PrivateNetworkID); err != nil {
return ctrl.Result{Requeue: true}, nil
}
}
log.Info("metal-stack network freed")
cl.RemoveFinalizer(clusterv1.XClusterFinalizer)
if err := r.Update(ctx, cl); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to remove xcluster finalizer: %w", err)
}
r.Log.Info("finalizer removed")
Likewise, in xfirewall_controller.go we add the finalizer to the XFirewall instance. The api-server cannot delete the instance before we clean up the underlying metal-stack firewall (r.Driver.MachineDelete(fw.Spec.MachineID) in the following listing) and then remove the finalizer from the instance:
func (r *XFirewallReconciler) DeleteFirewall(ctx context.Context, fw *clusterv1.XFirewall, log logr.Logger) (ctrl.Result, error) {
if _, err := r.Driver.MachineDelete(fw.Spec.MachineID); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to delete firewall: %w", err)
}
log.Info("states of the machine managed by XFirewall reset")
fw.RemoveFinalizer(clusterv1.XFirewallFinalizer)
if err := r.Update(ctx, fw); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to remove XFirewall finalizer: %w", err)
}
r.Log.Info("finalizer removed")
return ctrl.Result{}, nil
}
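How does DeleteFirewall get called? kubectl delete only sets the deletion timestamp on the instance, so Reconcile can branch on it. A minimal sketch of that branch (the structure of the project's Reconcile function may differ slightly):

// Sketch: inside XFirewallReconciler.Reconcile, after fetching the instance.
// A non-zero deletion timestamp means the instance is being deleted and the
// api-server is waiting for its finalizers to be removed.
if !fw.ObjectMeta.DeletionTimestamp.IsZero() {
    return r.DeleteFirewall(ctx, fw, log)
}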
When you need different handling depending on whether the error means the instance was not found, consider using errors.IsNotFound(err), as in the following snippet from xcluster_controller.go:
fw := &clusterv1.XFirewall{}
if err := r.Get(ctx, req.NamespacedName, fw); err != nil {
// errors other than `NotFound`
if !errors.IsNotFound(err) {
return ctrl.Result{}, fmt.Errorf("failed to fetch XFirewall instance: %w", err)
}
// Create XFirewall instance
fw = cl.ToXFirewall()
}
If there is nothing we can do about a not-found error, we might simply stop the reconciliation without requeueing the request, as follows:
cl := &clusterv1.XCluster{}
if err := r.Get(ctx, req.NamespacedName, cl); err != nil {
return ctrl.Result{}, client.IgnoreNotFound(err)
}
As far as requeueing is concerned, returning ctrl.Result{}, err and returning ctrl.Result{Requeue: true}, nil have the same effect, as shown in this if clause and this else if clause in the controller-runtime source code. Moreover, the exponential back-off can be observed in the source code where the dependencies of a controller are set and where the function workqueue.DefaultControllerRateLimiter is defined.
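To make the back-off concrete, the default rate limiter in client-go looks roughly like the following (reproduced here for illustration; check the workqueue package of your client-go version for the authoritative definition). It combines a per-item exponential back-off, starting at 5 ms and capped at 1000 s, with an overall token bucket of 10 requeues per second.

import (
    "time"

    "golang.org/x/time/rate"
    "k8s.io/client-go/util/workqueue"
)

// defaultControllerRateLimiter mirrors workqueue.DefaultControllerRateLimiter:
// per-item exponential back-off from 5 ms up to 1000 s, combined with a
// bucket limiter of 10 qps and a burst of 100.
func defaultControllerRateLimiter() workqueue.RateLimiter {
    return workqueue.NewMaxOfRateLimiter(
        workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
        &workqueue.BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
    )
}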
ControllerReference is a kind of OwnerReference that enables garbage collection of the owned instance (XFirewall) when the owner instance (XCluster) is deleted. We demonstrate that in xcluster_controller.go by using the function SetControllerReference:
if err := controllerutil.SetControllerReference(cl, fw, r.Scheme); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to set the owner reference of the XFirewall: %w", err)
}
Since an XCluster owns its XFirewall instance, we have to inform the manager that it should reconcile XCluster upon any change of an XFirewall instance:
func (r *XClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&clusterv1.XCluster{}).
Owns(&clusterv1.XFirewall{}).
Complete(r)
}
Check out the code in this project for more details. If you want a fully-fledged implementation, stay tuned! Our cluster-api-provider-metalstack is on the way. If you want more blog posts about metal-stack and kubebuilder, let us know! Special thanks go to Grigoriy Mikhalkin.