A repository to automate the configuration of my local infrastructure.
My local infrastructure is my homelab, which I primarily use to learn about state-of-the-art technologies as well as to run applications for potential startup incubations.
As part of this repository you may find interesting learnings and insights that I came across when developing my homelab and its automation. Automating my infrastructure allows me to deploy workloads more efficiently and debug issues more quickly.
I have been running a homelab since 2014, when it started with a single Cubietech Cubietruck. It served as a NAS and LEMP server, and was a great introduction to Linux for me. Over time the setup grew to include other single-board computers, such as the Hardkernel ODROID-XU4.
In 2018 I started making significant changes by automating the configuration of my router and my manually provisioned servers via Ansible. Using it allowed me to learn the classic way of bare-metal provisioning, and it was a solid tool to get me started with deploying and operating Kubernetes clusters.
Now, I am in the process of upgrading my infrastructure to become cloud-native and fully automated. The goal is to provide a RESTful API to provision bare-metal machines from scratch, configure networking, and bootstrap and autoscale Kubernetes clusters.
Below, I describe the network setup of my homelab. Some parts of it are already implemented, while others are still being conceptualized.
Status: 🟢 Operational
I register my domains with Namecheap because they often provide good discounts on a variety of domain names. Below, you may find a list of limitations that I ran into, which is why I decided to use them only as a registrar, not as a DNS provider.
- Limited usability in CI

  Their API requires a manually administrated allow list of trusted IPs. The allow list they offer does not support CIDR notation, e.g. `0.0.0.0/24`. The issue here is that GitHub Actions' hosted runners do not have a static set of IPs. As I want to administrate my entire infrastructure via GitHub Actions pipelines, a self-hosted runner is not an option for me.

- Slow API development

  The Namecheap API is well documented, but it is not RESTful and uses XML. This is rather old-fashioned and makes the API uncomfortable to use. Customers have requested a more modern API surface, but so far development has focused on pure maintenance. A lightweight, easy-to-use API is critical when you rely on it to build your products.
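For illustration, the CIDR matching that the Namecheap allow list lacks takes only a few lines of Go's standard library. The IP addresses below are documentation-range placeholders, not my actual runner IPs:

```go
package main

import (
	"fmt"
	"net"
)

// allowed reports whether ip falls within the given CIDR range,
// e.g. "0.0.0.0/24" — the notation the allow list rejects.
func allowed(cidr, ip string) bool {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	return ipNet.Contains(net.ParseIP(ip))
}

func main() {
	fmt.Println(allowed("192.0.2.0/24", "192.0.2.42"))   // true
	fmt.Println(allowed("192.0.2.0/24", "198.51.100.7")) // false
}
```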
Therefore, I decided to use Google Cloud DNS for the automated administration of my DNS records. The deployment of my DNS infrastructure is automated using GitHub Actions and Pulumi, which allows me to write my Infrastructure as Code in my favourite programming language, Go.
For more information, you may want to check out the following files and directories:

- `main.go`: The entry point to my Pulumi program as a general overview.
- `pkg/dns/`: The configuration of DNSSEC and my DNS records.
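As a rough sketch of what such a Pulumi program does (this is not the actual contents of `main.go`; the zone name, domain, and address are placeholders, and the SDK version paths may differ), creating a Cloud DNS zone with DNSSEC and a record using the pulumi-gcp Go SDK looks roughly like this:

```go
package main

import (
	"github.com/pulumi/pulumi-gcp/sdk/v7/go/gcp/dns"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	pulumi.Run(func(ctx *pulumi.Context) error {
		// Hypothetical zone; the real configuration lives in pkg/dns/.
		zone, err := dns.NewManagedZone(ctx, "example-zone", &dns.ManagedZoneArgs{
			Name:    pulumi.String("example-zone"),
			DnsName: pulumi.String("example.com."),
			DnssecConfig: &dns.ManagedZoneDnssecConfigArgs{
				State: pulumi.String("on"),
			},
		})
		if err != nil {
			return err
		}
		// An A record pointing at a placeholder address.
		_, err = dns.NewRecordSet(ctx, "www", &dns.RecordSetArgs{
			ManagedZone: zone.Name,
			Name:        pulumi.String("www.example.com."),
			Type:        pulumi.String("A"),
			Ttl:         pulumi.Int(300),
			Rrdatas:     pulumi.StringArray{pulumi.String("203.0.113.10")},
		})
		return err
	})
}
```

Running this requires the Pulumi engine and Google Cloud credentials, so it is a provisioning sketch rather than a standalone program.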
Status (v3): 💡 Planning
Keywords: BGP, Kubernetes Operator, Kube-Router, SDN
Motivation: Automation, ease-of-operation, client source IPs
Status (v2): 🟢 Operational
This setup uses a Seeed Dual Gigabit Ethernet NIC Carrier Board with a Raspberry Pi Compute Module 4 as a router. It allows me to run a Kubernetes cluster on my router, and therefore on my network edge, which was not possible with my previous setup. It also makes it possible to build a custom orchestration layer for software-defined networking via VLANs.
Status (v1): 🔴 End of life
In this setup I was running a Ubiquiti EdgeRouter X (ER-X) with port forwarding towards a single bare-metal k3s cluster on a separate machine. The ER-X is fine, but not great for automation and dynamic configuration.
In this setup I had two subnets, userspace (`192.168.0.1/24`) and homelab (`172.16.0.1/22`), where DHCP was only enabled in the userspace subnet. This allowed me to maintain network connectivity for userspace traffic while testing or making configuration changes in my homelab. The configuration was hardcoded to specific ports on my router.
Status: 🟢 Operational
The table below describes the set of manually assigned VLANs. The configuration of the DHCPv4 server and my network interfaces via netplan.io is based on the following resources:

- Ubuntu community article about `isc-dhcp-server`
- How to make a simple router on an Ubuntu server
- `netplan.io` examples
| VLAN ID | CIDR | DHCP | Name | Description |
|---|---|---|---|---|
| 1 | none | No | default | A network without any gateway to isolate unassigned hosts. |
| 10 | 172.16.0.0/22 | Yes | management | A network for the configuration and management of network devices. |
| 4000 | 192.168.254.0/22 | No | homelab | A network for experimental deployments. |
| 4090 | 192.168.255.0/24 | Yes | userspace | A home network for WiFi and other domestic traffic. |
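A minimal netplan sketch for two of these VLANs might look like the following. The parent interface name and gateway addresses are assumptions for illustration, not my actual configuration:

```yaml
network:
  version: 2
  ethernets:
    eth1: {}
  vlans:
    vlan10:
      id: 10
      link: eth1
      addresses: [172.16.0.1/22]    # management; DHCP served by isc-dhcp-server
    vlan4090:
      id: 4090
      link: eth1
      addresses: [192.168.255.1/24] # userspace
```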
Status (v1): 🟢 Operational
I chose to use `nftables` as it provides a modern and declarative API. In the future, the configuration should be managed via a software layer that performs automatic reconciliation.
For anyone replicating this setup, it might be worth noting that there was one major roadblock during the implementation, which took me a week to figure out. Make sure to configure MTU clamping as described in the references below or as shown in `configs/cloud-init/seeed-rtcm4-0/user-data`. Without this option, small TCP and UDP packets are forwarded correctly, such as the `client hello` of a TLS handshake or DNS queries, but regular TCP traffic is simply dropped. This in turn causes websites to fail to load. If you are able to open https://api.ipify.org in your browser, but not https://wikipedia.org, it may very well be related to this.
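The clamping itself comes down to a single mangle rule that rewrites the TCP MSS on forwarded SYN packets to fit the path MTU. A minimal sketch (table and chain names are my choice here, not mandated by `nftables`):

```
table inet mangle {
    chain forward {
        type filter hook forward priority mangle; policy accept;
        # Clamp TCP MSS to the route's MTU on SYN packets crossing the router
        tcp flags syn tcp option maxseg size set rt mtu
    }
}
```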
- Quick reference for `nftables` from its wiki
- RHEL documentation, `nftables` chapter
- RHEL documentation explaining NAT configuration
- Blog post for a simple firewall setup
- Gentoo wiki showing `nftables` configuration examples
- Arch Linux wiki explaining `nftables` basics
- MTU clamping via packet header mangling
Status: 🔵 Limited availability
Currently, I am only running Traefik as an Ingress Controller, which does load balancing and TLS termination for my pods in Kubernetes. In the future I would like to have automatic load balancing for my Kubernetes control planes and `LoadBalancer`-type services.
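As a sketch of the kind of object Traefik currently handles for me, a standard Kubernetes Ingress with TLS termination might look like this. The host, service name, secret name, and annotation are placeholders, not my actual manifests:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  annotations:
    # Routes via Traefik's HTTPS entrypoint; assumes its default naming.
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example
                port:
                  number: 80
  tls:
    - hosts: [app.example.com]
      secretName: example-tls
```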
Status (v2): 💡 Planning
Keywords: BGP, HAProxy, Gateway API
Motivation: Automation, ease-of-operation, client source IPs
To automate deployments, I chose to implement GitOps via Argo CD. The initial installation procedure of Argo CD is described in this blog post.
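For reference, an Argo CD `Application` that syncs a path of a Git repository into a cluster might look like the following sketch. The repository URL, path, and namespace are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homelab
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git  # placeholder
    targetRevision: main
    path: deploy/
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift in the cluster
```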
Below you may find a list of limitations with my current infrastructure setup.
- Manual NS and DS records

  Due to the limitations of the Namecheap API, I decided to administrate my NS and DS records entirely by hand. If I can't automate it for one registrar, I will not automate it for any of them. This is subject to change based on the number of domains and the frequency of changes. Usually, however, NS records rarely change.

- Manual administration of VLANs

  Because my network contains a variety of network devices with different management protocols, there is currently little value in automating the management of VLANs. Should this prove to be a valuable investment at some point, the links below include information on how to interface with some of my hardware via protocols such as the Easy Smart Configuration Protocol (ESCP).
Hidden gems 💎
This project is licensed under the terms of the MIT license.