This is a mono repository for my home infrastructure and Kubernetes cluster using tools like Ansible, Kubernetes, Flux, Renovate, and GitHub Actions.
The baseline of this configuration starts from onedr0p's cluster-template. Inspiration for additional workloads to run in the cluster, and for how to provision their Kustomizations, comes from many other home-ops projects in the community.
Talos is the Linux distribution running Kubernetes on my nodes, and I have so far been happy with the results. I previously tried provisioning k3s on top of Ubuntu with various Ansible scripts to assist with the setup; Talos has proven to be less overhead to maintain and update.
I've tried a few hyper-converged cluster storage options, including Mayastor, Longhorn, and Rook Ceph, and have had the most luck with Rook Ceph. Currently, I've moved my primary workers and control-plane nodes to VMs on a Proxmox cluster, and am using Rook Ceph in external mode with Ceph running on the Proxmox cluster.
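As a rough illustration, external mode boils down to a CephCluster resource with `external.enable` set, plus connection details imported from the Proxmox-side Ceph cluster. The sketch below is illustrative only; the namespace and name are assumptions, not necessarily what this repo uses.

```yaml
# Minimal sketch of a Rook CephCluster in external mode.
# Assumes the rook-ceph operator is installed and the external
# cluster's mon endpoints and keys have already been imported as secrets.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph-external
  namespace: rook-ceph-external
spec:
  external:
    enable: true          # consume a Ceph cluster managed outside Kubernetes
  crashCollector:
    disable: true
  dataDirHostPath: /var/lib/rook
```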
- actions-runner-controller: Self-hosted GitHub Actions runners.
- cert-manager: Creates SSL certificates utilizing Let's Encrypt and Cloudflare DNS.
- cilium: Internal Kubernetes container networking interface.
- cloudflared: Enables Cloudflare secure access to certain ingresses.
- external-dns: Automatically syncs ingress DNS records to a DNS provider.
- external-secrets: Manages Kubernetes secrets using the Bitwarden Secrets Manager Cache. BWSC seems a bit unstable, so I have a CronJob that restarts it daily (see the sketch after this list).
- ingress-nginx: Kubernetes ingress controller using NGINX as a reverse proxy and load balancer.
- sops: Manages secrets for Kubernetes which are committed to Git.
- spegel: Stateless cluster local OCI registry mirror.
- volsync: Backup and recovery of persistent volume claims.
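The daily restart mentioned above can be handled by a small CronJob that runs `kubectl rollout restart`. This is only a sketch: the deployment name, namespace, and ServiceAccount are assumptions, and the ServiceAccount needs RBAC allowing it to patch deployments.

```yaml
# Illustrative CronJob that restarts the (assumed) Bitwarden SM cache
# deployment once a day. Requires a ServiceAccount bound to a Role that
# permits patching deployments in this namespace.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-bitwarden-sm-cache
  namespace: external-secrets
spec:
  schedule: "0 4 * * *"          # once a day at 04:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: bitwarden-restarter
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - rollout
                - restart
                - deployment/bitwarden-sm-cache
```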
Flux watches the apps in my `kubernetes` folder and reconciles the cluster to match the state of my Git repository.
Renovate watches the entire repository for dependency updates. Patch-level updates are applied automatically; for larger changes, a PR is created automatically. Flux then applies the changes to my cluster after they land on main.
📁 kubernetes
├── 📁 apps # applications
├── 📁 bootstrap # bootstrap procedures
├── 📁 flux # core flux configuration
└── 📁 templates # re-useable components
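At a high level, Flux's entry point is a Kustomization pointing into this directory structure, along these lines. The names, interval, and source below are illustrative assumptions rather than copies of what this repo actually defines.

```yaml
# Illustrative Flux Kustomization that reconciles everything under
# kubernetes/apps from the GitRepository source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./kubernetes/apps
  prune: true          # remove resources that disappear from Git
  sourceRef:
    kind: GitRepository
    name: home-kubernetes
```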
Cilium is configured to use native (direct) routing instead of VXLAN tunneling, which requires all nodes to be on the same subnet. As a consequence of this choice, I've not had luck placing a worker node in a different subnet (for example, a single tainted worker to host untrusted or IoT-related workloads in a more secure VLAN). An in-place conversion to VXLAN encapsulation broke cluster networking almost immediately. More science is required. 🧫
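For reference, native routing in Cilium's Helm values looks roughly like this; the pod CIDR below is a placeholder, not the one this cluster uses.

```yaml
# Illustrative Cilium Helm values for native routing on a single subnet.
routingMode: native                    # no VXLAN/Geneve encapsulation
autoDirectNodeRoutes: true             # nodes install routes to each other's pod CIDRs
ipv4NativeRoutingCIDR: 10.42.0.0/16    # placeholder pod CIDR
```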
I have Tailscale's operator running, which could potentially also help solve this problem.
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) chicken-and-egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. email, password manager, photos) that my family relies on when I'm no longer around.
Service | Use | Cost |
---|---|---|
Bitwarden | Family password manager, Secrets with External Secrets | ~$40/yr |
Cloudflare | Several Domains and S3 | ~$100/yr |
GitHub | Hosting this repository and CI/CD. Pro subscription. | ~$48/yr |
Fastmail | Email hosting for 2 users | ~$100/yr |
Pushover | Kubernetes alerts and application notifications | $5 one-time purchase |
Total | | ~$288/yr |
In my cluster there are multiple ExternalDNS instances deployed. One is deployed with the ExternalDNS webhook provider for UniFi, which syncs DNS records to my UniFi router. Another does the same to a PiHole VM, which is mirrored with Gravity Sync to a secondary VM and a tertiary hardware Pi. The last ExternalDNS instance syncs DNS records to Cloudflare, but only for ingresses and services that have an ingress class name of `external` and carry the `external-dns.alpha.kubernetes.io/target` annotation. Most local clients on my network use my PiHoles as their upstream DNS server; some fall back to the UniFi router.
Once I've done more testing of UniFi's ad-blocking solution, I may remove the PiHoles.
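To make that concrete, a publicly exposed ingress in this scheme would look roughly like the following; the hostnames and service name are made up for illustration.

```yaml
# Illustrative Ingress that the Cloudflare ExternalDNS instance would pick up:
# it uses the external ingress class and carries the target annotation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: default
  annotations:
    external-dns.alpha.kubernetes.io/target: external.example.com
spec:
  ingressClassName: external
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```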
Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
---|---|---|---|---|---|---|
Gmktec M5 Pro | 3 | 512GB SSD | 1TB NVMe (ceph) | 64GB | Proxmox | VM Hosts |
RasPi 4 | 4 | 512GB SSD | - | 8GB | Talos | Kubernetes Workers |
RasPi 3 | 1 | 32GB SD | - | 1GB | DietPi | PiHole
RasPi 5 | 1 | 128GB SD | - | 8GB | HAOS | Home Assistant |
Supermicro 846 & X9DRi-F | 1 | 2x 512GB SSD | 10x 16TB ZFS (mirrored vdevs) | 64GB | TrueNAS SCALE | NFS + Backup Server |
UniFi UDM SE | 1 | - | 1x12TB HDD | - | - | Router & NVR |
UniFi USW-Enterprise-24-PoE | 1 | - | - | - | - | 2.5Gb PoE Switch |
UniFi USP PDU Pro | 1 | - | - | - | - | PDU |
APC SMT1500RM2U | 1 | - | - | - | - | UPS |
Thanks to all the people who donate their time to the Home Operations Discord community. Be sure to check out kubesearch.dev for ideas on how to deploy applications and for inspiration on what to deploy next.