Using Kubernetes because...reasons (go Google).
This stack is based on the GitOps approach, where a Flux operator listens for `HelmRelease`s being deployed. Such a release is in essence a Helm package that gets registered with the operator, to be idempotently deployed to achieve the desired state. Any changes to this config will be picked up by the operator and consistently deployed, or rolled back when undeployable. The operator also watches the Docker repositories hosting the deployed images for new tags (according to a configurable pattern match) and will do a rolling update according to the desired strategy, or again roll back upon failure. On success it will then write the newly used image tag back into the config and commit + push to the source repo.
This repo is a Helm chart with subcharts containing such `HelmRelease` configs as well as any additional configuration for that service. Each subchart holds all that is necessary for the operator to bring up that particular service. When this Helm chart is deployed onto a cluster, the HelmRelease operator will reconcile the cluster with these configs and all the platform's services should come up.
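To make this concrete, here is a hedged sketch of what such a `HelmRelease` could look like for the Weave Flux Helm operator. The names, chart location, values layout and tag filter are illustrative assumptions, not taken from this repo:

```yaml
# illustrative HelmRelease for the Weave Flux Helm operator; not an actual file from this repo
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: nodejs-demo-api
  namespace: team-backend
  annotations:
    flux.weave.works/automated: "true"      # let Flux roll the release forward to new image tags
    flux.weave.works/tag.app: semver:~1.0   # only accept tags matching this pattern
spec:
  releaseName: nodejs-demo-api
  chart:
    git: git@github.com:Morriz/nodejs-demo-api
    ref: master
    path: chart
  values:
    app:
      image: localhost:5000/nodejs-demo-api
      tag: 1.0.0
```

When the operator sees a newer matching tag in the registry, it updates `values.app.tag`, rolls out the change, and writes the new tag back to this file in git.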
One source to rule them all.
So far I am using the following Kubernetes applications/tools:
- Kubernetes for describing our container infrastructure.
- Helm for packaging and deploying Kubernetes apps and subapps.
- Weave Flux operator, which monitors this repo and reconciles the cluster state with the declarations found in this repo.
- Docker Registry for storing locally built images, and as a proxy + cache for public Docker Hub images (disabled for now).
- Prometheus Operator + Prometheus + Grafana for monitoring.
- Calico for networking and k8s-style network policies.
- Cert Manager for automatic https certificate creation for public endpoints.
- Elasticsearch + Kibana for log indexing & viewing.
- Weave Scope for a graphic overview of the network topology and services.
- Drone for CI/CD.
Wishlist for the next version:
- Istio for service mesh security, insights and other enhancements.
- Ambassador, the new Kubernetes-native API gateway.
Alright, let's get to it. Follow me please! We will be going through the following workflow:
- Configuration
- Deployment & development
- Testing the apps
At any point, any step can be re-run to arrive at the same (idempotent) state. After destroying the cluster and then running the install again, all storage endpoints will still contain the previously built/cached artifacts and configuration. The next boot should thus be faster :)
- A running Kubernetes cluster with RBAC enabled and `kubectl` installed on your local machine. (I suggest you check out morriz/k8s-dev/cluster for some flavors and insights.)
- Helm client (`brew install helm`?).
- Forked Morriz/nodejs-demo-api repo.
- Letsencrypt staging CA (click and add to your browser's cert manager temporarily if you'd like to bypass browser warnings about https).
- Passwordless sudo access over ssh. On OSX I have to add my key like this: `ssh-add -K ~/.ssh/id_rsa`.
- For `cert-manager` to work (it autogenerates letsencrypt certs), make sure ports 80 and 443 are forwarded to your local machine:
  - by manipulating your firewall, or
  - by tunneling a domain from ngrok (and using that as `$CLUSTER_HOST` in the config below):
    - free account: just run `ngrok http 80` in a dedicated terminal window that you will want to keep open, because it creates a temporary host until the tunnel dies.
    - biz account: see the provided `tpl/ngrok.yaml` that you can modify and then run `bin/ngrok.sh` (a sketch of such a config follows this list).
- When using letsencrypt staging certs: for (GitHub) repo webhooks to be able to talk to Drone in our cluster, these hooks need to have `https` (SSL verification) disabled.
- Create an OAuth app in GitHub for our Drone and copy the key & secret, so that Drone can operate on your forked Morriz/nodejs-api-demo repo. Fill in those secrets in the `drone.yaml` values below.
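For reference, here is a minimal sketch of what such an ngrok (v2) tunnel config could look like, assuming a paid plan with a reserved domain. The hostnames and token are placeholders, and the actual `tpl/ngrok.yaml` template in this repo may differ:

```yaml
# hypothetical ngrok v2 config; the repo's tpl/ngrok.yaml template may differ
authtoken: <your-ngrok-authtoken>
tunnels:
  ingress-http:
    proto: http
    addr: 80
    hostname: k8s.example.com   # reserved (custom) domain, used as $CLUSTER_HOST
  ingress-tls:
    proto: tls
    addr: 443
    hostname: k8s.example.com
```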
Copy `secrets/sample.sh` to `secrets/local.sh` (and to `secrets/gce.sh` for deploying to GCE), and edit them.
If needed you can also edit `values/*.yaml` (see all the options in `charts/*/values.yaml`), but for a first boot I would leave them as is.
IMPORTANT: The `$CLUSTER_HOST` subdomains must all point to your laptop's IP (I use ngrok for that, see `bin/ngrok.sh`). The `bin/tunnel-to-ingress.sh` script can then forward incoming ports 80 and 443 to the nginx ingress controller, which will serve all our public ingresses.
Start by installing the prerequisites like Helm's Tiller, PVs, some RBAC and some CRDs:
sh bin/install-prerequisites.sh
Wait for it to complete, and then deploy the motherload:
sh bin/deploy.sh
This will install the main GitOps operator that will reconcile the state of the cluster with this GitOps repo. That may take a long time as it needs to pull in the docker images on all the nodes.
Follow the instructions on the screen, as some keys will be generated. The public GitOps key needs to be stored as a deploy key in gitlab.com/emtransit/platform/services/settings/repository. The SealedSecrets controller's private key needs to go into our main company secret vault (LastPass?), in case you are deploying to the cloud.
To avoid going through the Flux mechanism when testing your modifications, you can deploy the stack directly with helmfile.
First generate the secrets and values:
bin/gen-values-tmp.sh
To install the entire stack:
```sh
helmfile apply
# or do both with the alias hf="bin/gen-values-tmp.sh && helmfile"
hf apply
```
Or to just deploy some apps:
hf --selector name=istio apply
Or to first deploy the prerequisites (like needed CRDs, etc.):
hf --selector phase=init apply
And then later the rest:
hf --selector phase=final apply
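For context, here is a hedged sketch of how such selector labels might be declared in `helmfile.yaml`. The release names and chart paths are illustrative assumptions, not the repo's actual entries; the `name` selector works out of the box because helmfile exposes each release's name as a built-in label:

```yaml
# hypothetical helmfile.yaml excerpt; the repo's actual releases and labels may differ
releases:
  - name: prerequisites        # picked up by --selector phase=init
    namespace: kube-system
    chart: ./charts/prerequisites
    labels:
      phase: init
  - name: istio                # picked up by --selector name=istio or phase=final
    namespace: istio-system
    chart: ./charts/istio
    labels:
      phase: final
```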
After editing, generate the final GitOps release files into `releases/` by running:
bin/seal-secrets.sh
And commit and push to git.
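The sealed output committed to `releases/` presumably contains Bitnami `SealedSecret` resources, which are safe to keep in git since only the in-cluster controller can decrypt them. A rough sketch (names and ciphertext are placeholders):

```yaml
# hypothetical SealedSecret as produced by sealing; the actual generated files may differ
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: drone               # placeholder name
  namespace: team-backend
spec:
  encryptedData:
    DRONE_SECRET: AgBy3i4OJSWK+...   # ciphertext, only decryptable by the controller's private key
```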
Please check if all apps are running:
watch -n1 -x kubectl --all-namespaces=true get po,deploy
and wait...
When all deployments are ready the local service proxies can be started with:
bin/dashboards.js
and the service index will open.
- Go to your public drone url (https://drone.{{CLUSTER_HOST}}) and select the repo `nodejs-demo-api`.
- Go to the 'Secrets' menu and create the following entries (follow the comments to get the values):
```sh
kubernetes_cert= # ktf get secret $(ktf get sa drone-deploy -o jsonpath='{.secrets[].name}{"\n"}') -o jsonpath="{.data['ca\.crt']}" | pbcopy
kubernetes_token= # ktf get secret $(ktf get sa drone-deploy -o jsonpath='{.secrets[].name}{"\n"}') -o jsonpath="{.data.token}" | base64 -d | pbcopy
kubernetes_dns= # ksk get po --selector=k8s-app=kube-dns --output=jsonpath={.items..status.hostIP}
registry=localhost:5000 # or the public version if you made the registry accessible as a service
```
- Now commit to the forked Morriz/nodejs-api-demo repo and trigger a build in our Drone.
- Drone builds and runs the tests, then pushes the docker image artifact to our private docker registry.
- Weave Flux sees the image, updates the deployment, and commits the updated config to git.
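To illustrate that build-and-publish flow, here is a hedged sketch of a Drone pipeline (0.8-style syntax). This is not the actual `.drone.yml` of the demo repo; the image names, tag scheme and registry settings are assumptions:

```yaml
# hypothetical .drone.yml; the demo repo's actual pipeline may differ
pipeline:
  test:
    image: node:10
    commands:
      - npm install
      - npm test
  publish:
    image: plugins/docker
    repo: localhost:5000/nodejs-demo-api   # the `registry` secret from above
    registry: localhost:5000
    insecure: true                         # local registry without TLS
    tags:
      - 1.0.${DRONE_BUILD_NUMBER}
```

Once the tagged image lands in the registry, Flux picks it up as described above and closes the GitOps loop.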
Look at a pre-installed Grafana dashboard showing the system cluster metrics.
Use the following default creds if not changed already in `secrets/*.sh`:
- username: admin
- password: jaja
Look at the Prometheus view to see that all targets are scrapable.
The Alertmanager view will show the alerts concerning any unreachable endpoints.
Look at a pre-installed Kibana dashboard with Logtrail showing the cluster logs.
Creds: Same as Grafana.
Now that we have all apps running and functional, we can start deploying network policies. Let's start by denying all ingress and egress for all namespaces:
k apply -f k8s/policies/deny-all.yaml
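For reference, a deny-all policy of this kind is typically just an empty pod selector with both policy types. This is a sketch; the actual `k8s/policies/deny-all.yaml` may differ:

```yaml
# sketch of a deny-all policy; the repo's deny-all.yaml may differ
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress             # no rules listed, so all traffic is denied
```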
Now we can revisit the apps and see most of them failing. An interesting observation on minikube: the main nginx-ingress is still functional. This is because the current policy setup does not operate on the host network. To also control host networking we have to fix some things, but that will hopefully arrive in the next update of this stack.
Let's apply all the policies needed for every namespace to open up the needed connectivity, and watch the apps work again:
```sh
for ns in default kube-system system monitoring logging team-backend; do k apply -n $ns -f k8s/policies/each-namespace/defaults.yaml; done
k apply -f k8s/policies
```
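As an illustration only (the repo's actual `k8s/policies/each-namespace/defaults.yaml` may differ), such per-namespace defaults typically re-open intra-namespace traffic plus DNS egress:

```yaml
# hypothetical per-namespace defaults; the repo's defaults.yaml may differ
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}        # allow traffic from pods in the same namespace
  egress:
    - to:
        - podSelector: {}        # allow traffic to pods in the same namespace
    - to:
        - namespaceSelector: {}  # allow DNS lookups cluster-wide
      ports:
        - protocol: UDP
          port: 53
```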
Sometimes during development stuff is not accessible (yet), so you can delete all the policies to allow full access again:
for ns in default kube-system system monitoring logging team-backend; do k -n $ns delete networkpolicy --all; done