This repository contains various commands, manifests, and docs for the Collective talk on 05/24/2018. Much of the following material is inspired by and borrowed from the Kubernetes: Up and Running book. Refer to this awesome Medium article for diagrams and a visualization of how the components interact with each other.
Try the following commands on your Kubernetes cluster:
Create a namespace for your collective playground and set context
$ kubectl create namespace collective
$ kubectl config set-context $(kubectl config current-context) --namespace=collective
Create a kuard pod
$ kubectl apply -f 01-kuard-pod.yaml
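For context, 01-kuard-pod.yaml is essentially a single-container Pod manifest along these lines (a sketch; the file in the repo may differ slightly):

apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
    - name: kuard
      image: gcr.io/kuar-demo/kuard-amd64:1
      ports:
        - containerPort: 8080
          name: http
          protocol: TCP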
See all the pods in this namespace using get, read more details and a description of the object using describe, and use logs to fetch a pod's logs.
$ kubectl get pods
$ kubectl describe pods
$ kubectl logs kuard
Delete the object using
$ kubectl delete pods kuard
Add health checks
$ kubectl apply -f 02-kuard-pod-health.yaml
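The health-check manifest adds probes to the kuard container. A liveness probe against kuard's /healthy endpoint looks roughly like this (values are illustrative; see the file for the actual settings):

...
spec:
  containers:
    - name: kuard
      image: gcr.io/kuar-demo/kuard-amd64:1
      livenessProbe:
        httpGet:
          path: /healthy
          port: 8080
        initialDelaySeconds: 5
        timeoutSeconds: 1
        periodSeconds: 10
        failureThreshold: 3
...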
Add resource requests and limits
$ kubectl apply -f 03-kuard-pod-resources.yaml
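Requests and limits are declared per container. The manifest adds a resources block roughly like the following (the numbers here are illustrative, not necessarily the ones in the file):

...
spec:
  containers:
    - name: kuard
      image: gcr.io/kuar-demo/kuard-amd64:1
      resources:
        requests:
          cpu: "500m"
          memory: "128Mi"
        limits:
          cpu: "1000m"
          memory: "256Mi"
...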
Add a volume to the pod
$ kubectl apply -f 04-kuard-pod-volume.yaml
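Volumes are declared at the Pod level and mounted into containers via volumeMounts. The manifest does something along these lines (the volume name, type, and mount path are assumptions; check the file):

...
spec:
  volumes:
    - name: kuard-data
      emptyDir: {}
  containers:
    - name: kuard
      image: gcr.io/kuar-demo/kuard-amd64:1
      volumeMounts:
        - name: kuard-data
          mountPath: /data
...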
As you run more applications on Kubernetes, the resources/objects scale in size and complexity. Labels and annotations let you work in sets of things that map to how you think about your application. You can organize, mark, and cross-index resources to represent the groups that make the most sense for your application.
Labels provide the foundation for grouping objects. Annotations provide a storage mechanism to hold nonidentifying information (metadata) that can be leveraged by other tools and libraries.
Run a few deployments and add labels to them
$ kubectl run alpaca-prod \
--image=gcr.io/kuar-demo/kuard-amd64:1 \
--replicas=2 \
--labels="ver=1,app=alpaca,env=prod"
$ kubectl run alpaca-test \
--image=gcr.io/kuar-demo/kuard-amd64:2 \
--replicas=1 \
--labels="ver=2,app=alpaca,env=test"
$ kubectl run bandicoot-prod \
--image=gcr.io/kuar-demo/kuard-amd64:2 \
--replicas=2 \
--labels="ver=2,app=bandicoot,env=prod"
$ kubectl run bandicoot-staging \
--image=gcr.io/kuar-demo/kuard-amd64:2 \
--replicas=1 \
--labels="ver=2,app=bandicoot,env=staging"
Check the deployments
$ kubectl get deployments --show-labels
Modify the label for one of the deployments. Labels can be applied/updated after the object is created.
$ kubectl label deployments alpaca-test "canary=true"
Use the -L option to show a label value as a column
$ kubectl get deployments -L canary
Remove a label by applying a - suffix
$ kubectl label deployments alpaca-test "canary-"
Selectors are a way to find objects based on their labels. Check the pods
$ kubectl get pods --show-labels
Show pods on version 2
$ kubectl get pods --selector="ver=2"
Show pods with multiple selectors. (Logical AND)
$ kubectl get pods --selector="ver=2,app=bandicoot"
Show pods with labels matching a set of values. (Logical OR)
$ kubectl get pods --selector="env in (prod,staging)"
Selectors are also used in YAML manifests to refer to Kubernetes objects. A selector of app=alpaca,ver in (1,2) would translate to the following:
...
selector:
  matchLabels:
    app: alpaca
  matchExpressions:
    - {key: ver, operator: In, values: ["1", "2"]}
...
Annotations provide a place to store additional metadata for Kubernetes objects with the sole purpose of assisting tools and libraries. While labels are used to identify and group objects, annotations are used to provide extra information about where an object came from, how to use it, or policy around that object. When in doubt, add information to an object as an annotation and promote it to a label if you find yourself wanting to use it in a selector.
They can be defined in the common metadata section of every Kubernetes object.
...
metadata:
  annotations:
    example.com/icon-url: "https://example.com/icon.png"
...
Annotations are convenient and provide powerful loose coupling, but should be used judiciously to avoid an untyped mess of data.
Delete the deployments created in this section.
$ kubectl delete deployments --all
Service discovery tools help solve the problem of finding which processes are listening at which addresses for which services. A good service discovery system enables users to resolve this information quickly, reliably, and with low latency. Kubernetes offers a Service object to create a named label selector; kubectl expose is used to create a service for a deployment.
Let's create a deployment
$ kubectl run alpaca-prod \
--image=gcr.io/kuar-demo/kuard-amd64:1 \
--replicas=3 \
--port=8080 \
--labels="ver=1,app=alpaca,env=prod"
Expose this deployment by creating a service
$ kubectl expose deployment alpaca-prod
Check the service
$ kubectl get services -o wide
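Under the hood, kubectl expose just creates a Service object. A hand-written equivalent would look roughly like this sketch (not a file in this repo):

apiVersion: v1
kind: Service
metadata:
  name: alpaca-prod
spec:
  selector:
    app: alpaca
    env: prod
    ver: "1"
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP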
Let's create another deployment
$ kubectl run bandicoot-prod \
--image=gcr.io/kuar-demo/kuard-amd64:2 \
--replicas=2 \
--port=8080 \
--labels="ver=2,app=bandicoot,env=prod"
Create a service for this deployment
$ kubectl expose deployment bandicoot-prod
Check services
$ kubectl get services -o wide
The SELECTOR column indicates that the alpaca-prod service just gives a name to a selector and specifies which ports to talk to for that service. The kubectl expose command pulls both the label selector and the relevant ports from the deployment definition.
The service is also assigned a new type of virtual IP called a Cluster IP. This is a special IP address that the system will load-balance across all of the pods that are identified by the selector.
Because the cluster IP does not change, it is appropriate to give it a DNS name; the usual problems with clients caching stale DNS results no longer apply. To see this in action, port-forward to one of the alpaca pods and use kuard's built-in DNS query page:
$ ALPACA_POD=$(kubectl get pods -l app=alpaca \
-o jsonpath='{.items[0].metadata.name}')
$ kubectl port-forward $ALPACA_POD 8090:8080
If you open the DNS Query section of the kuard app and query bandicoot-prod for DNS type A records, you will see output like the following:
;; opcode: QUERY, status: NOERROR, id: 43754
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;bandicoot-prod.collective.svc.cluster.local. IN A
;; ANSWER SECTION:
bandicoot-prod.collective.svc.cluster.local. 30 IN A 100.64.41.4
A readiness check is a way for an overloaded server to signal to the system that it doesn't want to receive traffic anymore. This is a great way to implement graceful shutdown. The server can signal that it no longer wants traffic, wait until existing connections are closed, and then cleanly exit.
Let's add a readiness check to our deployment.
$ kubectl edit deployment/alpaca-prod
Add a readiness check to the pod spec
...
spec:
  ...
  template:
    ...
    spec:
      containers:
        - ...
          name: alpaca-prod
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 2
            initialDelaySeconds: 0
            failureThreshold: 3
            successThreshold: 1
...
A pod with a failing readiness check is removed from the service load balancer, so no more connections will be made to the pod via the service until it is ready again.
You can confirm this by watching the endpoints of the service
$ kubectl get endpoints alpaca-prod --watch
Go to the browser and click the Fail link in the Readiness Probe tab. You should see the endpoint corresponding to that pod removed from the alpaca-prod service.
So far, we have covered exposing services inside of a cluster. Oftentimes, the IPs for pods are only reachable from within the cluster, so there are a few ways to allow external traffic to reach the pods.
For a service of type NodePort, the system picks a port (or user specifies one), and every node in the cluster then forwards traffic from that port to the service. This is in addition to the Cluster IP that's already assigned to the service. With this feature, if you can reach any node in the cluster, you can reach the service too.
$ kubectl edit service alpaca-prod
Modify .spec.type from ClusterIP to NodePort.
$ kubectl describe svc alpaca-prod
Name: alpaca-prod
Namespace: collective
Labels: app=alpaca
env=prod
ver=1
Annotations: <none>
Selector: app=alpaca,env=prod,ver=1
Type: NodePort
IP: 100.65.8.237
Port: <unset> 8080/TCP
TargetPort: 8080/TCP
NodePort: <unset> 31212/TCP
Endpoints: 100.96.5.108:8080,100.96.6.16:8080,100.96.7.93:8080
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
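With the type switched to NodePort, the service is reachable on port 31212 (the NodePort in the output above) of every node. The node address below is a placeholder for any node in your cluster:

$ curl http://<node-address>:31212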
You can use the LoadBalancer service type to create a new load balancer on your cloud provider and direct it to the nodes in your cluster. Basically, it is a superset of the NodePort service. In this case, since I'm running on AWS infrastructure, a classic ELB gets provisioned.
$ kubectl edit svc alpaca-prod
Modify .spec.type from NodePort to LoadBalancer.
$ kubectl describe svc alpaca-prod
Name: alpaca-prod
Namespace: collective
Labels: app=alpaca
env=prod
ver=1
Annotations: <none>
Selector: app=alpaca,env=prod,ver=1
Type: LoadBalancer
IP: 100.65.8.237
LoadBalancer Ingress: aebad48a95ee111e89b82022fc41c72f-225282263.us-east-1.elb.amazonaws.com
Port: <unset> 8080/TCP
TargetPort: 8080/TCP
NodePort: <unset> 31212/TCP
Endpoints: 100.96.5.108:8080,100.96.6.16:8080,100.96.7.93:8080
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Type 13s service-controller NodePort -> LoadBalancer
Normal EnsuringLoadBalancer 13s service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 10s service-controller Ensured load balancer
You can grab the LoadBalancer Ingress address and hit it from a browser or with curl
$ LB_ING=$(kubectl get service alpaca-prod -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
$ curl http://$LB_ING:8080
It is possible to achieve manual service discovery by using the Endpoints object and label selectors. Cluster IPs are stable virtual IPs that load-balance traffic across all of the endpoints in a service. Every node in the cluster runs a component called kube-proxy. The kube-proxy watches for new services and endpoints in the cluster via the API server, and then programs a set of iptables rules in the kernel of that host to rewrite the destinations of packets so they are directed at one of the endpoints for the service. If the set of endpoints changes, the iptables rules are rewritten.
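You can inspect the Endpoints object that backs the service directly; it lists the pod IPs that the iptables rules ultimately point at:

$ kubectl get endpoints alpaca-prod -o yaml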
Cleanup the services and deployments
$ kubectl delete svc,deploy -l app
Services offer a great way to dynamically find and react to the placement of where your workloads are running. Once your application can find a service, you are free to stop worrying about where things are running and when they move. Kubernetes will take care of the details of container placement.
Previously we covered how to run individual containers as pods. But pods are essentially one-off singletons. More often than not, you want multiple replicas running at a particular time for the following reasons:
- Redundancy - Failure can be tolerated
- Scale - More requests can be handled
- Sharding - Different parts of a computation can be handled in parallel
A ReplicaSet acts as a cluster-wide Pod manager, ensuring that the right types and number of Pods are running at all times.
They are the building blocks used to describe common application deployment patterns and provide the underpinnings of self-healing for our applications at the infrastructure level. The act of managing the replicated Pods is an example of a reconciliation loop.
The reconciliation loop is constantly running, observing the current state of the world and taking action to try to make the observed state match the desired state. This approach is inherently goal-driven, self-healing, and it can often be easily expressed in a few lines of code.
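As a rough illustration of the idea only (the real controller is not implemented this way, and the label used here is just an example), a reconciliation loop boils down to:

# naive sketch: observe current state, compare against desired state, act
while true; do
  current=$(kubectl get pods -l app=kuard --no-headers 2>/dev/null | wc -l)
  desired=3
  if [ "$current" -ne "$desired" ]; then
    echo "observed $current pods, want $desired -- reconciling"
    # ...create or delete pods here to converge on the desired count...
  fi
  sleep 5
done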
One of the key themes that runs through Kubernetes is decoupling. In particular, all of the core concepts are modular with respect to each other and they are swappable with other components. In this spirit, the relationship between ReplicaSets and Pods is loosely coupled. ReplicaSets use label queries to identify the set of Pods they should be managing.
- You can create a ReplicaSet that will adopt an existing pod, seamlessly moving from a single imperative Pod to a replicated set of Pods managed by a ReplicaSet
- You can quarantine pods/containers that are misbehaving due to failing health checks. Update the set of labels on the sick Pod, disassociating it from the ReplicaSet (and service), so you can debug the Pod. The Pod is still running and available to developers for interactive debugging, instead of being resigned to debugging from logs.
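For example, to quarantine a misbehaving Pod you could overwrite the label its ReplicaSet selects on (the Pod name and labels here are purely illustrative):

$ kubectl label pod kuard-bcxw7 app=kuard-debug --overwrite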
Create a ReplicaSet using
$ kubectl apply -f 05-kuard-rs.yaml
Since the number of Pods in the current state is less than the desired state, the ReplicaSet controller will create new Pods using the Pod template contained in the ReplicaSet specification. The labels used for filtering the Pods, defined in the ReplicaSet spec, are key to understanding how ReplicaSets work.
$ kubectl describe rs kuard
You can see if a Pod is being managed by a ReplicaSet by checking the kubernetes.io/created-by annotation. However, such annotations are created on a best-effort basis.
You can imperatively scale the ReplicaSet using
$ kubectl scale rs kuard --replicas=4
Declaratively, you can scale by updating the ReplicaSet spec in the YAML and using the apply command
...
spec:
  replicas: 3
...
$ kubectl apply -f 05-kuard-rs.yaml
Kubernetes supports Horizontal Pod Autoscaling (HPA). HPA requires the presence of the heapster Pod in the kube-system namespace. Follow the installation steps if you don't have heapster installed.
$ kubectl autoscale rs kuard --min=2 --max=5 --cpu-percent=80
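You can inspect the autoscaler object the command above creates (and later clean it up with kubectl delete hpa kuard):

$ kubectl get hpa

When you are done experimenting, delete the ReplicaSet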
$ kubectl delete rs kuard
If you don't want to delete the Pods that are being managed by the ReplicaSet, you can set the --cascade flag to false to ensure that only the ReplicaSet object gets deleted and not the Pods
$ kubectl delete rs kuard --cascade=false
A DaemonSet ensures that a copy of a Pod is running across a set of nodes in a Kubernetes cluster. They are typically used to deploy system daemons such as log collectors and monitoring agents.
Run a fluentd logging agent on every node
$ kubectl apply -f 07-fluentd.yaml
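For reference, 07-fluentd.yaml describes a DaemonSet along these lines (trimmed; the image and volume mounts below are illustrative, check the file for the actual spec):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    app: fluentd
spec:
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd:v0.14.10   # illustrative tag
          volumeMounts:
            - name: varlog
              mountPath: /var/log
      volumes:
        - name: varlog
          hostPath:
            path: /var/log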
See the description
$ kubectl describe daemonset fluentd --namespace kube-system
Name: fluentd
Selector: app=fluentd
Node-Selector: <none>
Labels: app=fluentd
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"app":"fluentd"},"name":"fluentd","namespace":"kube-system...
Desired Number of Nodes Scheduled: 6
Current Number of Nodes Scheduled: 6
Number of Nodes Scheduled with Up-to-date Pods: 6
Number of Nodes Scheduled with Available Pods: 6
Number of Nodes Misscheduled: 0
Pods Status: 6 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 1m daemonset-controller Created pod: fluentd-9xlg8
Normal SuccessfulCreate 1m daemonset-controller Created pod: fluentd-2z4gw
...
With the fluentd DaemonSet in place, adding a new node to the cluster will result in a fluentd Pod being deployed to that node automatically.
You can also restrict the nodes on which the DaemonSet runs. To do that, add labels to the nodes and use a nodeSelector field in the Pod template of the DaemonSet spec, as sketched below.
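In the DaemonSet's Pod template, that looks roughly like this (matching the ssd=true label applied in the next step):

...
spec:
  template:
    spec:
      nodeSelector:
        ssd: "true"
...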
Add a label to a node
$ kubectl label nodes ip-172-20-104-191.ec2.internal ssd=true
Run nginx-fast-storage DaemonSet
$ kubectl apply -f 08-nginx-fast-storage.yaml
Verify that the pods run only on the nodes that match the selector
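The -o wide output includes a NODE column, which should list only the labeled node(s):

$ kubectl get pods -o wide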
Usually, updating a DaemonSet is achieved by deleting all the Pods and changing the container image before running it again, which can result in downtime. To avoid this, set the spec.updateStrategy.type field to RollingUpdate; any change to spec.template will then trigger a rolling update.
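Only the strategy field needs to change in the DaemonSet spec:

...
spec:
  updateStrategy:
    type: RollingUpdate
...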
The biggest thing I love about Kubernetes and its ecosystem is the community and culture. There is collaboration unlike anything I have seen before. I encourage you to participate in meetups/webinars and engage with the people involved; there's a ton to learn here.
My favorites: