- CNCF Cloud Native Interactive Landscape
- Development Containers
- Turnkey * Containers
- k8s-for-docker-desktop
- There are many computing orchestrators
- They make decisions about when and where to "do work"
- We've done this since the dawn of computing:Mainframe schedulers, Puppet, Terraform, AWS, Mesos, Hadoop, etc.
- Since 2014 we've had resurgence of new orchestration projects because:
- Popularity of distributed computing
- Docker containers as a app package and isolated runtime
- We needed "many servers to act like one, and run many containers"
- An the Container Orchestrator was born
- Many open source projects have been created in the last 5 years to:
- Schedule running of containers on servers
- Dispatch them across many nodes
- Monitor and react to container and server health
- Provide storage, networking, proxy, security, and logging features
- Do all this in a declarative way, rather than imperative
- Provide API's to allow extensibility and management
- Kubernetes, aka K8s
- Docker Swarm(and Swarm classic)
- Apache Mesos/Marathon
- Cloud Foundry
- Amazon ECS(not OSS, AWS-only)
- HashiCorp Nomad
- Many of these tools run on top of Docker Engine
- Kubernetes is the one orchestator with many distributions
- Kubernetes "vanilla upstream"(not a distribution)
- Cloud-Managed distros: AKS, GKE, EKS,DOK...
- Self-Managed distros: RedHat OpenShift, Docker Enterprise, Rancher, Canonical Charmed, openSUSE Kubic...
- Vanilla installers: Kubeadm, kops, kubicorn...
- Local dev/test: Docker Desktop, minikube, microK8s
- CI testing:kind
- Special builds: Rancher K3s
- And Many, many more..(86 as of June 2019)
- Kubernetes is a container management system
- It runs and manages containerized applications on a cluster(one or more servers)
- Often this is simply called "container orchestration"
- Sometimes shortened to Kube or K8s("Kay-eights" or "Kates")
- Start 5 containers using image
atseashop/api:v1.3
- Place an internal load balancer in front of these containers
- Start 10 containers using image
atseashop/webfront:v1.3
- Place a public load balancer in front of these containers
- It's Black Friday(or Christmas), traffic spikes, grow our cluster and add containers
- New release!Replace my containers with the new image
atseashop/webfront:v1.4
- Keep processing requests during the upgrade;update my containers one at a time
- Basic autoscaling
- Blue/green deployment, canary deployment
- Long running services, but also batch(one-off) and CRON-like jobs
- Overcommit our cluster and evict low-priority jobs
- Run services with stateful data(database etc.)
- Fine-grained access control defining what can be done whom on which resources
- Integrating third party services(service catalog)
- Automating complex tasks(operators)
- Ha ha ha ha
- OK, I was trying to scare you, it's much simpler than that ❤️
- The nodes executing our containers run a collection of services:
- a container Engine(typically Docker)
- kubelet(the "node agent")
- kube-proxy(a necessary but not sufficient network component)
- Nodes were formerly called "minions"
- (You might see that word in older articles or documentation)
- The Kubernetes logic(its "brains") is a collection of services:
- the API server(our point of entry to everything!)
- core services like the scheduler and controller manager
etcd
(a highly available key/value store; the "database" of Kubernetes)
- Together, these services form the control plane of our cluster
- The control plane is also called the "master"
- It is common to reserve a dedicated node for the control plane
- (Except for single-node development clusters, like when using minikube)
- This node is then called a "master"
- (Yes,this is ambiguous:is the "master" a node, or the whole control plane?)
- Normal applications are restricted from running on this node
- (By using a mechanism called "taints")
- When high availability is required, each service of the control plane must be resilient
- The control plane is then replicated on multiple nodes
- (This is sometimes called a "multi-master" setup)
- The services of the control plane can run in or out of containers
- For instance: since
etcd
is a critical service, some people deploy it directly on a dedicated cluster(without containers)- (This is illustrated on the first "super complicated" schema)
- In some hosted Kubernetes offerings(e.g. AKS, GKE, EKS), the control plane is invisible
- (We only "see" a Kubernetes API endpoint)
- In that case, there is no "master node"
For this reason, it is more accurate to say "control plane" ranther than "master"
No!
- By default, Kubernetes uses the Docker Engine to run containers
- Or leverage other pluggable runtimes through the Container Runtime Interface
- We could also use rkt("Rocket") from CoreOS(deprecated)
- containerd: maintained by Docker, IBM, and community
- Used by Docker Engine, microK8s, K3s, GKE, and standalone; has
ctr
CLI - CRI-O: maintained by Red Hat, SUSE, and community; based on containerd
- Used by OpenShift and Kubic, version matched to Kubernetes
Yes!
- In this course, we'll run our apps on a single node first
- We may need to build images and ship them around
- We can dot these things without Docker
- (and get diagnosed with NIH syndrome)
- Docker is still the most stable container engine today
- (but other options are maturing very quickly)
NIH--> Not Invented Here
- On your development environments, CI pipelines...:
- Yes, almost certainly
- On our production servers:
- Yes(today)
- Probally not(in the future)
More information about CRI on the Kubernetes blog
- We will interact with our Kubernetes cluster through the Kubernetes API
- The Kubernetes API is (mostly) RESTful
- It allows us to create, read, update, delete resources
- A few common resource types are:
- node (a machine == physical or virtual -- in our cluster)
- pod (group of containers running together on a node)
- service (stable network endpoint to connect to one or multiple containers)
- Pods are a new abstraction!
- A
pod
can have multiple containers working together - (But you usually only have on container per pod)
- Pod is our smallest deployable unit; Kubernetes can't mange containers directly
- IP address are associated with
pods
, not with individual containers - Containers in a pod share
loacalhost
, and can share volumes - Multiple containers in a pod are deployed together
- In reality, Docker doesn't know a pod, only containers/namespaces/volumes
- Best: Get a environment locally
- Docker Desktop(Win/MacOS), minikube(Win Home), or microk8s(Linux)
- Small setup effort;free;flexible environments
- Requires 2GB+ of memory
- Good: Setup a cloud Linux host to run microk8s
- Great if you don't have the local resources to run Kubernetes
- Small setup effort; only free for a while
- My $50 DigitalOcean coupon lets you run Kubernetes free for a month
- Last choice: Use a browser-based solution
- Low setup effort; but host is short-lived and has limited resources
- Not all hands-on examples will work in the browser sandbox
- Docker Desktop(DD) is great for a local dev/test setup
- Requires modern macOS or Windows 10 Pro/Ent/Edu(no Home)
- Requires Hyper-V, and disables VirtualBox
Exercise
- Download Windows or macOS versions and install
- For Windows, ensure you pick "Linux Containers" mode
- Once running, enabled Kubernetes in Settings/Perferences
kubectl get nodes
- A good local install option if you can't run Docker Desktop
- Inspired by Docker Toolbox
- Will create a local VM and configure latest Kubernetes
- Has lots of other features with it
minikube
CLI - But, requires spearate install of VirtualBox and kubectl
- May not work with older Windows version(YMMV)
Exercise
- Download and install
VirtualBox
- Download
kubectl
, and add to $PATH - Download and install
minikube
- Run
minikube start
to create and run a Kubernetes VM - Run
minikube stop
when you're done
- Easy install and management of local Kubernetes
- Made by Canonical(Ubuntu).Installs using
snap
.Works nearly everywhere - Has lots of other features with its
microk8s
CLI - But, requires you install snap if not on Ubuntu
- Runs on containerd rather than Docker, no biggie
- Needs alias setup for
microk8s.kubectl
Exercise
- Install
microk8s
, change group permissions, then set alias in bashrc
sudo apt install snapd
sudo snap install microk8s --classic
microk8s.kubectl
sudo usermod -a -G microk8s <username>
echo "alias kubectl='microk8s.kubectl'" >> ~/.bashrc
Last choice: Use a browser-based solution
- Low setup effort; but host is short-lived and has limited resources
- Services are not always working right, and may not be up to date
- Not all hands-on examples will work in the browser sandbox
Exercise
- Use a prebuilt Kubernetes server at
Katacoda
- Or setup a Kubernetes node at
play-with-k8s.com
- Maybe try the latest OpenShift at
learn.openshift.com
- See if instruqt works for
a Kubernetes playground
- You can use
shpod
for examples shpod
provides a shell running in a pod on the cluster- It comes with many tools pre-installed(helm, stern, curl, jq...)
- These tools are used in many exercises in these slides
shpod
also gives you shell completion and a fancy prompt- Create it with
kubectl apply -f https://k8smastery.com/shpod.yaml
- Attach to shell with
kubectl attach --namespace=shpod -ti shpod
- After finishing course
kubectl delete -f https://k8smastery.com/shpod.yaml
kubectl
is(almost) the only tool we'll need to talk to Kubernetes- It is a rich CLI tool around the Kubernetes API
- (Everything you can do with
kubectl
, you can do directly with the API)
- (Everything you can do with
- On our machines, there is a
~/.kube/config
file with:- the Kubernetes API address
- the path to our TLS certificates used to authenticate
- You can also use the
--kubeconfig
flag to pass a config file - Or directly
--server
,--user
, etc. kubectl
can be pronounced "Cube C T L", "Cube cuttle", "Cube cuddle"...- I'll be using the official name "Cube Control"
- Let's look at our
Node
resources withkubectl get
!
Exercise
- Look at the composition of our cluster:
kubectl get node
- These commands are equivalent:
kubectl get no
kubectl get node
kubectl get nodes
kubectl get
con output JSON, YAML, or be directly formatted
Exercise
- Give us more info about the nodes:
kubectl get nodes -o wide
- Let's have some YAML:
kubectl get no -o yaml
See that kind: List
at the end? It's the type of our result!
- It's super easy to build custom reports
Exercise
- Show the capacity of all our nodes as a stream of JSON objects:
kubectl get nodes -o json | jq ".items[] | {name:.metadata.name} + .status.capacity"
kubectl describe
needs a resource type and (optionally) a resource name- It is possible to provide a resource name prefix(all matching objects will be displayed)
kubectl describe
will retrieve some extra information about the resource
Exercise
- Look at the information available for your node name with one of the following:
kubectl describe node/<node>
kubectl describe node <node>
kubectl describe node/docker-desktop
(We should notice a bunch of control plane pods.)
- We can list all available resource types by running
kubectl api-resources
- (in Kubernetes 1.10 and prior, this command used to be
kubectl get
)
- (in Kubernetes 1.10 and prior, this command used to be
- We can view the definition for a resource type with:
kubectl explain type
- We can view the definition of a field in a resource, for instance:
kubectl explain node.spec
- Or get the list of all fields and sub-fields:
kubectl explain node --recursive
e.g.
kubectl explain node
kubectl explain node.spec
- We can access the same information by reading the
API documentation
- The API documentation is usually easier to read, but:
- it won't show custom types(like Custom Resource Definitions)
- We need to make sure that we look at the correct version
kubectl api-resources
andkubectl explain
perform introspection- (they communicate with the API server and obtain the exact type definitions)
- The most common resource names have three forms:
- singular(e.g.
node
,service
,deployment
) - plural(e.g.
nodes
,services
,deployments
) - short(e.g.
no
,svc
,deploy
)
- singular(e.g.
- Some resources do not have a short name
Endpoints
only have a plural form- (because even a single
Endpoints
resource is actually a list of endpoints)
- (because even a single
- A service is a stable endpoint to connect to "something"
- (in the initial proposal, they were called "portals")
Exercise
- List the services on our cluster with one of these commands:
kubectl get services
kubectl get svc
There is already one service on our cluster: the Kubernetes API itself.
- Containers are manipulated through pods
- A pod is a group of containers:
- running together (on the same node)
- sharing resources(RAM, CPU; but also network, volumes)
Exercise
- List pods on our cluster:
kubectl get pods
- Namespaces allow us to segregate resources
Exercise
- List the namespaces on our cluster with one of these commands:
kubectl get namespaces
kubectl get namespace
kubectl get ns
You know what...This kube-system
thing looks suspicious
In fact, I'm pretty sure it showed up earlier, when we did:
kubectl describe node <node-name>
- By default,
kubectl
uses thedefault
namespace - We can see resources in all namespaces with
--all-namespaces
Exercise
- List the pods in all namespaces:
kubectl get pods --all-namespaces
- Since Kubernetes 1.14, we can also use
-A
as a shorter version:
kubectl get pods -A
Here are our system pods!
etcd
is our etcd serverkube-apiserver
is the API serverkube-controller-manager
andkube-scheduler
are other control plane componentscoredns
provides DNS-based service discovery(replacing kube-dns as of 1.11)kube-proxy
is the(per-node) component managing port mappings and such<net name>
is the optional(per-node) component managing the network overlay- the
READY
column indicates the number of conatainers in each pod - Note: this only shows containers, you won't see host svcs(e.g. microk8s)
- Also Note: you may see different namespaces depending on setup
- We can also look at a different namespace(other than
default
)
Exercise
- List only the pods in the
kube-system
namespace:
kubectl get pods --namespace=kube-system
kubectl get pods -n kube-system
- We can use
-n/--namespace
with almost everykubectl
command - Example:
kubectl create --namespace=X
to create something in namespace X
- We can use
-A/--all-namespaces
with most commands that manipulate multiple objects - Examples:
kubectl delete
can delete resources across multiple namespaceskubectl label
can add/remove/update labels across multiple namespaces
Exercise
- List the pods in the
kube-public
namespace:
kubectl -n kube-public get pods
Nothing!
kube-public
is created by our installer & used for security bootstrapping
.
- The only interesting object in
kube-public
is a ConfigMap namedcluster-info
Exercise
- List ConfigMap objects:
kubectl -n kube-public get configmaps
- inspect
cluster-info
:
kubectl -n kube-public get configmap cluster-info -o yaml
Note the selfLink
URI:/api/v1/namespaces/kube-public/configmaps/cluster-info
We can use that (later in kubectl context
lectures)!
kubectl -n kube-public get configmaps
kubectl -n kube-public get configmap cluster-info -o yaml
- Starting with kubernetes 1.14, there is
kube-node-lease
namespace- (or in Kubernetes 1.13 if the NodeLease feature gate is enabled)
- That namespace contains one Lease object per node
- Node leases are a new way to implement node heartbeats
- (i.e. node regularly pinging the control plane to say "I'm alive!")
- For more details, see
KEP-0009
or thenode controller documentation
- First things first: we cannot run a container
- We are going to run a pod, and in that pod there will be a single container
- In that container in the pod, we are going to run a simple
ping
command - Then we are going to start additional copies of the pod
- We need to spcify at least a name and the image we want to use
Exercise
- Let's ping
1.1.1.1
, Cloudflare'spublic DNS resolver
:
kubectl run pingpong --image alpine ping 1.1.1.1
(Starting with Kubernetes 1.12, we get a message telling us that kubectl run
is deprecated. Let's ignore it for now.)
- Let's look at the resources that were created by
kubectl run
Exercise
- List most resource types:
kubectl get all
We should see the following things:
deployment.apps/pingpong
(the deployment that we just created)replicaset.apps/pingpong-xxxxxxxxx
(a replica set created by the deployment)pod/pingpong-xxxxxxxx-yyyyyyy
(a pod created by the replica set)
Note: as of 1.10.1, resource types are displayed in more detail.
- A deployment is a high-level construct
- allows scaling, rolling updates, rollbacks
- multiple deployments can be used together to implement a
canary deployment
- delegates pods management to replica sets
- A replica set is a low-level construct
- makes sure that a given number of identical pods are running
- allows scaling
- rarely used directly
- Note: A replication controller is the (deprecated) predecessor of a replica set
kubectl run
created a deployment,deployment.apps/pingpong
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/pingpong 1 1 1 1 10m
- That deployment created a replica set,
replicaset.apps/pingpong-xxxxxxxxxxxxx
NAME DESIRED CURRENT READY AGE
replicaset.apps/pingpong-7c8bbcd9bc 1 1 1 10m
- That replica set created a pod,
pod/pingpong-xxxxxxxxxxxxx-yyyy
NAME READY STATUS RESTARTS AGE
pod/pingpong-7c8bbcd9bc-6c9qz 1/1 Running 0 10m
- We'll see later how these folks play together for:
- scaling, high availability, rolling updates
- Let's use the
kubectl logs
command - We will pass either a pod name, or a type/name
- (E.g. if we specify a deployment or replica set, it will get the first pod in it)
- Unless specified otherwise, it will only show logs of the first container in the pod
- (Good thing there's only one in ours!)
Exercise
- View the result of our
ping
command:
kubectl logs deploy/pingpong
- We can create additional copies of our container(I mean, our pod) with
kubectl scale
Exercise
- Scale our
pingpong
deployment:
kubectl scale deploy/pingpong --replicas 3
- Note that this command does exactly the same thing:
kubectl scale deployment pingpong --replicas 3
Note: what if we tried to scale replicaset.apps/pingpong-xxxxxxx
?
We could! But he deployment would notice it right away, and scale back to the ...
kubectl scale deploy/pingpong --replicas 3
Streaming logs in real time
- Just like
docker logs
,kuberctl logs
supports conveninent options:-f/--follow
to stream logs in real time(a latail -f
)--tail
to indicate how many lines you want to see(from the end)--since
to get logs only after a given timestramp
Exercise
- View the latest logs of our
ping
command:
kubectl logs deploy/pingpong --tail 1 --follow
- Leave that command running, so that we can keep an eye on these logs
- What happens if we restart
kubctl logs
?
Exercise
- interrupt
kubectl logs
(with Ctrl-C) - Restart it:
kubectl logs deploy/pingpong --tail 1 --follow
kubectl logs
will warn us that multiple pods were found, and that it's showing us only one of them.
Let's leave kubectl logs
running while we keep exploring.
- The deployment
pingpong
watches itsreplica set
- The replica set ensures that the right number of pods are running
- What happens if pods disappear?
Exercise
- In a separate window, watch the list of pods:
watch kubectl get pods
- Destroy the pod currently shown by
kubectl logs
:
kubectl delete pod pingpong-xxxxxxxxxxxxx-yyyyyy
e.g.
watch kubectl get pods
kubectl delete pod pingpong-xxxxxxxxxxxxx-yyyyyy
kubectl delete pod
terminates the pod gracefully- (sending it the TERM signal and waiting for it to shutdown)
- As soon as the pod is in "Terminating" state, the Replica Set replaces it
- But we can still see the output of the "Terminating" pod in
kubectl logs
- Until 30 seconds later, when the grace period expires
- The pod is then killed, and
kubectl logs
exits
- What if we wanted to start a "one-shot" container that doesn't get restarted?
- We could use
kubectl run --restart=OnFailure
orkubectl run --restart=Never
- These commands would create jobs or pods instead of deployments
- Under the hood,
kubectl run
invokes "generators" to create resource descriptions - We could also write these resource descriptions ourselves(typically in YAML),
and create them on the cluster with
kubectl apply -f
(discussed later) - With
kubectl run --schedule=...
, we can also create cronjobs.
- A Cron Job is a job that will be executed at specific intervals
- (the name comes from the traditional cronjobs executed by the UNIX crond)
- It requires a schedule,represented as five space-separated fields:
- minute[0,59]
- hour[0,23]
- day of the month[1,31]
- month of the year[1,12]
- day of the week([0,6] with 0=Sunday)
*
means "all vaild values";/N
means "every N"- Example:
*/3 * * * *
means "every three minutes"
- Let's create a simple job to be executed every three minutes
- Cron Jobs need to terminate, otherwise they'd run forever
Exercise
- Create the Cron Job:
kubectl run --schedule="*/3 * * * *" --restart=OnFailure --image=alpine sleep 10
- Check the resource that was created:
kubectl get cronjobs
- At the specified schedule, the Cron Job will create a Job
- The Job will create a Pod
- The Job will make sure that the Pod completes
- (re-creating another on if it fails, for instance if its node fails)
Exercise
- Check the Jobs that are created:
kubectl get jobs
(it will take a few minutes before the first job is scheduled.)
- As we can see from the previous slide,
kubectl run
can do many things - The exact type of resource created is not obvious
- To make things more explicit, it is better to use
kubectl create
:kubectl create deployment
to create a deploymentkubectl create job
to create a jobkubectl create cronjob
to run a job periodically- (since Kubernetes 1.14)
- Eventually,
kubectl run
will be used only to start one-shot pods
kubectl run
- easy way to get started
- versatile
kubectl create <resource>
- explicit, but lacks some features
- can't create a CronJob before Kubernetes 1.14
- can't pass command-lin arguments to deployments
kubectl create -f foo.yaml
orkubectl apply -f foo.yaml
- all features are available
- requires writing YAML
- Can we stream the logs of all our
pingpong
pods?
Exercise
- Combine
-l
and-f
flags:
kubectl logs -l run=pingpong --tail 1 -f
Note: combining -l
and -f
is only possible since Kubernetes 1.14!
Let's try to understand why...
- Let's see what happens if we try to stream the logs for more than 5 pods
Exercise
- Scale up our deployment:
kubectl scale deployment pingpong --replicas=8
- Stream the logs
kubectl logs -l run=pingpong --tail 1 -f
We see a message like the following one:
error: your are attempting to follow 8 log streams,
but maximum allowed concurency is 5,
use --max-log-requests to increase the limit
e.g.
kubectl scale deployment pingpong --replicas=8
kubectl get pods
kubectl logs -l run=pingpong --tail 1 -f
# error: you are attempting to follow 8 log streams, but maximum allowed concurency is 5, use --max-log-requests to increase the limit
kubectl
opens one connection to the API server per pod- For each pod, the API server opens one extra connection to the corresponding kubelet
- If there are 1000 pods in our deployments, that's 1000 inbound + 1000 outbound connections on the API server
- This could easily put a lot of stress on the API server
- Prior Kubernetes 1.14, it was decided to not allow multiple connections
- From Kubernetes 1.14, it is allowed, but limited to 5 connetcions
- (this can be changed with
--max-log-requests
)
- (this can be changed with
- For more details about the rationale, see
PR #67573
- We don't see which pod sent which log line
- If pods are restarted/replaced, the log stream stops
- If new pods are added, we don't see their logs
- To stream the logs of multiple pods, we need to write a selector
- There are external tools to address these shortcomings
- (e.g.: Stern)
- The
kubectl logs
commands has limitaions:- it cannot stream logs from multiple pods at a time
- when showing logs from multiple pods, it mixes them all together
- We are going to see how to do it better
- We could(if we were so inclined) write a program or script that would:
- take a selector as an argument
- enumerate all pods matching that selector (with
kubectl get -l...
) - for one
kubectl logs --follow ...
command per container - annotate the logs (the output of each
kubectl logs ....
process) with their origin - preserve ordering by using
kubectl logs --timestramps ...
and merge the output
- We could do it, but thankfully, other did it for us already!
Stern is an open source project by Wercker
From the README:
Stern allows you to tail multiple pods on Kubernetes and multiple containers within the pod. Each result is color coded for quicker debugging.
The query is a regular expression so the pod name can easily be filtered and you don't need to specify the exact id (for instance omitting the deployment id). If a pod is deleted it gets removed from tail and if a new pod is added it automatically gets tailed.
Exactly what we need!
- Run
stern
(without arguments) to check if it's installed
$ stern
Tail multiple pods and containers from Kubernetes
Usage:
stern pod-query [flags]
- If it is not installed, the easiest method is to download a
binary release
- The following commands will install Stern on a Linux Intel 64 bit machine:
sudo curl -L -o /usr/local/bin/stern \
https://github.com/wercker/stern/releases/download/1.11.0/stern_linux_amd64
sudo chmod +x /usr/local/bin/stern
- On OS X, just
brew install stern
- There are two ways to specify the pods whose logs we want to see:
-l
followed by a selector expression(like with manykubectl
commands)- with a "pod query," i.e. a regex used to match pod names
- These two ways can be combined if necessary
Exercise
- View the logs for all the pingpong containers:
stern pingpong
- The
--tail N
flag shows the lastN
lines for each container- (instead of showing the logs since the creation of the container)
- The
-t
/--timestamps
flag shows timestamps - The
--all-namespaces
flag is self-explanatory
Exercise
- View what's up with the
pingpong
system containers:
stern --tail 1 --timestamps pingpong
Let's cleanup before we start the next lecture!
Exercise
- remove our deployment and cronjob:
kubectl delete deployment/pingpong cronjob/sleep
- They say, "a picture is worth one thousand words."
- The following 19 slides show what really happens when we run:
kubectl run web --image=nginx --replicas=3
kubectl expose
creates a service for existing pods- A service is a stable address for a pod (or a bunch of pods)
- If we want to connect to our pod(s), we need to create a service
- Once a service is created, CoreDNS will allow us to resolve it by name
- (i.e. after creating service
hello
, the namehello
will resolve to something)
- (i.e. after creating service
- There are different types of services, detailed on the following slides:
ClusterIP
,NodePort
,LoadBalancer
,ExternalName
ClusterIP
(default type)- a virtual IP address is allocated for the service(in an internal, private range)
- this IP address is reachable only from within the cluster(nodes and pods)
- our code can connect to the service using the original port number
NodePort
- a port is allocated for the service(by default, in the 30000-32768 range)
- that port is made available on all our nodes and anybody can connect to it
- our code must be changed to connect to that new port number
These service types are always available.
Under the hood:kube-proxy
is using a userland proxy and a bunch of iptables
rules.
LoadBalancer
- an external load balancer is allocated for the service
- the load balancer is configured accordingly
- (e.g.: a
NodePort
service is created, and the load balancer sends traffic to that port)
- (e.g.: a
- available only when the underlying infrastructure provides some "load balancer as as service"
- (e.g. AWS, Azure, GCE, OpenStack...)
ExternalName
- the DNS entry managed by CoreDNS will just be a
CNAME
to a provided record - no port, no IP address, no nothing else is allocated
- the DNS entry managed by CoreDNS will just be a
- Since
ping
doesn't have anything to connect to, we'll have to run something else - We could use the
nginx
offical image, but ...- ...we wouldn't be able to tell the backends from each other!
- We are going to use
bretfisher/httpenv
, a tiny HTTP server written in Go bretfisher/httpenv
listens on port 8888- it serves its environment variables in JSON format
- The environment variables will include
HOSTNAME
, which will be the pod name- (and therefore, will be different on each backend)
- We could do
kubectl run httpenv --image=bretfisher/httpenv
... - But since
kubectl run
is changing, let's see how to seekubectl create
instead
Exercise
- In another window, watch the pods(to see when they are created)
kubectl get pods -w
- Create a deployment for this very lightweight HTTP server:
kubectl create deployment httpenv --image=bretfisher/httpenv
- Scale it to 10 replicas:
kubectl scale deployment httpenv --replicas=10
Exercise
# t1
kubectl get pods -w
# t2
kubectl create deployment httpenv --image=bretfisher/httpenv
# t1
ctrl+c
kubectl get pods
kubectl get all
clear
kubectl get pods -w
# t2
kubectl scale deployment httpenv --replicas=10
kubectl expose deployment httpenv --port 8888
kubectl get service
- You can assign IP addresses to services, but they are stll layer 4
- (i.e. a service is not an IP address; it's an IP address + protocol + port)
- This is caused by the current implementation of
kube-proxy
- (it relies on mechanisms that don't support layer 3)
- As a result: you have to indicate the port number for your service
- Running services with arbitrary port (or port ranges) requires hacks
- (e.g. host networking mode)
- We will now send a few HTTP requests to our pods
Exercise
- Run shpod if not on Linux host so we can access internal ClusterIP
kubectl apply -f https://bret.run/shpod.yml
kubectl attach --namespace=shpod -it shpod
- Let's obtain the IP address that was allocated for our service, programmatically:
IP=$(kubectl get svc httpenv -o go-template --template '{{ .spec.clusterIP }}')
echo $IP
- Send a few requests:
curl http://$IP:8888/
- Too much output? Filter it with
jq
:
curl -s http://$IP:8888/ | jq .HOSTNAME
exit
kubectl delete -f https://bret.run/shpod.yml
kubectl delete deployment/httpenv
- Sometimes, we want to access our scaled services directly:
- if we want to save a tiny little bit of latency(typically less than 1ms)
- if we need to connect over arbitrary ports(instead of a few fixed ones)
- if we need to communicate over another protocol than UDP or TCP
- if we want to decide how to balance the requests client-side
- ...
- In that case, we can use a
headless service
- A headless service is obtained by setting the
clusterIP
field toNone
- (Either with
--cluster-ip=None
, or by providing a custom YAML)
- (Either with
- As a result, the service doesn't have a virtual IP address
- Since there is no virtual IP address, there is no load balancer either
- CoreDNS will return the pods' IP addresses as multiple
A
records - This gives us an easy way to discover all the replicas for a deployment
- A service has a number of "endpoints"
- Each endpoint is a host + port where the service is available
- The endpoints are maintained and updated automatically by Kubernetes
Exercise
- Check the endpoints that Kubernetes has associated with our
httpenv
service:
kubectl describe service httpenv
- In the output, there will be a line starting with
Endpoints:
. - That line will list a bunch of addresses in
host:port
format.
- When we have many endpoints, our deploy commands truncate the list
kubectl get endpoints
- If we want to see the full list, we can use a different output:
kubectl get endpoints httpenv -o yaml
- These IP addresses should match the addresses of the corresponding pods:
kubectl get pods -l app=httpenv -o wide
endpoints
is the only resource that cannot be singular
kubectl get endpoint
# error: the server doesn't have a resource type "endpoint"
- This is because the type itself is plural(unlike every other resource)
- There is no
endpoint
object:type Endpoints struct
- The type doesn't represent a single endpoint, but a list of endpoints
- The default type(ClusterIP) only works for internal traffic
- If we want to accept external traffic, we can use one of these:
- NodePort (expose a service on a TCP port between 30000-32768)
- LoadBalancer (provision a cloud load balancer for our service)
- ExternalIP (use one node's external IP address)
- Ingress(a special mechanism for HTTP services)
We'll see NodePorts and Ingresses more in detail later.
Let's cleanup before we start the next lecture!
Exercise
- remove our httpenv resources:
kubectl delete deployment/httpenv service/httpenv
- TL,DR:
- Our cluster(nodes and pods) is one big flat IP network.
- In detail:
- all nodes must be able to reach each other, without NAT
- all pods must be able to reach each other, without NAT
- pods and nodes must be able to reach each other, without NAT
- each pod is aware of its IP address(no NAT)
- pod IP addresses are assigned by the network implementation
- Kubernetes doesn't mandate any particular implementaion
- Everything can reach everything
- No address translation
- No port translation
- No new protocol
- The network implementation can decide how to allocate addresses
- IP addresses don't have to be "portable" from a node to another
- (For example, We can use a subnet per node and use a simple routed topology)
- The specification is simple enough to allow many various implementaions
- Everything can reach everything
- if you want security, you need to add network policies
- the network implementation you use needs to support them
- There are literally dozens of implementations out there
- (15 are listed in the Kubernetes documentaion)
- Pods have level 3(IP) connectivity, but services are level 4(TCP or UDP)
- (Services map to a single UDP or TCP port; no port ranges or arbitrary IP packets)
kube-proxy
is on the data path when connecting to a pod or container,and its not particularly fast(relies on userland proxying or iptables)
- The nodes we are using have been set up to use kubenet, Calico, or something else
- Don't worry about the warning about
kube-proxy
performance - Unless you:
- routinely saturate 10G network interfaces
- count packet rates in millions per second
- run high-traffic VOIP or gaming platforms
- do weird things that involve millions of simultaneous connections
- (in which case you're already familiar with kernel tuning)
- if necessary, there are alternatives to
kube-proxy
; e.g.kube-router
- Most Kubernetes cluster use CNI "plugins" to implement networking
- When a pod is created, Kubernetes delegates the network setup to these plugins
- (it can be a single plugin, or a combination of plugins, each doing one task)
- Typically, CNI plugins will:
- allocate an IP address(by calling an IPAM plugin)
- add a network interface into the pod's network namespace
- configure the interface as well as required routes, etc.
- The "pod-to-pod network" or "pod network"
- provides communication between pods and nodes
- is generally implemented with CNI plugins
- The "pod-to-service network"
- provides internal communication and load balancing
- is generally implemented with kube-prox(or maybe kube-router)
- Network policies:
- provide firewalling and isolation
- can be bundled with the "pod network" or provided by another component
- Inbound traffic can be handled by multiple components:
- something like kube-proxy or kube-router(for NodePort services)
- load balancers(ideally, connected to the pod network)
- It is possible to use multiple pod networks in parallel
- (with "meta-plugins" like CNI-Genie or Multus)
- Some solutions can fill multiple roles
- (e.g. kube-router can be set up to provide the pod network and/or network polcies and/or replace kube-proxy)
- It is a DockerCoin miner!
- No,you can't buy coffee with DockerCoins
- How DockerCoins works:
- generate a few random bytes
- hash these bytes
- increment a counter(to keep track of speed)
- repeat forever!
- DockerCoins is not a cryptocurrency
- (the only common points are "randomness", "hashing" and "coins" in the name)
- DockerCoins is made of 5 services
rng
= web service generating random byteshasher
= web service computing hash of POSTed dataworker
= background process callingrng
andhasher
webui
= web interface to watch progressredis
= data store(holds a counter updated byworker
)
- These 5 services are visible in the application's Compose file, dockercoins-compose.yml
worker
invokes web servicerng
to generate random bytesworker
invokes web servicehasher
to hash these bytesworker
does this in an infinite loop- Every sceond,
worker
updatesredis
to indicate how many loops were done webui
queriesredis
, and computes and exposes "hashing speed" in our browser (See diagram on next slide!)
How does each service find out the address of the other ones?
- We do not hard-code IP addresses in the code
- We do not hard-code FQDNs in the code, either
- We just connect to a service name, and container-magic does the rest
- (And by container-magic, we mean "a crafty, dynamic, embedded DNS server")
redis = Redis("redis")
def get_random_bytes():
r = requests.get("http://rng/32")
return r.content
def hash_bytes(data):
r = requets.post("http://hasher/",
data=data,
headers={"Content-Type":"application/octet-stream"})
(Full source code available here
)
worker
will log HTTP requests torng
andhasher
rng
andhasher
will log incoming HTTP requestswebui
will give us a graph on coins mined per second
- Compose is(still) great for local development
- You can test this app if you have Docker and Compose installed
- If not, remember
play-with-docker.com
Exercise
- Download the compose file somewhere and run it
curl -o docker-compose.yml https://k8smastery.com/dockercoins-compose.yml
docker-compose up
- View the
webui
onlocalhost:8000
or click the8080
link in PWD
- The worker doesn't update the counter after every loop, but up to once per second
- The speed is computed by the browser, checking the counter about once per second
- Between two consecutive updates, the counter will increase either by 4, or by 0
- The perceived speed will therefore be 4 - 4 - 4 - 0 - 4 - 4 - 0 etc.
- What can we conclude from this?
- "I'm clearly incapple of writing good frontend code!"😁
- If we interrupt Compose(with
^C
),it will politely ask the Docker Engine to stop the app - The Docker Engine will send a
TERM
signal to the containers - If the containers do not exit in a timely manner, the Engine sends a
KILL
signal
Exercise
- Stop the application by hitting
^C
- For development using Docker, it has build, ship, and run features
- Now that we want to run on a cluster, things are different
- Kubernetes doesn't have a build feature built-in
- The way to ship(pull) images to Kubernetes is to use a registry
- What happens when we execute
docker run alpine
? - If the Engine needs to pull the
alpine
image, it expands it intolibrary/alpine
library/alpine
is expanded intoindex.docker.io/library/alpine
- The Engine communicates with
index.docker.io
to retrievelibrary/alpine:latest
- To use something else than
index.docker.io
, we specify it in the image name - Examples:
docker pull gcr.io/google-containers/alpine-with-bash:1.0
docker build -t registry.mycompany.io:5000/myimage:awesome .
docker push registry.mycompany.io:5000/myimage:awesome
- There are many options!
- Manually:
- build locally(with
docker build
or otherwise) - push to the registry
- build locally(with
- Automatically:
- build and test locally
- when ready, commit and push a code repository
- the code repository notifies an automated build system
- that system gets the code, builds it, pushes the image to the registry
- There are SAAS products like Docker Hub, Quay, GitLab...
- Each major cloud provider has an option as well
- (ACR on Azure, ECR on AWS, GCR on Google Cloud...)
- There are also commercial products to run our own registry
- (Docker Enterprise DTR, Quay, GitLab, JFrog Artifacotry...)
- And open source options, too!
- (Quay, Portus, OpenShift OCR, Gitlab, Harbor, Kraken...)
- (I don't mention Docker Distribution here because it's too basic)
- When picking a registry, pay attention to:
- Its build system
- Multi-user auth and mgmt(RBAC)
- Storage feature(replication, cahing, garbage collection)
- Create one deployment for each component
- (hasher, redis, rng, webui, worker)
- Expose deployments that need to accept connections
- (hasher, redis, rng, webui)
- For redis, we can use the official redis image
- For the 4 others, we need to build images and push them to some registry
- For everyone's convenience, we took care of building DockerCoins images
- We pushed these images to the DockerHub, under the
dockercoins
user - These images are tagged with a version number, v0.1
- The full image names are therefore:
dockercoins/hasher:v0.1
dockercoins/rng:v0.1
dockercoins/webui:v0.1
dockercoins/worker:v0.1
- We can now deploy our code(as well as a redis instance)
Exercise
- Deploy
redis
:
kubectl create deployment redis --image=redis
- Deploy everything else:
kubectl create deployment hasher --image=dockercoins/hasher:v0.1
kubectl create deployment rng --image=dockercoins/rng:v0.1
kubectl create deployment worker --image=dockercoins/worker:v0.1
kubectl create deployment webui --image=dockercoins/webui:v0.1
- After waiting for the deployment to complete, let's look at the logs!
- (Hint: use
kubectl get deploy -w
to watch deployment events)
- (Hint: use
Exercise
- Look at some logs:
kubectl logs deploy/rng
kubectl logs deploy/worker
kubectl logs deploy/worker --follow
🤔rng
is fine... But not worker
.
Oh right! We forgot to expose
- Three deployments need to be reachable by others:
hasher
,redis
,rng
worker
doesn't need to be exposedwebui
will be dealt with later
Exercise
- Expose each deployment, specifying the right port:
kubectl expose deployment redis --port 6379
kubectl expose deployment rng --port 80
kubectl expose deployment hasher --port 80
$ minikube start --vm-driver=hyperkit --registry-mirror=https://registry.docker-cn.com --image-mirror-country=cn --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers
$ minikube start --image-mirror-country=cn --iso-url=https://kubernetes.oss-cn-hangzhou.aliyuncs.com/minikube/iso/minikube-v1.6.0.iso --registry-mirror=https://dockerhub.azk8s.cn --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --vm-driver="hyperv" --hyperv-virtual-switch="minikube v switch" --memory=4096
- Now we would like to access the Web UI
- We will expose it with a
NodePort
- (just like we did for the registry)
Exercise
- Create a
NodePort
service for the Web UI:
kubectl expose deploy/webui --type=NodePort --port=80
- Check the port that was allocated:
kubectl get svc
webui NodePort 10.96.39.74 80:31900/TCP 9s
e.g. ->> http://localhost:31900
- Our ultimate goal is to get more DockerCoins
- (i.e. increase the number of loops per second shown on the web UI)
- Let's look at the
architecture
again: - We're at 4 hashes a second.Let's ramp this up!
- The loop is done in the worker; perhaps we could try adding more workers?
- All we have to do is scale the
worker
Deployment
Exercise
- Open two new terminals to check what's going on with pods and deployments:
kubectl get pods -w
kubectl get deployments -w
- Now, create more
worker
replicas:
kubectl scale deployment worker --replicas=2
- If 2 workers give us 2x speed,what about 3 workers?
Exercise
- Scale the
worker
Deployment further:
kubectl scale deployment worker --replicas=3
- The graph in the web UI should go up again.
- (This is looking great! We're gonna be RICH!)
- Let's see if 10 workers give us 10x speed!
Exercise
- Scale the
worker
Deployment to a bigger number:
kubectl scale deployment worker --replicas=10
# kubectl scale deployment rng --replicas=2
# kubectl scale deployment hasher --replicas=2
The graph will peak at 10 hashes/second. (We can add as many workers as we want: we will never go past 10 hashes/second.)
- If this was high-quality, production code, we would have instrumentation
- (Datadog, Honeycomb, New Relic, statsd, Sumologic,...)
- It's not
- Perhaps we could benchmark our web services?
- (with tools like
ab
, or even simpler,httping
)
- (with tools like
- We want to check
hasher
andrng
- We are going to use
httping
- It's just like
ping
, but using HTTPGET
requests- (it measures how long it takes to perform one
GET
request)
- (it measures how long it takes to perform one
- It's used like this:
httping [-c count] http://host:port/path
- Or even simpler:
httping ip.ad.dr.ess
- We will use
httping
on the ClusterIP addresses of our services
- We can simply check the output of
kubectl get services
- Or do it programmatically, as in the example below
Exercise
- Retrieve the IP addresses:
HASHER=$(kubectl get svc hasher -o go-template={{.spec.clusterIP}})
RNG=$(kubectl get svc rng -o go-template={{.spec.clusterIP}})
Now we can access the IP addresses of our services through $HASHER
and $RNG
.
Exercise
- Remember to use
shpod
on macOS and Windows:
kubectl apply -f https://bret.run/shpod.yml
kubectl attach --namespace=shpod -ti shpod
- Check the response times for both services:
httping -c 3 $HASHER
httping -c 3 $RNG
- The bottleneck seems to be
rng
- What if we don't have enough entropy and can't generate enough random numbers?
- We need to scale out the
rng
service on multiple machines!
Note:this is a fiction!We have enough entropy.But we need a pretext to scale out.
- Oops we only have one node for learning.🤔
- Let's pretend and I'll explain along the way.
(in fact, the code of rng
uses /dev/urandom
. which never runs out of entropy......
...and is just as good as /dev/random
)
- So far, we created resources with the following commands:
kubectl run
kubectl create deployment
kubectl expose
- We can also create resources directly with YAML manifests
kubectl create -f whatever.yaml
- creates resources if they don't exist
- if resources already exist, don't alter them(and display error message)
kubectl apply -f whatever.yaml
- creates resources if they don't exist
- if resources already exist,update them(to match the definition provided by the YAML file)
- stores the manifest as an annotation in the resource
- The manifest can contain multiple resources separated by
---
kind: ...
apiVersion: ...
metadata:
name: ...
...
spec:
...
---
kind: ...
apiVersion: ...
metadata:
name: ...
...
spec:
...
- The manifest can also contain a list of resources
apiVersion: v1
kind: List
items:
- kind: ...
apiVersion: ...
...
- kind: ...
apiVersion: ...
...
- We provide a YAML manifest with all the resources for DockerCoins (Deployments and Services)
- We can use it if we need to deploy or redeploy DockerCoins
Exercise
- Deploy or redeploy DockerCoins:
kubectl apply -f https://k8smastery.com/dockercoins.yaml
- Note the warinings if you already had the resources created
- This is because we didn't use
apply
before - This is OK for us learning, so ignore the warnings
- Generally in production you want to stick with one method or the other
- Kubernetes resources can also be viewed with an official web UI
- That dashboard is usually exposed over HTTPS (this requires obtaining a proper TLS certificate)
- Dashboard users need to authenticate
- We are going to take a dangerous shortcut
- We could(and should) use Let's Encrypt...
- ...but we don't want to deal with TLS certificates
- We could(and should) learn how authentication and authorization work...
- ...but we will use a guest account with admin access instead
Yes, this will open our cluster to all kinds of shenanigans.Don't do this at home.
- We are going to deploy that dashboard with one single command
- This command will create all the necessary resources (the dashboard itself, the HTTP wrpper, the admin/guest account)
- All these resources are defined in a YAML file
- All we have to do is load that YAML file with
kubctl apply -f
Exercise
- Create all the dashboard resources, with the following command:
kubectl apply -f https://k8smastery.com/insecure-dashboard.yaml
kubectl get svc dashboard
- We have three authentication options at this point:
- token (associated with a role that has appropriate permissions)
- kubeconfig (e.g. using the
~/.kube/config
file) - "skip" (use the dashboard "service account")
- Let's use "skip": we're logged in!
By the way, we just added a backdoor to our Kubernetes cluster!
- The steps that we just showed you are for educational purpose only!
- If you do that on your production cluster, people can and will abuse it
- For an in-depth discussion about securing the dashboard
- check this execllent post on Heptio's blog
- Minikube/microK8s can be enabled with easy commands
minikube dashboard
andmicrok8s.enable dashboard
- Kube Web View
- read-only dashboard
- optimized for "troubleshooting and incident response"
- see vision and goals for details
- Kube Ops View
- "provides a common operational picture for multiple Kubernetes clusters"
- Your Kubernetes distro comes with one!
- Cloud-provided control-planes often don't come with one.
- When we do
kubectl apply -f <URL>
.we create arbitrary resources - Resources can be evil;imagine a
deployment
that...- starts bitcoin miners on the whole cluster
- hides in a non-default namespace
- bing-mounts our nodes' filesystem
- inserts SSH keys in the root account(on the node)
- encrypts our data and ransoms it
curl | sh
is convenient- It's safe if you use HTTPS URLs from trusted sources
kubectl apply -f
is convenient- It's safe if you use HTTPS URLs from trusted sources
- Example:the official setup instructions for most pod networks
- We want to scale
rng
in a way that is different from how we scaledworker
- We want one(and exactly one) instance of
rng
per node - We do not want two instances of
rng
on the same node - We will do that with a daemon set
- Can't we just do
kubectl scale deployment rng --replicas=...
? - Nothing guarantees that the
rng
containers will be distributed evenly - If we add nodes later,they will not automatically run a copy of
rng
- If we remove(or reboot) a node, one
rng
container will restart elsewhere (add we will end up with two instancesrng
on the same node)
- Daemon sets are great for cluster-wide, per-node processes:
kube-proxy
- CNI network plugins
- monitoring agents
- hardware management tools(e.g. SCSI/FC HBA agents)
- etc.
- They can also be restricted to run only on some nodes
- Unfortunately, as of Kubernetes 1.15, the CLI cannot create daemon sets
- More precisely: it doesn't have a subcommand to create a daemon set
- But any kind of resource can always be created by providing a YAML description:
kubectl apply -f foo.yaml
- How do we create the YAML file for our daemon set?
- option 1: read the docs
- option 2:
vi
our way out of it
- Let's start with the YAML file for the current
rng
resource Exercise - Dump the
rng
resource in YAML:
kubectl get deploy/rng -o yaml > rng.yml
- Edit
rng.yml
- What if we just changed the
kind
field?(It can't be that easy,right?)
Exercise
- Change
kind: Deployment
tokind: DaemonSet
- Save, quit
- Try to create our new resource:
kubectl apply -f rng.yml
- The core of the error is:
error validating data:
[ValidationError(DaemonSet.spec):
unknown field "replicas" in io.k8s.api.extensions.v1beta1.DaemonSetSpec,
...
- Obviously, it doesn't make sense to specify a number of replicas for a daemon set
- Workaround:fix the YAML
- remove the
replicas
field - remove the
strategy
field(which defines the rollout mechanism for a deployment) - remove the
progressDeadlineSeconds
field(also used by the rollout mechani) - remove the
status: {}
line at the end
- remove the
- We could also tell Kubernetes to ignore these errors and try anyway
- The
--force
flag's actual name is--validate=false
Exercise
- Try to load our YAML file and ignore errors:
kubectl apply -f rng.yml --validate=false
Wait...Now, can it be that easy?
- Did we transform our
deployment
into adaemonset
? Exercise - Look at the resources that we have now:
kubectl get all
- You can have different resource types with the same name
(i.e. a deployment and a daemon set both named
rng
) - We still have the old
rng
deployment
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/rng 1/1 1 1 20h
- But how we have the new
rng
daemon set as well
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/rng 1 1 1 1 1 <none> 7s
- If we check with
kubectl get pods
, we see:- one pod for the deployment(named
rng-xxxxxxxxxxx-yyyyy
) - one pod per node for the daemon set(named
rng-zzzzz
)
- one pod for the deployment(named
NAME READY STATUS RESTARTS AGE
pod/rng-58596fdbd-lqlsp 1/1 Running 0 20h
pod/rng-qnq4n 1/1 Running 0 7s
The daemon set created one pod per node.
In a multi-node setup, master usually have taints preventing pods from running there.
(To schedule a pod on this node anyway,the pod will require appropriate tolerations)
- Look at the web UI
- The graph should now go above 10 hashes per second!
- It looks like the newly created pods are serving traffic correctly
- How and why did this happen?
(We didn't do anything special to add them to the
rng
service load balancer!)
- The
rng
service is load balancing requests to a set of pods - That set of pods is defined by the selector of the
rng
service
Exercise
- Check the selector in the
rng
service definition:
kubectl describe service rng
- The selector is
app=rng
- It means "all the pods having the label app=rng" (They can have additional labels as well, that's OK!)
- We can use selectors with many
kubectl
commands - For instance, with
kubectl get
,kubectl logs
,kubectl delete
... and more
Exercise
- Get the list of pods matching selector
app=rng
:
kubectl get pods -l app=rng
kubectl get pods --selector app=rng
But...why do these pods(in particular, the new ones)have this app=rng
label?
Where do labels come from?
- When we create a deployment with
kubectl create deployment rng
, this deployment gets the labelapp=rng
- The replica sets created by this deployment also get the label
app=rng
- The pods created by these replica sets also get the label
app=rng
- When we created the daemon set from the deployment, we re-used the same spec
- Therefore, the pods created by the daemon set get the same labels
- When we use
kubectl run stuff
, the label isrun=stuff
instead
- We would like to remove a pod from the load balancer
- What would happen if we removed that pod, with
kubectl delete pod...
?- It would be re-created immediately(by the replica set ro the daemon set)
- What would happen if we removed the
app=rng
label from that pod?- It would also be re-created immediately
- Why?!?
- The "mission" of a replica set is:
- "Make sure that there is the right number of pods matching this spec!"
- The "mission" of a daemon set is:
- "Make sure that there is a pod matching this spec on each node!"
- In fact, replica sets adn daemon sets do not check pod specifications
- They merely have a selector, and they look for pods matching that selector
- Yes, we can fool them by manually creating pods with the "right" labels
- Bottom line:if we remove our
app=rng
label... *...The pod "disappears" for its parent, which re-creates another pod to replace it - Since both the
rng
daemon set and therng
replica set useapp=rng
...- ...Why don't they "find" each other's pods?
- Replica sets have a more specific selector, visible with
kubectl describe
- (It looks like
app=rng, pod-template-hash=abcd1234
)
- (It looks like
- Daemon sets also have a more specific selector, but it's invisible
- (It looks like
app=rng, controller-revision-hash=abcd1234
)
- (It looks like
- As a result,each controller only "sees" the pods it manages
- Currently, the
rng
service is defined by theapp=rng
selector - The only way to remove a pod is to remove or change the
app
label - ...But that will cause another pod to be created instead!
- What's the solution?
- We need to change the selector of the
rng
service! - Let's add another label to that selector(e.g. enabled=yes)
- If a selector specifies multiple labels, they are understood as a logical AND (In other words: the pods must match all the labels)
- Kubernetes has support for advanced, set-based selectors (But these cannot be used with services, at least not yet!)
- Add the label
enabled=yes
to all ourrng
pods - Update the selector for the
rng
service to also includeenabled=yes
- Toggle traffic to a pod by manually adding/removing the
enabled
label - Profit!
Note: if we swap steps 1 and 2, it will cause a short
service disruption, becasuse there will be
a period of time during which the service selector
won't match any pod. During that time,
requests to the service will time out.By doing
things in the order above, we guarantee that
there won't be any interruption.
- We want to add the label
enabled=yes
to all pods that haveapp=rng
- We could edit each pod one by one with
kubectl edit
... - ... Or we could use
kubectl label
to label them all kubectl label
can use selectors itself
Exercise
- Add
enabled=yes
to all pods that haveapp=rng
kubectl label pods -l app=rng enabled=yes
- We need to edit the service specification
- Reminder: in the service deinition, we will see
app: rng
in two places- the label of the service itself(we don't need to touch that one)
- the selector of the service(that's the one we want to change)
Exercise
- Update the service to add
enabled: yes
to its selector:
kubectl edit service rng
- YAML parsers try to help us:
xyz
is the string"xyz"
42
is the integer42
yes
is the boolean valuetrue
- If we want the string
"42"
or the string"yes"
, we have to quote them - So we have to use
enabled: "yes"
For a good laugh:if we had used "ja", "oui", "si"... as the value,it would have world.
Exercise
- Update the service to add
enabled: "yes"
to its selector:
kubectl edit service rng
- This time is should work!
- If we did everything correctly, the web UI shouldn't show any change.
- We want to disable the pod that was created by the deployment
- All we have to do, is remove the
enabled
label from that pod - To identify that pod, we can use its name
- ... Or rely on the fact that it's the only one with a
pod-template-hash
label - Good to know:
kubctl label ... foo=
doesn't remove a label(it sets it to an empty string)- to remove label
foo
, usekubectl label ... foo-
- to change an existing label, we would need to add
--overwrite
Exercise
- In one window, check the logs of that pod:
POD=$(kubectl get pod -l app=rng,pod-template-hash -o name)
kubectl logs --tail 1 --follow $POD
(We should see a steady stream of HTTP logs)
- In another window, remove the label from the pod:
kubectl label pod -l app=rng,pod-template-hash enabled-
(The stream of HTTP logs should stop immediately)
There might be a slight change in the web UI(since we removed a bit of capacity from rng
service).if we remove more pods, the effect should be more visible.
- If we scale up our cluster by adding new nodes, the daemon set will create more pods
- These pods won't have the
enable=yes
label - If we want these pods to have that label, we need to edit the daemon set spec
- We can do that with e.g.
kubectl edit daemonset rng
- When a pod is misbehaving, we can delete it:another one will be recreated
- But we can also change its labels
- It will be removed from the load balancer(it won't receive traffic anymore)
- Another pod will be recreated immediately
- But the problematic pod is still here, and we can inspect and debug it
- We can even re-add it to the rotation if necessary
- (Very useful to troubleshoot intermittent and elusive bugs)
- Conversely, we can add pods matching a service's selector
- These pods will then receive requests and serve traffic
- Examples:
one-shot pod with all debug flags enabled, to collect logs
- pods created automatically, but added to rotation in a second step (by setting their label accordingly)
- This gives us building blocks for canary and blue/green deployments
Let's cleanup before we start the next lecture!
Exercise
- remove our DockerCoin resources(for now):
kubectl delete -f https://k8smastery.com/dockercoins.yaml
kubectl delete daemonset/rng
- To use Kubernetes is to "live in YAML"!
- It's more important to learn the foundations then to memorize all YAML keys(hundreds+)
- There are various ways to generate YAML with Kubernetes, e.g.:
kubectl run
kubectl create deployment
(and a few otherkubectl create
variants)kubectl expose
- These commands use "generators" because the API only accepts YAML(actually JSON)
- Pro: They are easy to use
- Con: They have limits
- When and why do we need to write our own YAML?
- How do we write YAML from scratch?
- And maybe,what is YAML?
- It's technically a superset of JSON, designed for humans
- JSON was good for machines, but not for humans
- Spaces set the structure. One space off and game over
- Remember spaces not tabs,Ever!
- Two spaces is standard, but four spaces works too
- You don't have to learn all YAML features, but key concepts you need:
- Key/Value Pairs
- Array/Lists
- Dictionary/Maps
- Good online tutorials exist here, here, here, and YouTube here
- Can be in YAML or JSON, but YAML is 100%.
- Each file contains one or more manifests
- Each manifest describes an API object(deployment, service, etc.)
- Each manifest needs four parts(root key:values in the file)
apiVersion:
kind:
metadata:
spec:
- This is a single manifest that creates one Pod
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx:1.17.3
apiVersion: v1
kind: Service
metadata:
name: mynginx
spec:
type: NodePort
ports:
- port: 80
selector:
app: mynginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mynginx
spec:
replicas: 3
selector:
matchLabels:
app: mynginx
template:
metadata:
labels:
app: mynginx
spec:
containers:
- name: nginx
image: nginx:1.17.3
- Advanced(and even not-so-advanced) features require us to write YAML:
- pods with multiple containers
- resource limits
- healthchecks
- many other resource options
- Other resource types don't have their own commands!
- DaemonSets
- StatefulSets
- and more!
- How do we access these features?
- Output YAML from existing resources
- Create a resource(e.g. Deployment)
- Dump its YAML with
kubectl get -o yaml...
- Edit the YAML
- Use
kubectl apply -f ...
with the YAML file to: - update the resource(if it's the same kind)
- create a new resource(if it's a different kind)
- Or...we have the docs, with good starter YAML
- StatefulSet, DaemonSet, ConfigMap, and a ton more on Github
- Or...we can use
-o yaml --dry-run
- We can use the
-o yaml --dry-run
option combo witherun
andcreate
Exercise
- Generate the YAML for a Deployment without creating it:
kubectl create deployment web --image nginx -o yaml --dry-run
- Generate the YAML for a Namespace without creating it:
kubectl create namespace awesome-app -o yaml --dry-run
- We can clean up the YAML even more if we want
- (for instance, we can remove the
creationTimestamp
and empty dicts)
- (for instance, we can remove the
clusterrole # Create a ClusterRole.
clusterrolebinding # Create a ClusterRoleBinding for a particular ClusterRole.
configmap # Create a configmap from a local file, directory or literal
cronjob # Create a cronjob with the specified name
deployment # Create a deployment with the specified name
job # Create a job with the specified name
namespace # Create a namespace with the specified name
poddisruptionbudget # Create a pod disruption budget with the specified name
priorityclass # Create a priorityclass with the specified name
quota # Create a quota with the specified name
role # Create a role with single rule
rolebinding # Create a RoleBinding for a particular Role or ClusterRole
secret # Create a secret using specified subcommand
service # Create a service using specified subcommand
serviceaccount # Create a service account with the specified name
- Ensure you use valid
create
commands with required options for each
- Paying homage to Kelsey Hightower's "Kubernetes The Hard Way"
- A reminder about manifest:
- Each file contains one or more manifests
- Each manifest describes an API object(deployment, service, etc.)
- Each manifest needs four parts(root key:values in the file)
apiVersion: # find with "kubectl api-versions"
kind: # find with "kubectl api-resources"
metadata:
spec: # find with "kubectl describe pod"
- Those three
kubectl
commands, plus the API docs, is all we'll need
- Find the resource
kind
you want to create(api-resources
) - Find the latest
apiVersion
your cluster supports forkind
(api-versions
) - Give it a
name
in metadata(minimum) - Dive into the
spec
of thatkind
kubectl explain <kind>.spec
kubectl explain <kind> --recursive
- Browse the docs
API Reference
for your cluster version to supplement - Use
--dry-run
and--server-dry-run
for testing kubectl create
anddelete
until you get it right
kubectl api-resources
kubectl api-versions
kubectl explain pod
kubectl explain pod.spec
kubectl explain pod.spec.volumes
kubectl explain pod.spec --recursive
- Using YAML(instead of
kubectl run/create/etc.
)allows to be declarative - The YAML describes the desired state of our cluster and applications
- YAML can be stored. versioned, archived(e.g. in git repositories)
- To change resource, change the YAML files(instead of using
kubectl edit/scale/label/etc.
) - Changes can be reviewed before being applied (with code reviews, pull requests...)
- This workflow is sometimes called "GitOps" (there are tools like Weave Flux or GitKube to facilitate it)
- Get started with
kubectl run/create/expose/etc.
- Dump the YAML with
kubectl get -o yaml
- Tweak that YAML and
kubectl apply
it back - Store that YAML for reference(for further deployments)
- Feel free to clean up the YAML
- remove fields you don't know
- check that it still works!
That YAML
will be useful later when using e.g. Kustomize or Helm
- Use generic linters to check proper YAML formatting
- yamllint.com
- codebeautiful.org/yaml-validator
- For humans without kubectl, use a web Kubernetes YAML validator: kubeyaml.com
- In CI, you might use CLI tools
- YAML linter:
pip install yamllint
github.com/adrienverge/yamllint - Kubernetes validator:
kubeval
github.com/instrumenta/kubeval
- YAML linter:
- We'll learn about Kubernetes cluster-specific validation with kubectl later
- We already talked about using
--dry-run
for building YAML - Let's talk more about options for testing YAML
- Including testing against the live cluster API!
- The
--dry-run
option can also be used withkubectl apply
- However, it can be mileading(it doesn't do a "real" dry run)
- Let's see what happens in the following scenario:
- generate the YAML for a Deployment
- tweak the YAML to transform it into a DaemonSet
- apply that YAML to see what would actually be created
Exercise
- Generate the YAML for a deployment:
kubectl create deployment web --image=nginx -o yaml > web.yaml
- Change the
kind
in the YAML to make it aDaemonSet
- Ask
kubectl
what would be applied:
kubectl apply -f web.yaml --dry-run --validate=false -o yaml
The resulting YAML doesn't represent a valid DaemonSet.
- Since Kubernetes 1.13, we can use server-side dry run and diffs
- Server-side dry run will do all the work, but not persist to etcd
- (all validation and mutation hooks will be executed)
Exercise
- Try the same YAML files as earlier,with server-side dry run:
kubectl apply -f web.yaml --server-dry-run --validate=false -o yaml
- The resulting YAML doesn't have the
replicas
field anymore. - Instead, it has the fields expected in a DaemonSet
- The YAML is verified much more extensively
- The only step that is skipped is "write to etcd"
- YAML that passes server-side dry run should apply successfully (unless the cluster state changes by the time the YAML is actually applied)
- Validating or mutating hooks that have side effects can also be an issue
- Kubernetes 1.13 also introduced
kubectl diff
kubectl diff
does a server-side dry run, and shows differences
Exercise
- Try
kubectl diff
on a simple Pod YAML:
curl -O https://k8smastery.com/just-a-pod.yaml
kubectl apply -f just-a-pod.yaml
# edit the image ta to :1.17
kubectl diff -f just-a-pod.yaml
Note: we don't need to specify --validate=false
here.
Let's cleanup before we start the next lecture!
Exercise
- remove our "hello" pod:
kubectl delete -f just-a-pod.yaml
- By default(without rolling updates), when a scaled resource is updated:
- new pods are created
- old pods are terminated
- ...all at the same time
- if something goes wrong,
- With rolling updates, when a Deployment is updated, it happens progressively
- The Deployment controls multiple ReplicaSets
- Each ReplicaSet is a group of identical Pods (with the same image, arguments, parameters)
- During the rolling update, we have at least two ReplicaSets:
- the "new" set(corresponding to the "target" version)
- at least one "old" set
- We can have multiple "old" sets (if we start another update before the first one is done)
- Two parameters determine the pace of the rollout:
maxUnavailable
andmaxSurge
- They can be specified in absolute number of pods, or percentage of the
replicas
count - At any given time...
- there will always be at least
replicas-maxUnavailable
pods available - there will never be more than
replicas
+maxSurge
pods in total - there will therefore be up to
maxUnavailable
+maxSurge
pods being updated
- there will always be at least
- We have the possibility of rolling back to the previous version (if the update fails or is unsatisfactory in any way)
- Recall how we build custom reports with
kubectl
andjq
:
Exercise
- Show the rollout plan for our deployments:
kubectl apply -f https://k8smastery.com/dockercoins.yaml
kubectl get deploy -o json | jq ".items[] | {name:.metadata.name} + .spec.strategy.rollingUpdate"
- As of Kubernetes 1.8, we can do rolling updates with:
deployments
,daemonsets
,statefulsets
- Editing one of these resources will automatically result in a rolling update
- Rolling updates can be monitored with the
kubectl rollout
subcommand
Exercise
- Let's monitor what's going on by opening a few terminals, and run :
kubectl get pods -w
kubectl get replicasets -w
kubectl get deployments -w
- Update
worker
either withkubectl edit
, or by running:
kubectl set image deploy worker worker=dockercoins/worker:v0.2
That rollout should be pretty quick. What shows in the web UI?
- At first, it looks like nothing is happening(the graph remains at the same level)
- According to
kubectl get deploy -w
, thedeployment
was updated really quickly - But
kubectl get pods -w
tells a different story - The old
pods
are still here, and they stay inTerminating
state for a while - Eventually, they are terminated; and then the graph decreases significantly
- This delay is due to the fact that our worker doesn't handle signals
- Kubernetes sends a "polite" shutdown request to the worker,which ignores it
- After a grace period, Kubernetes gets impatient and kills the container (The grace period is 30 seconds, but can be changed if needed)
- What happens if we make a mistake?
Exercise
- Update
worker
by specifiying a non-existent image:
kubectl set image deploy worker worker=dockercoins/worker:v0.3
- Check what's going on:
kubectl rollout status deploy worker
Our rollout is stuck. However, the app is not dead. (After a minute, it will stabilize to be 20-25% slower.)
kubectl get pods
# ErrImagePull
- Why is our app a bit slower?
- Because
MaxUnavailable=25%
- ...So the rollout terminated 2 replicas out of 10 available
- Okay, but why do we see 5 new replicas being rolled out?
- Because
MaxSurge=25%
...So in addition to replacing 2 replicas, the rollout is also starting 3 more - It rounded down the number of MaxUnavailable pods conservatively. but the total number of pods being rolled out is allowed to be 25+25=50%
- We start with 10 pods running for the
worker
deployment - Current settings: MaxUnavailable=25% and MaxSurge=25%
- When we start the rollout:
- two replicas are taken down(as per MaxUnavailable=25%)
- two others are created(with the new version) to replace them
- three others are created(with the new version) per MaxSurge=25%)
- Now we have 8 replicas up and running, and 5 being depoyed
- Our rollout is stuck at this point!
- If you didn't deploy the Kubernetes dashboard earlier, just kip this slide.
Exercise
- Connect to the dashboard that we deployed earlier
- Check that we have failures in Deployments, Pods, and Replica Sets
- Can we see the reason for the failure?
kubectl describe deploy worker
- We could push some
v0.3
image (the pod retry logic will eventually catch it and the rollout will proceed) - Or we could invoke a manual rollback
Exercise
- Cancel the deployment and wait for the dust to settle:
kubectl rollout undo deploy worker
# deployment.extensions/worker rolled back
kubectl rollout status deploy worker
# deployment "worker" successfully rolled out
- We reverted to
v0.2
- But this version still has a performance problem
- How can we get back to the previous version?
- What happens if we try
kubectl rollout undo
again?
Exercise
- Try it:
kubectl rollout undo deployment worker
- Check the web UI, the list of pods....
🤔That didn't work.
- If we see successive versions as a stack:
kubectl rollout undo
doesn't "pop" the last element from the stack- it copies the N-1th element to the top
- Multiple "undos" just swap back and forth between the last two versions!
Exercise
- Go back to v0.2 again:
kubectl rollout undo deployment worker
- Our version numbers are easy to guess
- What if we had used git hashes?
- What if we had changed other parameters in the Pod spec?
- We can list successive versions of a Deployment with
kubectl rollout history
Exercise
- Look at our successive versions:
kubectl rollout history deployment worker
We don't see all revisions.
We might see something like 1, 4, 5.
(Depending on how many "undos" we did before.)
- These revisions correspond to our ReplicaSets
- This information is stored in the ReplicaSet annotations
Exercise
- Check the annotations for our replica sets:
kubectl describe replicasets -l app=worker | grep -A3 Annotations
- The missing revisions are stored in another annotation:
deployment.kubernetes.io/revision-history
- These are not shown in
kubectl rollout history
- We could easily reconstruct the full list with a script (if we wanted to!)
kubectl rollout undo
can work with a revision number
Exercise
- Roll back to the "known good" deployment version:
kubectl rollout undo deployment worker --to-revision=1
- Check the web UI or the list of pods
- What if we wanted to, all at once:
- change image to
v0.1
- be conservative on availability(always have desired number of available workers)
- go slow on rollout speed(update only one pod at a time)
- give some time to our workers to "warm up" before starting more
- change image to
- The corresponding changes can be expressed in the following YAML snippet:
spec:
template:
spec:
containers:
- name: worker
image: dockercoins/worker:v0.1
strategy:
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
minReadySeconds: 10
- We could use
kubectl edit deployment worker
- But we could also use
kubectl patch
with the exact YAML shown before - Apply our changes and wait for them to take effect:
kubectl patch deployment worker -p "
spec:
template:
spec:
containers:
- name: worker
image: dockercoins/worker:v0.1
strategy:
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
minReadySeconds: 10
"
kubectl rollout status deployment worker
kubectl get deploy -o json worker | jq "{name: .metadata.name} + .spec.strategy.rollingUpdate"
- Healthchecks are key to providing built-in lifecycle automation
- Healthchecks are probes that apply to containers(not to pods)
- Kubernetes will take action on containers that fail healthchecks
- Each container can have three(optional) probes:
- liveness = is this container dead or alive?(most important probe)
- readiness = is this container ready to serve traffic?(only needed if a service)
- startup = is this container still starting up?(alpha in 1.16)
- Different probe handlers are available(HTTP, TCP, program execution)
- They don't replace a full monitoring solution
- Let's see the difference and how to use them!
- Indicates if the container is dead or alive
- A dead container cannot come back to life
- If the liveness probe fails, the container is killed (to make really sure that it's really dead; no zombies or undeads!)
- What happens next depends on the pod's
restartPolicy
:Never
: the container is not restartedOnFailure
orAlways
: the container is restarted
- To indicate failures that can't be recovered
- deadlocks(causing all requests to time out)
- internal corruption(causing all requests to error)
- Anything where our incident response would be "just restart/reboot it"
- Do not use liveness probes for problems that can't be fixed by a restart
- Otherwise we just restart our pods for no reason, creating useless load
- Indicates if the container is ready to serve traffic
- If a container becomes "unready" it might be ready again soon
- If the readiness probe fails:
- the container is not killed
- if the pod is memeber of a service, it is temporarily removed
- it is re-added as soon as the readiness probe passes again
- To indicate failure due to an external cause
- database is down or unreachable
- mandatory auth or other backend service unavailable
- To indicate temporary failure or unavailability
- application can only service N parallel connections
- runtime is busy doing garbage collection or ini
- Kubernetes 1.16 introduces a third tye of probe:
startupProbe
(it is inalpha
in Kubernetes 1.16) - It can be used to indicate "container not ready yet"
- process is still starting
- loading external data, priming caches
- Before Kubernetes 1.16, we had to use the
initialDelaySeconds
parameter (available for both liveness and readiness probes) initialDelaySeconds
is a rigid delay(always wait X before running probes)startupProbe
works better when a container start time can vary a lot
- Rolling updates proceed when containers are actually ready (as opposed to merely started)
- Containers in a broken state get killed and restarted (instead of serving errors or timeouts)
- Unavailable backends get removed from load balancer rotation (thus improving response times across the board)
- If a probe is not defined, it's as if there was an "always successful" probe
- HTTP request
- specify URL of the request(and optional headers)
- any status code between 200 and 399 indicates success
- TCP connection
- the probe succeeds if the TCP port is open
- arbitrary exec
- a command is executed in the container
- exit status of zero indicates success
- Probes are executed at intervals of
periodSeconds
(default: 10) - The timeout for a probe is set with
timeoutSeconds
(default: 1)
If a probe takes longer than that,it is considered as a FAIL
- A probe is considered successful after
successThreshold
successes(default: 1) - A probe is considered failing after
failureThreshold
failures (default: 3) - A probe can have an
initialDelaySeconds
parameter(default: 0) - Kubernetes will wait that amount of time before running the probe for the first time (this is important to avoid killing services that take a long time to start)
Here is a pod template for the rng
web service of the DockerCoins app:
apiVersion: v1
kind: Pod
metadata:
name: rng-with-liveness
spec:
containers:
- name: rng
image: dockercoins/rng:v0.1
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 10
periodSeconds: 1
If the backend serves an error, or takes longer than 1s, 3 times in a row,it gets killed.
Example: exec probe
Here is a pod template for a Redis server:
apiVersion: v1
kind: Pod
metadata:
name: redis-with-liveness
spec:
containers:
- name: redis
image: redis
livenessProbe:
exec:
command: ["redis-cli", "ping"]
If the Redis process becomes unresponsive, it will be killed.
- A HTTP/TCP probe can't check an external dependency
- But a HTTP URL could kick off code to validate a remote dependency
- If a web server depends on a database to function, and the database is down:
- the web server's liveness probe should succeed
- the web server's readiness probe should fail
- Same thing for any hard dependency(without which the container can't work)
Do not fail liveness probes for problems that are external to the container
(In that context, worker = process that doesn't accept connections)
- Readiness isn't useful(because workers aren't backends for a service)
- Liveness may help us restart a broken worker, but how can we check it?
- Embedding an HTTP server is a(potentially expensive) option
- Using a "lease" file can be relatively easy:
- touch a file during each iteration of the main loop
- check the timestamp of that file from an exec probe
- Writing logs(and checking them from the probe) also works
- Do we want liveness, readiness, both? (sometimes, we can use the same check, but with different failture thresholds)
- Do we have existing HTTP endpoints that we can use?
- Do we need to add new endpoints, or perhaps use something else?
- Are our healthchecks likely to use resources and/or slow down the app?
- Do they depend on additional services? (this can be particularly tricky)
- Let's add healthchecks to DockerCoins!
- We will examine the questions of the previous slide
- Then we will review each component individually to add healthchecks
- To answer that question, we need to see the app run for a while
- Do we get temporary, recoverable glitches?
- -> then sue liveness
- Or do we get hard lock-ups requiring a restart?
- -> then use liveness
- In the case of DockerCoins, we don't know yet!
- Let's pick liveness
- Each of the 3 web services(hasher, rng, webui) has a trivial route on
/
- These routes:
- don't seem to perform anything complex or expensive
- don't seem to call other services
- Perfect! (See next slides for individual details)
hasher.rb
get '/' do
"HASHER running on #{Socket.gethostname}\n"
end
rng.py
@app.route("/")
def index():
return "RNG running on {}\n".format(hostname)
webui.js
app.get('/', function (req, res) {
res.redirect('/index.html')
})
- I've split up the previous
dockercoins.yaml
into one-resource-per-file - This works with the
apply
command, and is easier for humans to manage - Clone them locally so we can add healthchecks and re-apply
Exercise
- Clone that repository:
git clone https://github.com/bretfisher/kubercoins
- Change directory to the repository:
cd kubercoins
This is what our liveness probe should look like:
containers:
- name: ...
image: ...
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
periodSeconds: 5
This will give 30 seconds to the service to start.(Way more than necessary!)
It will run the probe every 5 seconds.
It will use the default timeout(1 second).
It will use the default failure threshold(3 failed attempts = dead).
It will use the default success threshold(1 successful attempt = alive).
- Let's add the liveness probe, then deploy DockerCoins
- Remember if you don't have DockerCoins running, this will create
- If you already have DockerCoins running, this will update
rng
Exercise
- Edit
rng-deployment.yaml
and add the liveness probe
vim rng-deployment.yaml
- Load the YAML for all the resources of DockerCoins
kubectl apply -f .
- The rng service needs 100ms to process a request (because it is single-threaded and sleeps 0.1s in each request)
- The probe timeout is set to 1 second
- If we send more than 10 requests per second per backend,it will break
- Let's generate traffic and see what happens!
Exercise
- Get the ClusterIP address of the rng service:
kubectl get svc rng
- Each command below will show us what's happening on a different level
Exercise
- In one window, monitor cluster events:
kubectl get events -w
- In another window, monitor pods status:
kubectl get pods -w
- Let's use
ab
(Apache Bench) to send concurrent requests to rng
Exercise
- In yet another window, generate traffic using
shpod
:
kubectl attach --namespace=shpod -ti shpod
ab -c 10 -n 1000 http://<ClusterIP>/1
-
Experiment with higher values of
-c
and see what happens -
The
-c
parameter indicates the number of concurrent requests -
The final
/1
is important to generate actual traffic (otherwise we would use the ping endpoint, which doesn't sleep 0.1s per request)
e.g.
# t1
kubectl get svc rng
# kubectl apply -f https://k8smastery.com/shpod.yaml
kubectl attach --namespace=shpod -ti shpod
ab -c 10 -n 1000 http://10.101.25.252/1
# t2
kubectl get events -w
# t3
kubectl get pods -w
- Above a given threshold, the liveness probe starts failing
- (about 10 concurrent requests per backend should be plenty enough)
- When the liveness probe fails 3 times in a row, the container is restarted
- During the restart, there is less capacity available
- ...Meaning that the other backends are likely to timeout as well
- ...Eventually causing all backends to be restarted
- ...And each fresh backend gets restarted, too
- This goes on until load goes down, or we add capacity
This wouldn't be a good healthcheck in a real application!
- We need to make sure that the healthcheck doesn't trip when performance degrades due to external pressure
- Using a readiness check would have fewer effects
- (but it would still be an imperfect solution)
- A possible combination:
- readiness check with a short timeout / low failure threshold
- liveness check with a longer timeout / higher failure threshold
- A liveness probe is enough (it's not useful to remove a backend from rotation when it's the only one)
- We could use an exec probe running
redis-cli ping
- When using exec probes, we should make sure that we have a zombie reaper 🤔🤔Wait,what?
- When a process terminates, its parent must call
wait()
/waitpid()
(this is how the parent process retrieves the child's exit status) - In the meantime, the process is in zombie state
(the process state will show as
z
inps
,top
...) - When a process is killed, its children are orphaned and attached to PID 1
- PID 1 has the responsibility of reaping these processes when they terminate
- OK, but how does that affect us?
- On ordinary systems, PID 1(
/sbin/init
) has logic to reap porcesses - In containers, PID 1 is typically our application process (e.g. Apache, the JVM, NGINX, Redix...)
- These do not take care of reaping orphans
- If we use exec probes, we need to add a process reaper
- We can add tini to our images
- Or share the PID namespace between containers of a pod
- (and have gcr.io/pause take care of the reaping)
- Discussion of this in Video - 10 Ways to Shoot Yourself in the Foot with Kubernetes, Will Surprise You
- Add
tini
to your own custom redis image - Change the kubercoins YAML to use your own image
- Create a liveness probe in kuberoins YAML
- Use
exec
handeler and runtini -s -- redis-cli ping
- Example repo here:
github.com/BretFisher/redis-tini
containers:
- name: redis
image: custom-redis-image
livenessProbe:
exec:
command:
- /tini
- -s
- --
- redis-cli
- ping
initialDelaySeconds: 30
periodSeconds: 5
Let's cleanup before we start the next lecture!
Exercise
- remove our DockerCoin resources(for now):
kubectl delete -f https://k8smastery.com/dockercoins.yaml
- Some application need to be configured(obviously!)
- There are many ways for our code to pick up configuration:
- command-line arguments
- environment variables
- configuration files
- configuration servers(getting configuration from a database, an API...)
- ... and more(because programmers can be very creative!)
- How can we do these things with containers and Kubernetes?
- There are many ways to pass configuration to code running in a container:
- baking it into a custom image
- command-line arguments
- environment variables
- injecting configuration files
- exposing it over the Kubernetes API
- configuration servers
- Let's review these different strategies!
- Put the configuration in the image
(it can be in a configuration file, but also
ENV
orCMD
actions) - It's easy! It's simple!
- Unfortunately, it also has downsides:
- multiplication of images
- different images for dev, staging, prod ...
- minor reconfigurations require a whole build/push/pull cycle
Avoid doing
it unless you don't have the time to figure out other options
- Pass options to
args
array in the container specification - Example(source):
args:
- "--data-dir=/var/lib/etcd"
- "--advertise-client-urls=http://127.0.0.1:2379"
- "--listen-client-urls=http://127.0.0.1:2379"
- "--listen-peer-urls=http://127.0.0.1:2380"
- "--name=etcd"
- The options can be passed directly to the program that we run... ...or to a wrapper script that will use them to e.g. generate a config file
- Works great when options are passed directly to the running program (otherwise, a wrapper script can work around the issue)
- Works great when there aren't too many parameters
(to avoid a 20-lines
args
array) - Requires documentation and/or understanding of the underlying program ("which parameters and flags do I need, again?")
- Well-suited for mandatory parameters(without default values)
- Not ideal when we need to pass a real configuration file anyway
- Pass options through the
env
map in the container specification - Example:
env:
- name: ADMIN_PORT
value: "8080"
- name: ADMIN_AUTH
value: Basic
- name: ADMIN_CRED
value: "admin:0pensesame!"
value
must be a string! Make sure that numbers and fancy strings are quoted
🤔Why this weired {name: xxx, value: yyy}
scheme?It will be revealed soon!
- In the previous example, environment variables have fixed values
- We can also use a mechanism called the Downward API
- The Downward API allows exposing pod or container information
- either through special files(we won't show that for now)
- or through environment variables
- The value of these environment variables is computed when the container is started
- Remember: environment variables won't(can't) change after container start
- Let's see a few concrete examples!
- name: MY_POD_NAMESPACE
valueFrom:
fileRef:
fieldPath: metadata.namespace
- Useful to generate FQDN of services (in some contexts, a short name is not enough)
- For instance, the two commands should be equivalent:
curl api-backend
curl api-backend.$MY_POD_NAMESPACE.svc.cluster.local
- name: MY_POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- Useful if we need to know our IP address
(we could also read it from
eth0
, but this is more solid)
- name: MY_MEM_LIMIT
valueFrom:
resouceFieldRef:
containerName: test-container
resource: limits.memory
- Useful for runtimes where memory is garbage collected
- Example: the JVM
(the memory available to the JVM should be set with the
-Xmx
flag) - Best practice:set a memory limit, and pass it to the runtime
- Note: recent versions of the JVM can do this automatically
(see
JDK-8146115
) andthis blog post
for detailed examples
- This documentation page tells more about these environment variables
- And
this one
explains the other way to use the Downward API (through files that get created in the container filesystem)
- Works great when the running program expects these variables
- Works great for optional parameters with reasonable defaults (since the container image can provide these defaults)
- Sort of auto-documented (we can see which environment variables are defined in the image, and their values)
- Can be (ab)used with longer values...
- ...You can put an entire Tomcat configuration file in an environment...
- ...But should you? (Do it if you really need to, we're not judging! Bu we'll see better ways.)
- Sometimes, there is no way around it: we need to inject a full config file
- Kubernetes provides a mechanism for that purpose:
ConfigMaps
- A ConfigMap is a Kubernetes resource that exits in a namespace
- Conceptually, it's a key/value map (values are arbitrary strings)
- We can think about them in(at least) two different ways:
- as holding entire configuration file(s)
- as holding individual configuration parameters
Note: to hold sensitive information, we can use "Secrets", which are another typ of.. resource behaving very much like ConfigMaps. We'll cover them just after!
- In this case, each key/value pair corresponds to a configuration file
- Key = name of the file
- Value = content of the file
- There can be one key/value pair, or as many as necessary (for complex apps with multiple configuration files)
- Examples:
# Create a ConfigMap with a single key, "app.conf"
kubectl create configmap my-app-config --from-file=app.conf
# Create a ConfigMap with a single key, "app.conf" but another file
kubectl create configmap my-app-config --from-file=app.conf=app-prod.conf
# Create a ConfigMap with multiple keys (one per fiel in the config.d directory)
kubectl create configmap my-app-config --from-file=config.d/
- In this case, each key/value pair corresponds to a parameter
- Key = name of the parameter
- Value = value of the parameter
- Examples:
# Create a ConfigMap with two keys
kubectl create cm my-app-config \
--from-literal=foreground=red \
--from-literal=foreground=blue
# Create a ConfigMap from a file containing key=val pairs
kubectl create cm my-app-config \
--from-env-file=app.conf
- ConfigMaps can be exposed as plain files in the filesystem of a container
- this is achieved by declaring a volume and mounting it in the container
- this is particularly effective for ConfigMaps containing whole files
- ConfigMaps can be exposed as environment variables in the container
- this is achieved with the Downward API
- this is particularly effective for ConfigMaps containing individual parameters
- Let's see how to do both!
- We will start a load balancer powered by HAProxy
- We will use the official haproxy image
- It expects to find its configuration in
/usr/local/etc/haproxy/haproxy.cfg
- We will provide a simple HAProxy configuration
- It listens on port 80, and load balances connections between IBM and Google
Exercise
- Download our simple HAProxy config:
curl -O https://k8smastery.com/haproxy.cfg
- Create a ConfigMap named
haproxy
and holding the configuration file:
kubectl create configmap haproxy --from-file=haproxy.cfg
- Check what our ConfigMap looks like:
kubectl get configmap haproxy -o yaml
- We are going to use the following pod definition:
apiVersion: v1
kind: Pod
metadata:
name: haproxy
spec:
volumes:
- name: config
configMap:
name: haproxy
containers:
- name: haproxy
image: haproxy
volumeMounts:
- name: config
mountPath: /usr/local/etc/haproxy/
- Apply the resource definition from the previous slide
Exercise
- Create the HAProxy pod:
kubectl apply -f https://k8smastery.com/haproxy.yaml
- Check the IP address allocated to the pod, inside
shpod
:
kubectl attach --namespace=shpod -ti shpod
kubectl get pod haproxy -o wide
IP=$(kubectl get pod haproxy -o json | jq -r .status.podIP)
- The load balancer will send:
- half of the connections to Google
- the other half to IBM
Exercise
- Access the load balancer a few times:
curl $IP
curl $IP
curl $IP
We should see connections served by Google, and others served by IBM. (Each server sends us a redirect page. Look at the URL that they send us to!)
- We are going to run a Docker registry on a custom port
- By default, the registry listens on port 5000
- This can be changed by setting environment variable
REGISTRY_HTTP_ADDR
- We are going to store the port number in a ConfigMap
- Then we will expose that ConfigMap as a container environment variable
Exercise
- Our ConfigMap will have a single key,
http.addr
:
kubectl create configmap registry --from-literal=http.addr=0.0.0.0:80
- Check our ConfigMap:
kubectl get configmap registry -o yaml
- We are going to use the following pod definition:
apiVersion: v1
kind: Pod
metadata:
name: registry
spec:
containers:
- name: registry
image: registry
env:
- name: REGISTRY_HTTP_ADDR
valueFrom:
configMapKeyRef:
name: registry
key: http.addr
- The resource definition from the previous slide:
Exercise
- Create the registry pod:
kubectl apply -f https://k8smastery.com/registry.yaml
- Check the IP address allocated to the pod
kubectl attach --namespace=shpod -ti shpod
kubectl get pod registry -o wide
IP=$(kubectl get pod registry -o json | jq -r .status.podIP)
- Confirm that the registry is available on port 80:
curl $IP/v2/_catalog
- For sensitive information, there is another special resource: Secrets
- Secrets and Configmaps work almost the same way (we'll expose the differences on the next slide)
- The intent is different, though:
"You should use secrets for things which are actually
secret like API keys, credentials, etc.and use config map
for not-secret configuration data."
"In the future there will likely be some differentiators
for secrets like rotation or support for backing the
secret API w/HSMs, etc."
(Source: the author of both features)
- Secrets are base64-encoded when shown with
kubectl get secrets -o yaml
- keep in mind that this is just encoding, not encryption
- it is very easy to automatically extract and decode secrets
- Secrets can be encrypted at rest
- With RBAC, we can authorize a user to access ConfigMaps, but not Secrets (since they are two different kinds of resources)
Let's cleanup before we start the next lecture!
Exercise
- remove our pods:
kubectl delete pod/haproxy pod/registry
- Services give us a way to access a pod or a set of pods
- Services can be exposed to the outside world:
- with type
NodePort
(on a port > 30000) - with type
LoadBalancer
(allocating an external load balancer)
- with type
- What about HTTP services?
- how can we expose
webui
,rng
,hasher
? - the Kubernetes dashboard?
- all on the same IP and port?
- how can we expose
- If we use
NodePort
services, clients have to specify port numbers (i.e. http://xxxxx:31234 instead of just http://xxxxx) LoadBalancer
services are nice, but:- they are not available in all environments
- they often carry an additional cost(e.g they provision an ELB)
- they often work at OSI Layer 4(IP+Port) and not Layer 7(HTTP/S)
- they require one extra step for DNS integration
(waiting for the
LoadBalancer
to be provisioned;then adding it to DNS)
- We could build our own reverse proxy
- There are many options available: Apache, HAProxy, Envoy Proxy, Gloo, NGINX, Traefik,...
- Most of these options require us to update/edit configuration files after each change
- Some of them can pick up virtual hosts and backends from a configuration store
- Wouldn't it be nice if this configuration could be managed with Kubernetes API?
- Enter Ingress resources!
ingress
- ingress definition: Going in, entering. The opposite of egress(leaving)
- In networking terms, ingress refers to handling incomming connections
- Could imply incoming to firewall, network, or in this case, a server cluter
Ingress
- Ingress(capital 1)in these slides means the Kubernetes Ingress resource
- Specific to HTTP/S
- Kubernetes API resource(
kubectl get ingress/ingresses/ing
) - Designed to expose HTTP services
- Basic features:
- load balancing
- SSL termination
- name-based virtual hosting
- Can also route to different services depending on:
- URI path(e.g
/api
->api-service
,static
->assets-service) - Client headers, including cookies(for A/B testing, canary deployment...)
- and more
- URI path(e.g
- Step 1:deploy an Ingress controller
- Ingress controller = load balancing proxy + control loop
- the control loop watches over Ingress resources, and configures the LB accordingly
- these might be two separate processes(NGINX server + NGINX Ingress controller)
- or a single app that knows how to speak to Kubernetes API(Traefik)
- Step 2: set up DNS(usually)
- associate external DNS entries with the load balancer or host address
- Step 3: create Ingress resources for our Services resources
- these resources contain rules for handling HTTP/S connections
- the Ingress controller picks up these resources and configures the LB
- connections to the Ingress LB will be processed by the rules
- We will deploy the NGINX Ingress controller first
- this is a popular, yet arbitrary choice, the
docs
list over a dozen options
- this is a popular, yet arbitrary choice, the
- For DNS, we will use
nip.io
*.127.0.0.1.nip.io
resolves to127.0.0.1
- we do this so we can use various FQDN's without editing our
hosts
file
- We will create Ingress resources for various HTTP-based Services
- We want our Ingress load balancer to be available on port 80
- We could do that with a
LoadBalancer
service- ...but it requires support from the underlying infrastructure
- minikube and MicroK8s don't work with it
- ...but Docker Desktop supports it for
localhost
!
- We could use pods specifying
hostPort: 80
- ...but with most CNI plugins, this
doesn't work or requires additional setup
- ...but with most CNI plugins, this
- We could use a
NodePort
service- ...but that requires
changing the --service-node-port-range
flag in the API service
- ...but that requires
- Last resort: the
hostNetwork
mode
- Normally, each pod gets its own network namespace (sometimes called sandbox or network sandbox)
- An IP address is assigned to the pod
- This IP address is routed/connected to the cluster network
- All containers of that pod are sharing that network namespace (and therefore using the same IP address)
- No network namespace gets created
- The pod is using the network namespace of the host
- It "sees" (and can use) the interfaces(and IP addresses) of the host(VM on macOS/Win)
- The pod can receive outside traffic directly, on any port
- Downside: with most network plugins, network polices won't work for that pod
- most network policies work at the IP address level
- filtering that pod = filtering traffic from the node
- Docker Desktop
- no built-in ingress installer, we'll provide you YAML
- Ignores
hostNetwork
, but Servicetype: LoadBalancer
works withlocalhost
!
- minikube
- has a built-in NGINX installer
minikube addons enable ingress
- But, let's use YAML we provide for learning purposes
hostNetwork: true
enabled on pod works for minikube IP
- has a built-in NGINX installer
- MicroK8s:
- has a built-in NGINX installer
microk8s.enable ingress
- let's use YAML we provide anyway for learning purposes
hostNetwork: true
enabled on pod works for MicroK8s host IP
- has a built-in NGINX installer
- Now let's deploy the NGINX controller. Pick your distro:
Exercise
- Apply the YAML
# for Docker Desktop, create Service with LoadBalancer
kubectl apply -f https://k8smastery.com/ic-nginx-lb.yaml
# for minikube/MicroK8s, create Service with hostNetwork
kubectl apply -f https:/k8smastery.com/ic-nginx-hn.yaml
- Check the pod Status
kubectl describe -n ingress-nginx deploy/nginx-ingress-controller
kubectl get all -n ingress-nginx
- We need the YAML templates from the
kubernetes/ingress-nginx
project
The two main sections in the YAML are:
- NGINX Deployment(or DaemonSet) and all its required resources
- Namespace
- ConfigMaps(storing NGINX configs)
- ServicesAccount(authenticate to Kubernetes API)
- Role/ClusterRole/RoleBindings(authorization to API parts)
- LimitRange(limit cpu/memory of NGINX)
Service to
expose NGINX on 80/443- different for each Kubernetes distribution
kubectl apply -f https://k8smastery.com/ic-nginx-lb.yaml
kubectl describe -n ingress-nginx deploy/nginx-ingress-controller
kubectl get all -n ingress-nginx
- If NGINX started correctly, we now have a web server listening on each node
Exercise
- Direct your browser to your Kubernetes IP on port 80
We should get a 404 page not found
error.
This is normal: we haven't provided any Ingress rule yet.
- To make our lives easier, we will use
nip.io
- Check out
http://cheddar.A.B.C.D.nip.io
(replacing A.B.C.D with the IP address of your Kubernetes IP) - We should get the same
404 page not found
error (meaning that our DNS is "set up properly" so to speak!)
cheddar.127.0.0.1.nip.io
- We are going to use
errm/cheese
images (there are3 tags available
: wensleydale, cheddar, stilton) - These images contain a simple static HTTP server sending a picture of cheese
- We will run 3 deployments(one for each cheese)
- We will create 3 services(one for each deployment)
- Then we will create 3 ingress rules(one for each service)
- We will route
<name-of-cheese>.A.B.C.D.nip.io
to the corresponding deployment
Exercise
- Run all three deployments:
kubectl create deployment cheddar --image=errm/cheese:cheddar
kubectl create deployment stilton --image=errm/cheese:stilton
kubectl create deployment wensleydale --image=errm/cheese:wensleydale
- Create a service for each of them:
kubectl expose deployment cheddar --port=80
kubectl expose deployment stilton --port=80
kubectl expose deployment wensleydale --port=80
Here is a minimal host-based ingress resource:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: cheddar
spec:
rules:
- host: cheddar.A.B.C.D.nip.io
http:
paths:
- path: /
backend:
serviceName: cheddar
servicePort: 80
(In 1.14 ingress moved from the extensions API to networking)
Exercise
- Download our YAML
curl -O https://k8smastery.com/ingress.yaml
- Edit the file
ingress.yaml
which has three Ingress resources - Replace the A.B.C.D with your Kubernetes IP (
127.0.0.1
forlocalhost
) - Apply the file
kubectl apply -f ingress.yaml
- Open
http://cheddar.A.B.C.D.nip.io
(An Image of a piece of cheese should show up.)
Exercise
-
Open
http://stilton.A.B.C.D.nip.io
-
Open
http://wensleydale.A.B.C.D.nip.io
-
Different cheeses should show up for each URL
http://cheddar.127.0.0.1.nip.io
http://stilton.127.0.0.1.nip.io
http://wensleydale.127.0.0.1.nip.io
- Reverse proxies have lots of features
- Let's add a 301 redirect to a new Ingress resource using annotations
- It will apply when any other path is used in URL that we didn't already add
Exercise
- Create a redirect
kubectl apply -f https://k8smastery.com/redirect.yaml
- Open http://.A.B.C.D.nip.io or localhost or A.B.C.D
- It should immediately redirect to google.com
http://somethingelse.127.0.0.1.nip.io
- Let's inspect some ingress resources
Exercise
- List all Ingress resources in the
default
namespace
kubectl get ingress
- Get the details on the
cheddar
Ingress resource
kubectl describe ingress cheddar
- Get the details on the
my-google
Ingress resource
kubectl describe ingress my-google
- Output
stilton
in YAML
kubectl get ingress/stilton -o yaml
- Traefik is a proxy with built-in Kubernetes Ingress support
- It has a web dashboard, built-in Let's Encrypt, full TCP support, and more
- Most importantly: Traefik releases are named after cheeses
- The
Traefik documentaion
tells us to pick between Deployment and DaemonSet - We are going to use a DaemonSet so that each node can accept connections
- We provide a YAML file which is essentially the sum of:
Traefik's DaemonSet resources
(patched withhostNetwork
and tolerations)Traefik's RBAC rules
allowing it to watch necessary API objects
- We will make a minor change to the
YAML provided by Traefik
to enablehostNetwork
for MicroK8s/minikube - For Docker Desktop we'll add a
type: LoadBalancer
to the Service
- Before starting Traefik, let's remove the NGINX controller
- This won't remove Services or Ingress resources
- But it will make them unavailable from outside the cluster
Exercise
- Delete our NGINX controller and related resources:
# for Docker Desktop, create Service with LoadBalancer
kubectl delete -f https://k8smastery.com/ic-nginx-lb.yaml
# for minikube/MicroK8s, create Service with hostNetwork
kubectl delete -f https:/k8smastery.com/ic-nginx-hn.yaml
- Also remove the redirect Ingress resource. It only worked in NGINX
kubectl delete -f https:/k8smastery.com/redirect.yaml
- Now let's deploy the Traefik Ingress controller
Exercise
- Apply the YAML
# for Docker Desktop with LoadBalancer
kubectl apply -f https://k8smastery.com/ic-traefik-lb.yaml
# kubectl delete -f https://k8smastery.com/ic-traefik-lb.yaml
# for minikube/MicroK8s hostNetwork
kubectl apply -f https:/k8smastery.com/ic-traefik-hn.yaml
- Check the pod Status
kubectl describe -n kube-system ds/traefik-ingress-controller
kubectl get all -n kube-system
- If Traefik started correctly, we can refresh a cheese and it still works
Exercise
- Refresh
http://cheddar.A.B.C.D.nip.io
http://cheddar.127.0.0.1.nip.io
http://stilton.127.0.0.1.nip.io
http://wensleydale.127.0.0.1.nip.io
- Traefik provides a web dashboard on container port 8080
- For those using the
LoadBalancer
method(Docker Desktop), it's enabled
Exercise
-
if using Docker Desktop, go to
http://localhost:8080
-
For those use
hostNetwork
, this could be a problem -
The container won't start if anything is listening on
<host IP>:8080
-
On MicroK8s, Kubernetes API runs on 8080 😂
-
For those using minikube, you can un-comment the YAML and re-apply
-
You could also edit the resource(s) and manually add the details, e.g.
kubectl edit -n kube-system ds/traefik-ingress-controller
- We've been using Traefik 2.x as the Ingress controller
- Traefik released 2.0 in late 2019
- Their documentation talks about IngressRoute resource
- But IngressRoute is not a build-in resource of Kubernetes
- Traefik 2.x now supports a custom CRD(Custom Resource Definition)
- We'll explore why in a bit
- You can have multiple ingress controllers active simultaneously (e.g. Traefik, Gloo, and NGINX)
- You can even have multiple instances of the same controller (e.g. one for internal, another for external traffic)
- The
kubernetes.io/ingress.class
annotation can be used to tell which one to use - It's OK if multiple ingress controllers configure the same resource (it just means that the service will be accessible through multiple paths)
- The traffic flows directly from the ingress load balancer to the backends
- it doesn't need to go through the
ClusterIP
- in fact, we don't even need a
ClusterIP
(we can use a headless service)
- it doesn't need to go through the
- The load balancer can be outside of Kubernetes (as long as it has access to the cluster subnet)
- This allows the use of external(hardware, physical machines...) load balancers
- Annotations can encode special features (rate-limiting, A/B testing, session stickiness, etc.)
- Aforementioned "special features" are not standardized yet
- Some controller will support them; some won't
- Even relatively common features(stripping a path prefix) can differ:
traefik.ingress.kubernetes.io/rule-type: PathPrefixStrip
Ingress.kubernetes.io/rewrite-target: /
- This should eventually stabilize
(remember that ingresses are currently
apiVersion: networking.k8s.io/v1beta1
) - Annotations are not validated in CLI
- Some proxies provide a CRD(Custom Resource Definition) option
- You need features beyond Ingress including:
- TCP support, traffic splitting, mTLS, egress, service mesh
- response transformation, routing to 2+ services
- You have external load balancers(like AWS ELBs) which route to NodePorts
- You don't need externally available HTTP services on the default ports
- Due to the limits of the built-in ingress, many projects are moving to CRD's
- For example, Traefik 2.x has a IngressRoute CRD option
- Ambassador, a controller for Envoy proxy, uses a Mapping CRD
- These CRD proxy options do ingress plus more (sometimes called API Gateways):
- TCP Support(anything beyond HTTP/HTTPS)
- Traffic splitting, rate limiting, circuit breaking, etc
- Complex traffic routing, request and response transformation
- Once we consider CRD's, many more proxy options are available:
- Envoy Proxy based (Gloo, Ambassador, Contour)
- Other �Proxies(Tyk, Traefik, Kong, KrakenD)
- Eventually, some more advanced features might be added to "Ingress Resources"
- We'll cover more after we learn about CRD's and Operators
Let's cleanup before we start the next lecture!
Exercise
- remove our ingress controller:
# for Docker Desktop with LoadBalancer
kubectl delete -f https://k8smastery.com/ic-traefik-lb.yaml
# for minikube/MicroK8s with hostNetwork
kubectl delete -f https:/k8smastery.com/ic-traefik-hn.yaml
- remove our ingress resources:
kubectl delete -f ingress.yaml
kubectl delete -f https:/k8smastery.com/redirect.yaml
- remove our cheeses
kubectl delete svc/cheddar svc/stilton svc/wensleydale
kubectl delete deploy/cheddar deploy/stilton deploy/wensleydale
- An Image is the application we want to run
- A Container is an instance of that image running as a process
- You can have many containers running off the same image
- In this lecture our image will be the Nginx web server
- Docker's default image "registry" is called Docker Hub
- (hub.docker.com)
docker container run --publish 80:80 nginx
# ctrl + c
# docker container run --publish 8080:80 nginx
# docker container run --publish 8080:80 --detach nginx
# docker container ls
# docker container stop 690
# docker container ls -a
- Downloaded image 'nginx' from Docker Hub
- Started a new container from that image
- Opened port 80 on the host IP
- Routes that traffic to the container IP, port 80
Note you'll get a "bind" error if the left number [host port]
is being used by anthing else, even another container.
You can use any port you want on the left, like 8080:80
or 8888:80,then use localhost:8888 when testing
docker container run --publish 80:80 --detach --name webhost nginx
docker container ls -a
docker container logs webhost
docker container top webhost
docker container --help
- Looks for that image locally in image cache, doesn't find anything
- Then looks in remote image repository(defaults to Docker Hub)
- Downloads the latest version(nginx:latest by default)
- Creates new container based on that image and prepares to start
- Give it a virtual IP on a private network inside docker engine
- Opens up port 80 on host and forwards to port 80 in container
- Starts container by using the CMD in the image Dockerfile
docker container run --publish 8080:80 --name webhost -d nginx:1.11 nginx -T
# 8080 -> change host listening port
# 1.11 -> change version of image
# nginx -T -> change CMD run on start
- They are just processes
- Limited to what resources they can access
- Exit when process stops
docker run --name mongo -d mongo
docker ps
docker top mongo
ps aux
docker stop mongo
ps aux | grep mongo
docker start mongo
docker ps
docker top mongo
- docs.docker.com and
--help
are your friend - Run a
nginx
, amysql
, and ahttpd
(apache) server - Run all of the
--detach
(or-d
), name them with--name
- nginx should listen on
80:80
, httpd on8080:80
, mysql on3306:3306
- When running
mysql
, use the--env
option(or-e
) to pass inMYSQL_RANDOM_ROOT_PASSWORD=yes
- Use
docker container logs
on mysql to find the random password it created on startup - Clean it all up with
docker container stop
anddocker container rm
(both can accept multiple names or ID's) - Use
docker container ls
to ensure everything is correct before and after cleanup
- This lecture is ANSWERS to previous assignment
- Try the previous lecture (Assignment details) first on your own before watching this video
- Forcing yourself to figure this assignment out by sudying commands, docs.docker.com, etc. will help this stuff stick in your brain better then just watching me do it!
docker container run -d -p 3306:3306 --name db -e MYSQL_RANDOM_ROOT_PASSWORD=yes mysql
docker container logs db
# find & copy-> GENERATED ROOT PASSWORD: Aekia1fiighaengeethe0eihai2ailij
docker container run -d --name webserver -p 8080:80 httpd
docker ps
docker container run -d --name proxy -p 80:80 nginx
docker container ls
curl localhost
curl localhost:8080
docker container stop nginx httpd mysql
docker container stop # tap completion
docker container ls -a
docker container rm # tap completion again
docker container rm 4f444c35ebf7 8f9ccb6b0553 0827cf019e82 bb7958b14a40
docker ps -a
docker image ls
- docker container top
- process list in one container
- docker container inspect
- details of one container config
- docker container stats
- performance stats for all containers
docker container run -d --name nginx nginx
docker container run -d --name mysql -e MYSQL_RANDOM_ROOT_PASSWORD=true mysql
docker container ls
docker container top mysql
docker container top nginx
docker container inspect mysql
docker container stats --help
docker container stats
docker container run -it
- start new container interactivelydocker container exec -it
- run additional command in existing container- Different Linux distros in containers
docker container run -it --name proxy nginx bash
exit
docker container ls -a
docker container run -it --name ubuntu ubuntu
apt-get update
apt install -y curl
curl baidu.com
exit
docker container ls -a
docker container start --help
docker container start -ai ubuntu
curl google.com
exit
docker container exec --help
docker container exec -it mysql bash
ps aux
exit
docker container ls
docker pull alpine
docker container run -it alpine sh
apk
- Review of
docker container run -p
- For local dev/testing, networks usually "just work"
- Quick port check with
docker container port <container>
- Learn concepts of Docker Networking
- Understand how network packets move around Docker
- Each container connected to a private virtual network "bridge"
- Each virtual network routes through NAT firewall on host IP
- All containers on a virtual network can talk to each other without -p
- Best practice is to create a new virtual network for each app:
- network "my_web_app" for mysql and php/apache containers
- network "my_api" for mongo and nodejs containers
- "Batteries Included, But Removable"
- Defaults work well in many cases, but easy to swap out parts to customize it
- Make new virtual networks
- Attach containers to more then one virual network(or none)
- Skip virual networks and use host IP(--net=host)
- Use different Docker network drivers to gain new abilities
- and much more...
docker container run -p 80:80 --name webhost -d nginx
docker container port webhost
docker container inspect --format '{{ .NetworkSettings.IPAddress }}' webhost
ifconfig en0
- Show networks
docker network ls
- Inspect a network
docker network inspect
- Create a network
docker network create --driver
- Attach a network to container
docker network connect
- Detach a network from container
docker network disconnect
docker network ls
docker network inspect bridge
docker network create my_app_net
docker network ls
docker network create --help
docker container run -d --name new_nginx --network my_app_net nginx
docker network inspect my_app_net
docker network connect 21c7d8642536 b3f503669e05
docker container inspect b3f503669e05
docker network disconnect 21c7d8642536 b3f503669e05
docker container inspect b3f503669e05
- Create your apps so frontend/backend sit on same Docker network
- Their inter-communication never leaves host
- All externally exposed ports closed by default
- You must manually expose via
-p
, which is better default security! - This gets even better later with Swarm and Overlay networks
- Understand how DNS is the key to easy inter-container comms
- See how it works by default with custom networks
- Learn how to use
--link
to enable DNS on default bridge network
docker container ls
docker network inspect my_app_net
docker container run -d --name my_nginx --network my_app_net nginx
docker network inspect my_app_net
docker container exec -it my_nginx ping new_nginx
- Containers shouldn't rely on IP's for inter-communication
- DNS for friendly names is built-in if you use custom networks
- You're using custom networks right?
- This gets way easier with Docker Compose in future Section
- Know how to use
-it
to get shell in container - Understand basics of what a Linux distributions is like Ubuntu and CentOS
- Know how to run a container🐻
- Use different Linux distro containers to check
curl
cli tool version - Use two different terminal windows to start bash in both
centos:7
andubuntu:14.04
, using-it
- Learn the
docker container --rm
option so you can save cleanup - Ensoure
curl
is installed and on latest version for that distro- ubuntu:
apt-get update && apt-get install curl
- centos:
yum update curl
- ubuntu:
- Check
curl --version
#t1
docker container run --rm -it centos:7 bash
yum update curl -y
#t2
docker ps -a
docker container run --rm -it ubuntu:14.04 bash
apt-get update && apt-get install -y curl
curl --version
- Know how to use -it go get shell in container
- Understand basics of what a Linux distribution is like Ubuntu and CentOS
- Know how to run a container
- Understand basics of DNS records
- Ever since Docker Engine 1.11, we can have multiple containers on a created network respond to the same DNS address
- Create a new virtual network(default bridge driver)
- Create two containers from
elasticsearch:2
image - Research and use
-network-alias search
when creating them to give them an additional DNS name to respond to - Run
alpine nslookup search
with--net
to see the two containers list for the same DNS name - Run
centos curl -s search:9200
with--net
multiple times until you see both "name" fields show
docker network create dude
docker container run -d --net dude --net-alias search elasticsearch:2
# --net-alias and --network-alias both work
docker container run --rm --net dude alpine nslookup search
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
docker container run --rm --net dude centos curl -s search:9200
### Test DNS Round Robin
docker container ls
docker container rm -f kind_darwin fervent_jackson
- All about images, the building blocks of containers
- What's in an image (and what isn't)
- Using Docker Hub registry
- Managing our local image cache
- Building our own images
- App binaries and dependencies
- Metadata about the image data and how to run the image
- Official definition: "An Image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime"
- Not a complete OS. No kernel, kernel modules(e.g. drivers)
- Small as one file(your app binary) like a golang static binary
- Big as a Ubuntu distro with apt, and Apache, PHP, and more installed
- Basics of Docker Hub(hub.docker.com)
- Find Official and other good public images
- Download images and basics of image tags
- Docker Hub, "the apt package system for Containers"
- Official images and how to use them
- How to discern "good" public images
- Using different base images like Debian or Alpine
- Image layers
- union file system
history
andinspect
commands- copy on write
docker image ls
docker history nginx:latest
# Oops that's old command format
docker history mysql
docker image inspect
# docker inspect (old way)
# returns JSON metadata about the image
docker image inspect nginx
- Images are made up of file system changes and metadata
- Each layer is uniquely identified and only stored once on a host
- This saves storage space on host and transfer time on push/pull
- A container is just a single read/write layer on top of image
docker image history
andinspect
commands can teach us
- Know what container and images are
- Understand image layer basics
- Understand Docker Hub basics
- All about image tags
- How to upload to Docker Hub
- Image ID vs. Tag
docker image tag --help
docker image ls
docker pull mysql/mysql-server
docker pull nginx:mainline
docker image tag nginx lotteryjs/nginx
docker image tag --help
docker image ls
docker login
cat ~/.docker/config.json
docker image push lotteryjs/nginx
docker image tag lotteryjs/nginx lotteryjs/nginx:testing
docker image ls
docker image push lotteryjs/nginx:testing
- Properly tagging images
- Tagging images for upload to Docker Hub
- How tagging is related to image ID
- The Latest Tag
- Logging into Docker Hub from docker cli
- How to create private Docker Hub images
e.g. docker build -f some-dockerfile
cd dockerfile-sample-1
docker image build -t customnginx .
cd dockerfile-sample-2
ll
vim Dockerfile
docker container run -p 80:80 --rm nginx
docker image build -t nginx-with-html .
docker container run -p 80:80 --rm nginx-with-html
docker image ls
docker image tag --help
docker image tag nginx-with-html:latest lotteryjs/nginx-with-html:latest
docker image ls
- Dockerfiles are part process workflow and part art
- Take existing Node.js app and Dockerize it
- Make
Dockerfile
. Build it. Test it. Push it. (rm it). Run it. - Expect this to be iterative. Rarely do I get it right the first time.
- Details in
dockerfile-assignment-1/Dockerfile
- Use the Alpine version of the official 'node' 6.x image
- Expected result is web site at http://localhost
- Tag and push to your Docker Hub account(free)
- Remove your image from local cache, run again from Hub
cd dockerfile-assignment-1
ll
vim Dockerfile
# Instructions from the app developer
# - you should use the 'node' official image, with the alpine 6.x branch
FROM node:6-alpine
# - this app listens on port 3000, but the container should launch on port 80
# so it will respond to http://localhost:80 on your computer
EXPOSE 3000
# - then it should use alpine package manager to install tini: 'apk add --update tini'
RUN apk add --update tini
# - then it should create directory /usr/src/app for app files with 'mkdir -p /usr/src/app'
RUN mkdir -p /usr/src/app
# - Node uses a "package manager", so it needs to copy in package.json file
WORKDIR /usr/src/app
COPY package.json package.json
# - then it needs to run 'npm install' to install dependencies from that file
RUN npm install && npm cache clean
# - to keep it clean and small, run 'npm cache clean --force' after above
# - then it needs to copy in all files from current directory
COPY . .
# - then it needs to start container with command '/sbin/tini -- node ./bin/www'
CMD [ "tini", "--", "node", "./bin/www" ]
# - in the end you should be using FROM, RUN, WORKDIR, COPY, EXPOSE, and CMD commands
docker build -t testnode .
docker container run --rm -p 80:3000 testnode
docker images
docker tag --help
docker tag testnode lotteryjs/testing-node
docker push --help
docker push lotteryjs/testing-node
docker image rm lotteryjs/testing-node
docker container run --rm -p 80:3000 lotteryjs/testing-node
- Defining the problem of persistent data
- Key concepts with containers: immutable, ephemeral
- Learning and using Data Volumes
- Learning and using Bind Mounts
- Assignments
- Containers are usually immutable and ephemeral
- "immutable infrastructure":only re-deploy containers, never change
- This is the ideal scenario, but what about databases, or unique data?
- Docker gives us features to ensure these "separation fo concerns"
- This is known as "persistent data"
- Two ways: Volumes and Bind Mounts
- Volumes: make special location outside of container UFS
- Bind Mounts: link container path to host path
- VOLUME command in Dockerfile
docker pull mysql
# Note you might want to do a `docker volume prune` to
# cleanup unused volumes and make it easier to see what
# you're doing here
docker image inspect mysql
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
docker container inspect mysql
docker volume inspect bd2dd9f5ecabc7d9bd6779436b8fdbee1426613601e24ef7d94ef77b497b7c1c
docker container run -d --name mysql2 -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
docker volume ls
docker container stop mysql
docker container stop mysql2
docker container ls
docker container ls -a
docker volume ls
docker container rm mysql mysql2
docker volume ls
docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql mysql
docker volume ls
docker volume inspect mysql-db
docker container rm -f mysql
docker container run -d --name mysql3 -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql mysql
docker volume ls
docker container inspect mysql3
docker volume create
- required to do this before "docker run" to use custom drivers and labels
docker volume create --help
- Maps a host file or directory to a container file or directory
- Basically just two locations pointing to the same file(s)
- Again, skips UFS, and host files overwrite any in container
- Can't use in Dockerfile, must be at
container run
... run -v /Users/bret/stuff:/path/container
(mac/linux)... run -v //c/Users/bret/stuff:/path/container
(windows)
cat dockerfile-sample-2
ll
pcat Dockerfile
docker container run -d --name nginx -p 80:80 -v $(pwd):/usr/share/nginx/html nginx
docker container run -d --name nginx2 -p 8080:80 nginx
ll
# t2
docker container exec -it nginx bash
cd /usr/share/nginx/html
ls -al
# t1
touch testme.txt
# t2
ls -al
# t1
echo "is it me you're looking for" > testme.txt
- Database upgrade with containers
- Create a
postgres
container with named volume psql-data using version9.6.1
- Use Docker Hub to learn
VOLUME
path and versions needed to run it - Check logs, stop container
- Create a new
postgre
container with same named volume using9.6.2
- Check logs to validate
- (this only works with patch versions, most SQL DB's require manual commands to upgrade DB's to major/minor versions,i.e. it's DB limitation not a container one)
docker container run -d --name psql -v psql:/var/lib/postgresql/data postgres:9.6.1
docker container logs -f psql
docker container stop psql
docker container run -d --name psql2 -v psql:/var/lib/postgresql/data postgres:9.6.2
docker container ps -a
docker volume ls
docker container logs psql2
- Use a Jekyll "Static Site Generator" to start a local web server
- Don't have to be web developer: this is example of bridging the gap between local file access and apps running in containers
- source code is in the course repo under
bindmount-sample-1
- We edit files with editor on our host using native tools
- Container detects changes with host files and updates web server
- start container with
docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve
- Refresh our browser to see changes
- Change the file in
_posts\
and refresh browser to see changes
cd bindmount-sample-1
ls -la
docker run -p 80:4000 -v $(pwd):/site bretfisher/jekyll-serve
- Why: configure relationships between containers
- Why: save our docker container run settings in easy-to-read file
- Why: create one-liner developer environment startups
- Comprised of 2 separate but related things
-
- YAML-formatted file that describes our solution options for:
- containers
- networks
- volumes
-
- A CLI tool
docker-compose
used for local dev/test automation with those YAML files
- A CLI tool
- Compose YAML format has it's own versions: 1, 2, 2.1, 3, 3.1
- YAML file can be used with
docker-compose
command for local docker automation or... - With
docker
directly in production with Swarm(as of v1.13) docker-compose --help
docker-compose.yml
is default filename, but any can be used withdocker-compose -f
cd compose-sample-2
- CLI tools comes with Docker for Windows/Mac, but separate download for Linux
- Not a production-grade tool but ideal for local development and test
- Two most common commands are
- docker-compose up # setup volumes/networks and start all containers
- docker-compose down # stop all containers and remove con/vol/net
- If all your projects had a
Dockerfile
anddocker-compose.yml
then "new developer onboarding" would be:- git clone github.com/some/software
- docker-compose up
cd compose-sample-2
ls -la
cat docker-compose.yml
docker-compose up
# browser -> http://localhost
# ctrl + c
docker-compose up -d
docker-compose logs
docker-compose --help
docker-compose ps
docker-compose top
docker-compose down
- Build a basic compose file for a Drupal content management system website. Docker Hub is your friend
- Use the
drupal
image along with thepostgres
image - Use
ports
to expose Drupal on 8080 so you canlocalhost:8080
- Be sure to set
POSTGRES_PASSWORD
for postgres - Walk though Drupal setup via browser
- Tip: Drupal assumes DB is
localhost
, but it's service name - Extra Credit: Use volumes to store Drupal unique data
docker pull drupal
docker image inspect drupal
version: '2'
services:
drupal:
image: drupal:8.2
ports:
- "8080:80"
volumes:
- drupal-modules:/var/www/html/modules
- drupal-profiles:/var/www/html/profiles
- drupal-sites:/var/www/html/sites
- drupal-themes:/var/www/html/themes
postgres:
image: postgres:9.6
environment:
- POSTGRES_PASSWORD=mypasswd
volumes:
drupal-modules:
drupal-profiles:
drupal-sites:
drupal-themes:
docker-compose down -v
# Remove named volumes declared in the `volumes`
# section of the Compose file and anonymous volumes
# attached to containers.
- Compose can also build your custom images
- Will build them with
docker-compose up
if not found in cache - Also rebuild with
docker-compose build
- Great for complex builds that have lots of vars or build args
cd compose-sample-3/
docker-compose up
# localhost
docker-compose down
docker image ls
docker-compose down --help
docker image rm nginx-custom
docker image ls
# remove -> image: nginx-custom
docker-compose up -d
docker image ls
docker-compose down --rmi local
- "Building custom
drupal
image for local testing" - Compose isn't just for developers. Testing apps si easy/fun!
- Maybe your learning Drupal admin, or are a software tester
- Start with Compose file from previous assignment
- Make your
Dockerfile
anddocker-compose.yml
in dircompose-assignment-2
- Use the
drupal
image along with thepostgres
image as before - Use
README.md
in that dir for details
cd compose-assignment-2
Dockerfile
FROM drupal:8.6
RUN apt-get update && apt-get install -y git \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /var/www/html/themes
RUN git clone --branch 8.x-3.x --single-branch --depth 1 https://git.drupal.org/project/bootstrap.git \
&& chown -R www-data:www-data bootstrap
WORKDIR /var/www/html
docker-compose.yml
version: '2'
# NOTE: move this answer file up a directory so it'll work
services:
drupal:
container_name: drupal
image: custom-drupal
ports:
- "8080:80"
volumes:
- drupal-modules:/var/www/html/modules
- drupal-profiles:/var/www/html/profiles
- drupal-sites:/var/www/html/sites
- drupal-themes:/var/www/html/themes
postgres:
container_name: postgres
image: postgres:9.6
environment:
- POSTGRES_PASSWORD=mypasswd
volumes:
- drupal-data:/var/lib/postgresql/data
volumes:
drupal-data:
drupal-modules:
drupal-profiles:
drupal-sites:
drupal-themes:
- Templates
- 3 Managers and 2 Workers
docker service create --name hello --replicas 3 --detach=false --publish 8080:80 nginx
- How do we automate container lifecycle?
- How can we easily scale out/in/up/down?
- How can we ensure our containers are re-created if they fail?
- How can we replace containers are-created if they fail?
- How can we control/track where containers get started?
- How can we create cross-node virtual networks?
- How can we ensure only trusted servers run our containers?
- How can we store secrets, keys, passwords and get them to the right container(and only that container)?
- Swarm Mode is a clustering solution built inside Docker
- Not related to Swarm "classic" for pre-1.12 versions
- Added in 1.12(Summer 2016)via SwarmKit toolkit
- Enhanced in 1.13(January 2017) via Stacks and Secrets
- Not enabled by default, new commands once enabled
- docker swarm
- docker node
- docker service
- docker stack
- docker secret
- Lots of PKI and security automation
- Root Signing Certificate created for our Swarm
- Certificate is issued for first Manager node
- Join tokens are created
- Raft database created to store root CA, configs and secrets
- Encrypted by default on disk(1.13+)
- No need for another key/value system to hold orchestration/secrets
- Replicates logs amongst Managers via mutual TLS in "control plane"
docker swarm init
docker node ls
# ...MANAGER STATUS
# ...Leader
docker node help
docker swarm help
docker service help
docker service create alpine ping 8.8.8.8
docker service ls
docker service ps <NAME(service)>
docker container ls
docker service update <ID(service)> --replicas 3
docker service ls
docker service ps <NAME(service)>
docker update --help
docker service update --help
docker container ls
docker container rm -f <name>.1.<ID>
docker service ls
docker service ps <NAME(service)>
docker service rm <NAME(service)>
docker service ls
docker container ls
- A.
play-with-docker.com
- Only needs a browser, but resets after 4 hours
- B.docker-machine + VirtualBox
- Free and runs locally,but requires a machine with 8GB memory
- C.Digital Ocean + Docker install
- Most like a production setup, but cost $5-10/node/month while learning
- Use my referral code in section resources to get $10 free
- D.Roll your own
- docker-machine can provision machines for Amazon, Azure, DO, Google, etc.
- Install docker anywhere with
get.docker.com
play-with-docker.com
# node1
# create 3 new instances
docker info
ping node2
docker-machine + VirtualBox
docker-machine create node1
docker-machine ssh node1
exit
docker-machine env node1
eval $(docker-machine env node1)
docker info # node1
docker-machine env --unset
eval $(docker-machine env --unset)
docker info
DO
# curl -fsSL https://get.docker.com -o get-docker.sh
# sh get-docker.sh
# root@node1
docker swarm init
docker swarm init --advertise-addr <IP address>
# docker swarm init --advertise-addr=192.168.99.100
#I'm going to copy the swarm join command and go over to node2 and add it in.
# root@node2
docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-c91v8sc6vpsyxw4zsqmrmg4ji 192.168.99.100:2377
# This node joined a swarm as a worker.
docker node ls
#Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager.
# go back to node1
# docker node ls
docker node update --role manager node2
# for node3, let's add it as a manager by default.
# We need to go back to our original command of docker swarm, and then we need to get the join-token.
# go back to node1
docker swarm join-token manager
# docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-dzz4pcg9inzkt9wnft4hskaxn 192.168.99.100:2377
# I'm going to copy this, paste it into node3
# node3
docker swarm join --token SWMTKN-1-1bn2hsyhjwqn4wztjqd7moftumffk4vwd2e88azwj7u7bg0q6m-dzz4pcg9inzkt9wnft4hskaxn 192.168.99.100:2377
# This node joined a swarm as a manager.
# Back on node1
docker node ls
docker service create --replicas 3 alpine ping 8.8.8.8
docker node ps
docker service ps <service name>
- Just choose
--driver overlay
when creating network - For container-to-container traffic inside a single Swarm
- Optional IPSec(AES) encryption on network creation
- Each service can be connected to multiple networks
- (e.g. front-end, back-end)
# root@node1
docker network create --driver overlay mydrupal
docker network ls
# https://github.com/bitnami/bitnami-docker-postgresql
docker service create --name psql --network mydrupal -e POSTGRESQL_POSTGRES_PASSWORD=mypass -e POSTGRESQL_DATABASE=progres postgres
docker service ls
docker service ps psql
docker logs psql.1.terabjvf7wkt5j769t04tld02
docker service create --name drupal --network mydrupal -p 80:80 drupal
docker service ls
docker service ps drupal #drupal is actrually running on Node2.
docker service inspect drupal #VIP
- Routes ingress(incoming) packets for a Service to proper Task
- Spans All nodes in Swarm
- Uses IPVS from Linux Kernel
- Load balances Swarm Services across their Tasks
- Two ways this works:
- Container-to-container in a Overlay network(uses VIP)
- External traffic incoming to published ports(all nodes listen)
docker service create --name search --replicas 3 -p 9200:9200 elasticsearch:2
docker service ps search
curl localhost:9200
curl localhost:9200
curl localhost:9200
- This is stateless load balancing
- This LB is at OSI Layer 3(TCP), not Layer 4(DNS)
- Both limitation can be overcome with:
- Niginx or HAProxy LB proxy, or:
- Docker Enterprise Edition, which comes with built-in L4 web proxy
- Using Docker's Distributed Voting App
- use
swarm-app-1
directory in our course repo for requirements - 1 volume, 2 networks, and 5 services needed
- Create the commands needed, spin up services, and test app
- Everything is using Docker Hub images, so no data needed on Swarm
- Like many computer things, this is 1/2 art form and 1/2 science
docker node ls
docker ps -a
docker service ls
docker network create -d overlay backend
docker network create -d overlay frontend
docker service create --name vote -p 80:80 --network frontend --replicas 2 <image>
docker service create --name redis --network frontend --replicas 1 redis:3.2
docker service create --name worker --network frontend --network backend <image>
docker service create --name db --network backend --mount type=volume,source=db-data,target=/var/lib/postgresql/data postgres:9.4
docker service create --name result --network backend -p 5001:80
docker service ls
docker service ps result
docker service ps redis
docker service ps db
docker service ps vote
docker service ps worker
cat /etc/docker/
docker service logs worker
- In 1.13 Docker adds a new layer of abstraction to Swarm called Stacks
- Stacks accept Compose files as their declarative definition for services, networks, and volumes
- We use
docker stack deploy
ranther then docker service create - Stacks managers all those objects for us, including overlay network per stack. Adds stack name to start of their name
- New
deploy:
key in Compose file. Can't dobuild:
- Compose now ignores
deploy:
, Swarm ignoresbuild:
docker-compose
cli not needed on Swarm server
docker stack deploy -c example-voting-app-stack.yml voteapp
docker stack --help
docker stack ls
docker stack services voteapp
docker stack ps voteapp
docker network ls
# overlay
docker stack deploy -c example-voting-app-stack.yml voteapp #update
- Easiest "secure" solution for storing secrets in Swarm
- What is a Secret?
- Usernames and passwords
- TLS certificates and keys
- SSH keys
- Any data you would prefer not be "on front page of news"
- Supports generic strings or binary content up to 500Kb in size
- Doesn't require apps to be rewritten
- As of Docker 1.13.0 Swarm Raft DB is encrypted on disk
- Only stored on disk on Manager nodes
- Default is Managers and Workers "control plane" is TLS + Mutual Auth
- Secrets are first stored in Swarm, then assigned to a Service(s)
- Only containers in assigned Services(s) can see them
- They look lik files in container but are actually in-memory fs
/run/secrets/<secret_name>
or/run/secrets/<secret_alias>
- Local docker-compose can use file-based secrets, but not secure
docker secret create psql_user psql_user.txt
echo "myDBpassWORD" | docker secret create psql_pass -
docker secret ls
docker secret inspect psql_user
docker service create --name psql --secret psql_user --secret psql_pass -e POSTGRES_PASSWORD_FILE=/run/secrets/psql_pass -e POSTGRES_USER_FILE=/run/secrets/psql_user postgres
docker service ps psql
docker exec -it <container name> bash
cat /run/secrets/psql_user
# mysqluser -> it worked
docker service ps psql
docker service update --secret-rm
docker stack deploy -c docker-compose.yml mydb
docker secret ls
docker stack rm mydb
ll
#docker-compose.yml
#psql_password.txt
#psql_user.txt
docker node ls
docker-compose up -d
docker-compose exec psql cat /run/secrets/psql_user
#dbuser
- Live The Dream!
- Single set of Compose files for:
- Local
docker-compose up
development environment - Remote
docker-compose up
CI environment - Remote
docker stack deploy
production environment
- Docker Desktop preferred(Win/Mac)
- Docker Toolbox(Win 7/8/10 Home)
- Linux: Install via Docker Docs
- CLI: docker-compose
- Included in Docker Desktop & Toolbox
- Linux: pip install docker-compose
docker version
# xx
docker-compose version
# xx
- 2 parts: CLI and YAML files
- Designed around developer workflows
- docker-compose CLI a substitute for docker CLI
- Use CLI by default locally
OOPS!~I meant to say "I reccomend you use Docker Compose locally"
- Docker standard (not yet industry std)
- Defines multiple containers, networks, volumes, etc.
- Can layer sets of YAML files, use templates, variables, and more
- docker-compose.yml default
YAML: "YAML Ain't Markup Language"
- Common configuration file format
- Used by Docker, Kubernetes, Amazon, and others
- : used for key/value pairs
- Only spaces, no tabs
-
- used for lists
- Myth busting: v3 does not replace v2
- v2 focus: single-node dev/test
- v3 focus: muti-node orchestration
- If not using Swarm/Kubernetes, stick to v2
- many docker commands == docker-compose
- IDE's now support docker-compose
- "batteries included, but swappable"
- CLI and YAML versions differ
- "one stop shop"
- build/pull image(s) if missing
- create volume/network/container(s)
- starts containers(s) in foregound(-d to detach)
- --build to always build
- "one stop shop"
- stop and delete network/container(s)
- use -v to delete volumes
- Many commands take "service" option
- build just build/rebuild image(s)
- stop just stop containers don't delete
- ps list "services"
- push images to registry
- logs same as docker CLI
- exec same as docker CLI
- Run through simple compose commands
cd sample-02
docker-compose up
ctrl-c (same as docker-compose stop)
docker-compose down
docker-compose up -d
docker-compose ps
docker-compose logs
- While app is running detached...
docker-compose exec web sh
curl localhost
exit
- edit Dockerfile, add curl with apk
RUN apk add --update curl
docker-compose up -d
- Notice it didn't build so force it
docker-compose up -d --build
- Now try curl again
docker-compose exec web sh
curl localhost
exit
- Inside sample-02 directory
docker-compose down
docker-compose down --helps
- build no chace
docker-compose build --no-cache
docker-compose ps
sample-02_web_1 docker-entrypoint.sh node ... Up 0.0.0.0:3000->3000/tcp
sample-02
-> name of the projectweb
-> name of the service1
-> numerical number of the replica
open localhost:3000
docker-machine ls
docker-compose logs
docker-compose logs web
docker-compose exec
docker-compose exec web sh
curl localhost
- add
RUN apk add --update curl
toDockerfile
docker-compose up -d --build
docker-compose exec web sh
curl localhost
curl localhost:3000
docker-compose down
Best practices for writing Dockerfiles
- COPY, not ADD
- npm/yarn install during build
- CMD node, not npm
- requires another application to run
- not as literal in Dockerfiles
- doesn't work well as an init or PID 1 process
- WORKDIR not RUN mkdir
- Unless you need chown
Alpine Linux| Node.js Release Schedule| GitHub: ONBUILD deprecation| Node Official Image on Docker Hub
- Stick to even numbered major releases
- Don't use :latest tag
- Start with Debian if migrating
- Move to Alpline later
- Don't use :slim
- Don't use :onbuild
- Alpine is "small" and "sec focused"
- But Debian/Ubuntu are smaller now too
- ~100MB space savings isn't significant
- Alpine has its own issues
- Alpine CVE scanning fails
- Enterprises may require CentOS or Ubuntu/Debian
- Install Node in the official CentOS
- Copy Dockerfile lines frome node:10
- Use ENV to specify node version
- This will take a few tries
- Useful for knowing how to make your own node, but only if you have to
NOTE: You must add 'USER node' before CMD in Dockerfile to enable non-root user
# Dockfile
RUN groupadd --gid 1000 node \
&& useradd --uid 1000 --gid node --shell /bin/bash --create-home node
build
docker build -t centos-node .
docker run centos-node node --version
- Official node images have a node user
- But it's not used by default
- Do this after
apt/apk
andnpm i -g
- Do this before
npm i
- May cause permissions issues with write access
- May require
chown node:node
- Change user from root to node
USER node
- Set permissions on app dir
RUN mkdir app && chown -R node:node .
- Run a command as root in container
docker-compose exec -u root
- Pick proper FROM
- Line order matters
- COPY twice: package.json* then . .
- copy only the package and lock files
- run npm install
- copy everything else
- One apt-get per Dockerfile
- apt-get update cache prob
- No need for nodemon, forever, or pm2 on server
- We'll use nodemon in dev for file watch later
- Docker manages app start, stop, restart, healthcheck
- Node multi-thread: Docker manages multiple "replicas"
- One npm/node problem: They don't listen for proper shutdown signal by default
- PID 1 (Process Identifier) is the first process in a system (or container) (AKA init)
- Init process in a container has two jobs:
- reap zombie processes
- pass signals to sub-processes
- Zombie not a big Node issue
- Focus on proper Node shutdown
- Docker uses Linux signals to stop app (SIGINT/SIGTERM/SIGKILL)
- SIGINT/SIGTERM allow graceful stop
- npm doesn't respond to SIGINT/SIGTERM
- node doesn't respond by default, but can with code
- Docker provides a init PID 1 replacement option
- Temp: Use --init to fix ctrl-c for now
- Workaround: add tini to your image
- Production: your app captures SIGINT for proper exit
- Run any node app with --init to handle signals(temp solution)
docker run --init -d nodeapp
- Add tini to your Dockerfile, then use it in CMD (permanent workaround)
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "./bin/www"]
- Use JS snippet to properly capture signals(production solution)
./sample-graceful-shutdown/sample.js
- Make a Dockerfile for existing Node app
- use ./assignment-dockerfile/Dockerfile
- Start with node 10.15 on alpine
- Install tini, start node with tini
- Copy package/lock files first, then npm, then copy
FROM
EXPOSE
RUN
WORKDIR
COPY
RUN
COPY
ENTRYPOINT
CMD
docker build -t assignment1 .
docker run -p 3001:3000 assignment1
# docker run -d -p 3001:3000 assignment1
- Use ./assignment-dockerfile/
- Run with tini built in, try to ctrl-c
- Run with tini built in, try to stop
- Remove ENTRYPOINT, rebuild
- Add --init to run command,ctrl-c/stop
- Bonus: add signal watch code
docker build -t assignment1:notini .
docker run -d -p 3001:3000 assignment1:notini
docker top 48a6
docker stop 48a6
# docker run --init -d -p 3001:3000 assignment1:notini
# docker top
docker run -d -p 3002:3000 assignment1
docker top 6ec2
docker stop 6ec2
- New feature in 17.06 (mid-2017)
- Build multiple images from on file
- Those images can FROM each other
- COPY files between them
- Space + Security benefits
- Great for "artifact only"
- Great for dev + test + prod
- FROM node as prod
- ENV NODE_ENV=production
- COPY package*.json ./
- RUN npm install && npm cache clear --force
- COPY . .
- CMD ["node", "./bin/wwww"]
- FROM prod as dev
- ENV NODE_ENV = development
- CMD ["nodemon", "./bin/www", "--inspect=0.0.0.0:9229"]
- To build dev image from dev stage
docker build -t myapp .
- To build prod image from prod stage
docker build -t myapp:prod --target prod .
- Add a test stage that runs npm test
- Have CI build --target test stage before building prod
- Add npm install --only=development to dev stage
- Don't COPY code into dev stage
- Create a Dockerfile from ./sample-multi-stage
- Create three stages for prod, dev, and test
- Prod has no devDependencies and runs node
- Dev includes devDep, runs nodemon
- Test has devDep, runs npm test
- Build all three stages with unique tags
- Goal: don't repeat lines
Assignment Answers
# prod
docker build -t multistage --target prod . && docker run --init -p 3000:3000 multistage
# dev
docker build -t multistage:dev --target dev . && docker run --init -p 3000:3000 multistage:dev
# test
docker build -t multistage:test --target test . && docker run --init multistage:test
- Follow 12factor.net principles, especially
- Use Environment Variables for config
- Log to stdout/stderr
- Pin all versions, even npm
- Graceful exit SIGTERM/INIT
- Create a .dockerignore (like .gitignore)
- Heroku wrote a highly respected guide to creating distributed apps
- Twelve factors to consider when developing or designing distributed apps
- Containers are almost always distributed apps
- Good news: You get many of these by using Docker
- Lets focus on a few for Node.js
- 12factor.net/config
- Store environment config in Environment Variables (env vars)
- Docker & Compose are great at this with multiple options
- Old apps: Use CMD or ENTRYPOINT script with
envsubst
to pass env vars into conf files
- 12factor.net/logs
- Apps shouldn't route or transport logs to anything but stdout/stderr
console.log()
works- Winston/Bunyan/Morgan: Use levels to control verbosity
- Winston transport: "Console"
- Prevent bloat and unneeded files
- .git/
- node_modules/
- npm-debug
- docker-compose*.yml
- Not needed but useful in image
- Dockerfile
- README.md
- "Traditional App" = Pre-Docker App
- Take a typical Node app and "migrate"
- ./assignment-mta
- add .dockerignore
- Create Dockerfile
- Change Winston transport to Console
- See README.md for app details
- Image shouldn't include
in
,out
,node_modules
orlogs
directories - Change Winston to Console
winston.transports.Console
- bind-mount
in
andout
dirs - Set
CHARCOAL_FACTOR
to 0.1
- Running container with
./in
and./out
bind-mounts results in new chalk images in./out
on host - Changing
--env CHARCOAL_FACTOR
changes look of image (test with 10) - No
.gif
files in image docker logs
shows Winston output
Assignment
docker build -t mta .
docker run mta
docker run -it mta bash
docker run -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out mta
docker run -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out --env CHARCOAL_FACTOR=10 mta
docker ps -l
docker logs dbda736f5c09
docker run -v $(pwd)/logs:/app/logs -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out mta
docker run -v $(pwd)/logs:/app/logs -v $(pwd)/in:/app/in -v $(pwd)/out:/app/out --env CHARCOAL_FACTOR=10 mta
- cd ./compose-tips
- Do use docker-compose for local dev
- Do use v2 format for local dev
- v2 only: depends_on, hardware specific
- Do study compose file and CLI features
- Unnecessary: "alias" & "container_name"
- Legacy: "expose" & "links"
- No need to set defaults
- Don't use host file paths
- Don't bing-mount databases
- For local dev only?don't copy in code
- DDforWin needs drive perms
- Perms: Linux != Windows
- Problem: we shouldn't build images with node_modules from host
- Example: node-gyp
- Solution: add node_modules to
.dockerignore
- Let's do this to
./sample-sails
cp .gitignore .dockerignore docker build -t sailsbret .
- Problem: we can't bind-mount node_modules from host on macOS/Windows (different arch)
- To Potential Solutions:
- Never use
npm i
on host, runnpm i
in compose - Move modules in image, hide modules from host
- Never use
# sample-express
docker-compose up # can't find module ...
docker-compose run express npm install
docker-compose up
- Solution 1, simple but less flexible;
- You can't
docker-compose up
until you've useddocker-compose run
- node_modules on host is now only usable from container
- You can't
- Solution 2, more setup but flexible:
- Move node_modules up a directory in Dockerfile
- Use empty volume to hide node_modules on bind-mount
- node_modules on host doesn't conflict
# .dockerignore--->node_modules ls npm install docker-compose build # rebuiding docker-compose up -d docker-compose ps docker-compose exec express bash ls node_modules/
- Two ways to run various tools inside the container:
- docker-compose run: start a new container and run command/shell
- docker-compose exec: run additional command/shell in currently running container
# sample-strapi, .dockerignore -> node_modules
# Also remember to postinstall for strapi:
docker-compose run api npm i
docker-compose up
# other iterm
docker-compose exec api strapi --help
docker-compose exec api bash
- Use nodemon for compose file monitoring
- webpack-dev-server, etc. work the same
- Override Dockerfile via compose command
- If Windows, enable polling
- Create a nodemon.yml for advanced workflows(bower, webpack, parcel)
docker-compose run express npm install nodemon --save-dev
docker-compose build
docker-compose up
- Problem: Multi-service apps start out of order, node might exit or cycle
- Multi-container apps need:
- Dependency awareness
- Name resolution (DNS)
- Connection failure handling
depends_on:
when "up X", start Y first- Fixes name resolution issues with "can't resolve <service_name>"
- Only for compose, not Orch
- compose YAML v2: works with healthchecks like a "wait for script"
restart: on-failure
- Good: helps slow db startup and Node.js failing. Better: depends_on
- Bad: could spike CPU with restart cycling
- Solution: build connection timeout, buffer, and retries in your apps
depends_on
: is only dependency control by default- Add v2 healthchecks for true "wait_for"
- Let's see some examples
- Mongo
- Postgres/MySQL
- Web
- Problem: many HTTP endpoints, many ports
- Solution: Nginx/HAProxy/Traefik for host header routing + wildcard localhost domain
- Problem: CORS failures in dev
- Solution: Proxy with * header
- Problem: HTTPS locally
- Solution: Create local proxy certs
- Problem: Multiple endpoints and need unique DNS for each
- Use x.localhost, y.localhost in Chrome
- Use wildcard domains like
*.vcap.me
orxip.io
- Use dnsmasq on macOS/Linux
- Manually edit hosts file
- VS Code and other editors have some Docker and Compose features built-in
- Debugging works when we enable in nodemon and remote via TCP
- TypeScript compile and other pre-processors go in
nodemon.json
# typescript
docker-compose run ts npm i
docker-compose up
./assignment-sweet-compose
- Take all the learning from this section and apply it to a single compose file!
- Uses Docke's example voting app (Dog vs. Cat)
- Step-by-step in
README.md
- Multi-stage can solve this
- prod stages: npm i --only=production
- Dev stage: npm i --only=development
- Use
npm ci
to speed up builds - Ensure
NODE_ENV
is set - Sample
./multi-stage-deps/
- Document every line that isn't obvious
- FROM stage, document why it's needed
- COPY = don't document
- RUN = maybe document
- Add LABELS
- RUN npm config list
- LABEL has OCI standards now
LABEL org.opencontainers.image.<key>
- Use ARG to add info to labels like build date or git commit
- Docker Hub has built-in envvars for use with ARGs
- Sample
./dockerfile-labels/
- YAML (unlike JSON) supports comments!
- Document objects that aren't obvious
- Why a volume is needed
- Why custom CMD is needed
- Template blocks at top
- Override objects and files
Run npm test
in a specific build-stage- Also good for linting commands
- Only run unit tests in build
- Test stage not default
- Locally, run docker-compose, run node npm test
- Sample
./multi-stage-test/
docker build -t testnode --target=test .
docker build -t testnode --target=test --no-cache .
- Use test stage in multi-stage, or new
- Or run it once image is built with CI
- Only report at first, no failing (most images have at least one CVE vuln)
- Consider
RUN npm audit
./multi-stage-scanning/
docker build -t auditnode --target=audit --build-arg MICROSCANNER_TOKEN=$MICROSCANNER_TOKEN .
- Have CI build images on (some) branches
- Push to registry once build/tests pass
- Lint Dockerfile and Compose/Stack files
- Use
docker-compose run
or--exit-code-from
for proper exit codes - Docker Hub can do this
- :latest is only a convention
- Use latest for local easy access to current release
- Maybe do this per major branch too for convenience
- Don't repeat tags on CI or servers
- Always include
HEALTHCHECK
- Docker run and docker-compose: info noly
- Docker Swarm: Key for uptime and rolling updates
- Kubernetes: Not used, but helps in others making readiness/liveness probes
- Sample
./healthchecks/
# option 1
HEALTHCHECK --interval=5m --timeout=3s \
CMD curl -f http://localhost/ || exit 1
# option 2
HEALTHCHECK CMD curl -f http://localhost/healthz || exit 1
# option 3
HEALTHCHECK --interval=30s CMD node hc.js
./ultimate-node-dockerfile/
- Use an existing Node.js sample app
- Make a production grade Dockerfile
- Development friendly, testing stage, security, non-root user, labels, minimal prod size
- Requirements in
README.md
# security
docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:test --target test .
# buildkit
DOCKER_BUILDKIT=1 docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:test --target test .
DOCKER_BUILDKIT=1 docker build --build-arg=MICROSCANNER_TOKEN=$MICROSCANNER -t ultimatenode:prod --target prod .
- Node is usually single threaded
- Use multiple replicas, not PM2/forever
- Start with 1-2 replicas per CPU
- Unit testing = single replica.
- Integration testing = multiple replicas
- Only understands a single server (engine)
- Doesn't understand uptime or headlthchecks
- Swarm is easy and solves most use cases
- Single server? Use Swarm
- Kubernetes not ideal for 1-5 servers. Try cloud hosted
- Common: many HTTP containers need to listen on 80/443
- Nginx and HAProxy have lots of options
- Traefik is the new kid, full of cool features
- Think early how your Node apps will communicate on a single server or cluster
- Add SIGTERM Code to all Node.js apps
./sample-graceful-shutdown
- Prevents killing app, but not graceful connection migration
- Check godaddy/terminus for easier hc + shutdown
- Shutdown wait defaults: Docker/Swarm: 10s, Kubernetes: 30s
- Kubernetes/Swarm use healthchecks differently for ingress LB
- Give shutdown waits longer than HTTP long polling
- HTTP: Use stoppable to track open connections
- Multi-container, single image
- Startup "ready" state: healthchecks
- Multi-container client state sharing(don't use in-memory state)
- Shutdown cleanup: reconnect clients, close DB, fail readiness (K8s)
./sample-result-orchestration
- Kubernetes and Swarm-ready version
- Healthcheck/Readiness wait for DB
- Readiness re-checks DB connection
socket.io
uses redis- Stoppable for cleanup
./sample-swarm/
- Example of Node.js app stack
- Has cluster features under "deploy"
- replicas, update_config
- stop_grace_period
- ARM processors are used everywhere
- But it's hard to develop on ARM
- April 2019: docker + ARM partnership
- Docker Desktop runs ARM now!
- Node is great on ARM
- Docker is the easiest way to dev for ARM
- Easy button: Change the
FROM
image toarm64v8/node:<tag>
- This forces macOS/Win to run ARM
- Uses QEMU "proc emulator"
- Build/run like normal
- Mix with x86 in compose
docker image inspect arm64v8/node:10-alpine | grep Arch
# "Architectrue":"arm64",
docker image inspect node:10-alpine | grep Arch
# "Architectrue":"amd64",
docker build -t arm64node .
docker run -p 8080:3000 -d arm64node
docker ps
- AWS A1 Instances (Graviton Processors)
- Test your IOT/Embedded code
- Docker Hub doesn't build arm64 images
- Or does it?(QEMU hack)
- Build your own CI with QEMU
- Swarm just works!
- ARM + Docker partnership will make this easier
- Build multi-arch in one command
- Store multi-arch images in single repo
- Easier to know which arch you're running locally