The demo stand Kubernetes cluster is based on the following components:
- Docker with NVIDIA Container Runtime for Docker
- Kubernetes packages: kubeadm, kubelet, kubectl, kubernetes-cni
- Docker registry
Docker and all Kubernetes packages should be deployed on both the master and each worker node. NVIDIA Container Runtime for Docker should be deployed on each worker node with GPUs.
These docs are tested with the following versions of the demo stand components:
- Docker: 19.03.1
- Kubelet: v1.15.3
- Kubeadm: v1.15.3
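After installation (see the steps below), you can verify the deployed versions on each node with:

```
docker --version
kubelet --version
kubeadm version -o short
```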
- Install Docker:
  ```
  sudo apt-get update
  sudo apt-get install docker-ce
  ```
- Install NVIDIA Container Runtime for Docker:
  - Add package repositories:
    ```
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
      sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    ```
  - Install nvidia-docker2 and reload the Docker daemon configuration:
    ```
    sudo apt-get install -y nvidia-docker2
    sudo pkill -SIGHUP dockerd
    ```
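  To verify the runtime works, a common smoke test is to run `nvidia-smi` in a CUDA container (the image tag here is an example; pick one matching your driver):
  ```
  # should print the nvidia-smi table with your GPUs
  sudo docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
  ```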
Reference resources:
- Docker installation: https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce-1
- NVIDIA Docker installation: https://github.com/NVIDIA/nvidia-docker
- Kubernetes can't operate with swap enabled on the host machine. Disable swap on the master and on each worker node:
  ```
  sudo swapoff -a
  ```
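  Note that `swapoff -a` does not persist across reboots; to disable swap permanently you also need to comment out the swap entries in `/etc/fstab`, for example:
  ```
  # comment out every fstab line containing " swap " (review the file first)
  sudo sed -i '/ swap / s/^/#/' /etc/fstab
  ```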
- Install Kubernetes:
  - Add package repositories:
    ```
    sudo apt-get update && sudo apt-get install -y apt-transport-https
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
    ```
  - Install Kubernetes packages:
    ```
    sudo apt-get update
    sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni
    ```
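  Optionally, pin the packages so automatic upgrades don't introduce version skew between cluster components:
  ```
  # prevent apt from upgrading the Kubernetes packages
  sudo apt-mark hold kubelet kubeadm kubectl kubernetes-cni
  ```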
Reference resources:
- Kubernetes docs, kubeadm installation: https://kubernetes.io/docs/setup/independent/install-kubeadm/
For now, a plain insecure HTTP registry is used.
- If required, install Docker on the Docker registry host machine:
  ```
  sudo apt-get update
  sudo apt-get install docker-ce
  ```
- Deploy the Docker registry:
  ```
  docker run -d -p <registry port, default=5000>:5000 --restart=always --name registry registry:2
  ```
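  You can check that the registry is up via the Registry HTTP API v2 (substitute your host and port):
  ```
  # an empty registry returns {"repositories":[]}
  curl http://<registry domain or IP>:<registry port>/v2/_catalog
  ```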
- To enable insecure access to the registry, follow these steps on the master, each worker node, and any other host that needs to access the registry:
  - Edit or create the /etc/docker/daemon.json file, updating it with the following contents:
    ```
    {
      "insecure-registries" : ["<registry domain or IP>:<registry port>"]
    }
    ```
  - Restart Docker for the changes to take effect:
    ```
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    ```
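  To verify insecure access from a configured host, tag and push a small test image (`hello-world` here is just a convenient public image):
  ```
  docker pull hello-world
  docker tag hello-world <registry domain or IP>:<registry port>/hello-world
  docker push <registry domain or IP>:<registry port>/hello-world
  ```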
Reference resources:
- Docker docs, docker registry: https://docs.docker.com/registry/
- Docker docs, insecure registry deployment: https://docs.docker.com/registry/insecure/
- Docker docs, secure registry deployment: https://docs.docker.com/registry/deploying/
All commands are assumed to be executed on the master node. We use Flannel as the pod network driver.
- Clone the repo and `cd` to the `kube_scripts` dir:
  ```
  git clone https://github.com/deepmipt/stand_kubernetes_cluster.git
  cd stand_kubernetes_cluster/tools/kube_scripts
  ```
- Initialize the cluster master node:
  ```
  sudo sysctl net.bridge.bridge-nf-call-iptables=1
  sudo kubeadm init --pod-network-cidr=10.244.0.0/16
  ```
  If it succeeds, a `kubeadm join` command will be printed at the end of the init process; save it to run when joining nodes:
  ```
  kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>
  ```
- Make kubectl work for your user:
  ```
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  ```
- Deploy the Flannel network driver:
  ```
  kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
  ```
- Or you can simply run the `kubeadm_init_flannel.sh` script from `stand_kubernetes_cluster/tools/kube_scripts` instead of steps 2-4:
  ```
  sudo sh kubeadm_init_flannel.sh
  ```
- Deploy the NVIDIA device plugin for Kubernetes by running the `deploy_nvidia_plugin.sh` script from `stand_kubernetes_cluster/tools/kube_scripts`:
  ```
  sudo sh deploy_nvidia_plugin.sh
  ```
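  Once the plugin is running, GPU worker nodes should advertise the `nvidia.com/gpu` resource, which you can check with:
  ```
  # the Capacity/Allocatable sections should list nvidia.com/gpu
  kubectl describe node <node name> | grep -i 'nvidia.com/gpu'
  ```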
Reference resources:
- NVIDIA device plugin for Kubernetes: https://github.com/NVIDIA/k8s-device-plugin
- To add worker nodes to the cluster, run the saved `kubeadm join` command with the `--ignore-preflight-errors=all` option as sudo on each worker node and restart kubelet:
  ```
  sudo kubeadm join <master-ip>:<master-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash> --ignore-preflight-errors=all
  ```
- Check that all system pods are running:
  ```
  kubectl get all --namespace=kube-system
  ```
- List all nodes to check their availability (nodes are cluster-scoped, so no namespace flag is needed):
  ```
  kubectl get nodes
  ```
- To get the `kubeadm join` command, run on the master node:
  ```
  sudo kubeadm token create --print-join-command
  ```
- To tear down a node:
  - Run on the master:
    ```
    kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
    kubectl delete node <node name>
    ```
  - Run on the node the `reset_node.sh` script from `stand_kubernetes_cluster/tools/kube_scripts`:
    ```
    sudo sh reset_node.sh
    ```
- To dismantle the cluster completely, first tear down all worker nodes, then tear down the master node as described above.
Reference resources:
- Kubernetes docs, creating a cluster: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
- GitHub, Kube-router deployment: https://github.com/cloudnativelabs/kube-router/blob/master/Documentation/kubeadm.md
- Habrahabr, Kubernetes deployment: https://habrahabr.ru/company/southbridge/blog/334846/
- Habrahabr, Kubernetes deployment: https://habr.com/post/348688/
Build Docker images and push them to the registry. Here are the basic references:
- Docker docs, push/pull operations: https://docs.docker.com/registry/
- Docker docs, image naming: https://docs.docker.com/registry/introduction/#understanding-image-naming
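As a sketch, a build-and-push cycle for the Russian NER image used later in this doc might look like this (the build context path is an assumption):

```
# build the image with a name that includes the registry host and port
sudo docker build -t kubeadm.ipavlov.mipt.ru:5000/stand/ner_ru <path to build context>
# push it to the cluster registry
sudo docker push kubeadm.ipavlov.mipt.ru:5000/stand/ner_ru
```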
For now, the demo stand requires some Kubernetes objects (Namespaces, PersistentVolumes, and so on) to be created before the stand skills and services (the payload) are deployed.

YAML configs for these objects are located in `stand_kubernetes_cluster/kuber_configs/common`.
To create these objects:
- `cd` to `stand_kubernetes_cluster/kuber_configs/common`
- We run the demo stand payload in the `stand-demo` Namespace. To create it, run:
  ```
  kubectl create -f namespaces/stand_demo_ns.yaml
  ```
- Create hostPath volumes for the stand components and logs:
  ```
  kubectl create -f volumes/logs_hostpath
  kubectl create -f volumes/components_hostpath
  kubectl create -f volumes/db_hostpath
  kubectl create -f volumes/rb_hostpath
  ```
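  For reference, a minimal hostPath PersistentVolume of the kind these configs define could look like the sketch below (the name, size, and path are illustrative, not the actual contents of the `volumes/` configs):
  ```
  cat <<'EOF' | kubectl create -f -
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: example-logs-pv
  spec:
    capacity:
      storage: 1Gi
    accessModes:
      - ReadWriteMany
    hostPath:
      path: /data/stand/logs
  EOF
  ```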
For Kubernetes storage and Volumes you can reference:
- Kubernetes docs, Volumes: https://kubernetes.io/docs/concepts/storage/volumes/
- Kubernetes docs, PersistentVolumes manual: https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/
So far, payload deployment includes the following steps:
- Building a Docker image and pushing it to the registry
- Defining and launching a Deployment
- Defining and launching a Service

YAML configs for the payload Deployments and Services are located in `stand_kubernetes_cluster/kuber_configs/models`.
To deploy the stand payload:
- Push the payload Docker image to the cluster registry (Russian NER example):
  ```
  sudo docker push kubeadm.ipavlov.mipt.ru:5000/stand/ner_ru
  ```
- `cd` to `stand_kubernetes_cluster/kuber_configs/models`
- Create the payload Deployment (Russian NER example):
  ```
  kubectl create -f stand_ner_ru/stand_ner_ru_dp.yaml
  ```
- Create the payload Service (Russian NER example):
  ```
  kubectl create -f stand_ner_ru/stand_ner_ru_lb.yaml
  ```
- Or you can simply run the following instead of steps 3-4:
  ```
  kubectl create -f stand_ner_ru
  ```
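To verify the payload came up, watch the rollout and list the pods:

```
kubectl -n stand-demo rollout status deployment/<deployment_name>
kubectl -n stand-demo get pods
```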
For Deployment and Service definitions you can reference:
- Kubernetes docs, deployments: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
- Kubernetes docs, deployments API: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#deployment-v1-apps
- Kubernetes docs, services: https://kubernetes.io/docs/concepts/services-networking/service/
- Kubernetes docs, services API: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#service-v1-core
We run the demo stand payload in the `stand-demo` Namespace, so the `-n stand-demo` flag should be used in all `kubectl` operations with demo stand objects.
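Alternatively, recent kubectl versions let you set `stand-demo` as the default namespace for the current context and drop the flag:

```
kubectl config set-context --current --namespace=stand-demo
```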
Listing demo stand objects:
- List Services:
  ```
  kubectl -n stand-demo get services
  ```
- List Deployments:
  ```
  kubectl -n stand-demo get deployments
  ```
- List Pods:
  ```
  kubectl -n stand-demo get pods
  ```
Get detailed information about an object:
- Service:
  ```
  kubectl -n stand-demo describe service <service_name>
  ```
- Deployment:
  ```
  kubectl -n stand-demo describe deployment <deployment_name>
  ```
- Pod:
  ```
  kubectl -n stand-demo describe pod <pod_name>
  ```
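Besides `describe`, pod logs are often the quickest way to inspect a misbehaving payload:

```
kubectl -n stand-demo logs <pod_name>
# follow the log stream
kubectl -n stand-demo logs -f <pod_name>
```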
Delete an object:
- Service:
  ```
  kubectl -n stand-demo delete service <service_name>
  ```
  Or:
  ```
  kubectl delete -f <service_config.yaml>
  ```
- Deployment:
  ```
  kubectl -n stand-demo delete deployment <deployment_name>
  ```
  Or:
  ```
  kubectl delete -f <deployment_config.yaml>
  ```
- Pod (a new pod will be instantly created in place of the deleted one):
  ```
  kubectl -n stand-demo delete pod <pod_name>
  ```
Apache 2.0 licensed.