diff --git a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
index 8e4b9434cf9b7..e9abd2235206c 100644
--- a/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
+++ b/content/en/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
@@ -184,6 +184,25 @@ DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
 Support for the "PodResources service" requires `KubeletPodResources` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled.
 It is enabled by default starting with Kubernetes 1.15.
 
+## Device Plugin integration with Topology Manager
+
+{{< feature-state for_k8s_version="v1.16" state="alpha" >}}
+
+Topology Manager is a new Kubelet component that allows resources to be coordinated in a topology-aligned manner. To support this, the Device Plugin API was extended to include a `TopologyInfo` struct.
+
+```gRPC
+message TopologyInfo {
+    repeated NUMANode nodes = 1;
+}
+
+message NUMANode {
+    int64 ID = 1;
+}
+```
+Device Plugins that wish to leverage the Topology Manager can send back a populated `TopologyInfo` struct as part of the device registration, along with the device IDs and the health of the device. The device manager will then use this information to consult with the Topology Manager and make resource assignment decisions based on topology alignment.
+
+More information on Topology Manager is available [here](/docs/tasks/administer-cluster/topology-manager/).
+
 ## Device plugin examples {#examples}
 
 Here are some examples of device plugin implementations:
diff --git a/content/en/docs/tasks/administer-cluster/topology-manager.md b/content/en/docs/tasks/administer-cluster/topology-manager.md
index 66492041456b4..1c51a8021f5dc 100644
--- a/content/en/docs/tasks/administer-cluster/topology-manager.md
+++ b/content/en/docs/tasks/administer-cluster/topology-manager.md
@@ -50,9 +50,9 @@ The hint is then stored in the Topology Manager for use by the *Hint Providers*
 The Topology Manager currently:
 
  - Works on Nodes with the `static` CPU Manager Policy enabled. See [control CPU Management Policies](/docs/tasks/administer-cluster/cpu-management-policies/)
- - Works on Pods in the `Guaranteed` {{< glossary_tooltip text="QoS class" term_id="qos-class" >}}
+ - Works on Pods making CPU or Device requests via extended resources
 
-If these conditions are met, Topology Manager will align CPU and device requests.
+If these conditions are met, Topology Manager will align the requested resources.
 
 Topology Manager supports four allocation policies. You can set a policy via a Kubelet flag, `--topology-manager-policy`.
 There are four supported policies:
@@ -85,6 +85,9 @@ Using this information, the Topology Manager stores the preferred NUMA Node
 affinity for that container. If the affinity is not preferred, Topology Manager will reject this pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.
+Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a ReplicaSet or Deployment to trigger a redeploy of the pod.
+An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
+
 If the pod is admitted, the *Hint Providers* can then use this information when making the resource allocation decision.
@@ -97,6 +100,8 @@ If it is, Topology Manager will store this and the *Hint Providers* can then use
 resource allocation decision. If, however, this is not possible then the Topology Manager will reject the pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.
+Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a Deployment with replicas to trigger a redeploy of the pod.
+An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
 
 ### Pod Interactions with Topology Manager Policies
@@ -149,8 +154,27 @@ spec:
 This pod runs in the `Guaranteed` QoS class because `requests` are equal to `limits`.
 
 Topology Manager would consider this Pod. The Topology Manager consults the CPU Manager `static` policy, which returns the topology of available CPUs.
-Topology Manager also consults Device Manager to discover the topology of available devices for example.com/device.
+Topology Manager also consults the Device Manager to discover the topology of available devices for example.com/device.
 
 Topology Manager will use this information to store the best Topology for this container. In the case of this Pod, CPU and Device Manager will use this stored information at the resource allocation stage.
+```yaml
+spec:
+  containers:
+  - name: nginx
+    image: nginx
+    resources:
+      limits:
+        example.com/deviceA: "1"
+        example.com/deviceB: "1"
+      requests:
+        example.com/deviceA: "1"
+        example.com/deviceB: "1"
+```
+This pod runs in the `BestEffort` QoS class because there are no CPU and memory requests.
+
+Topology Manager would consider this Pod. The Topology Manager consults the Device Manager to discover the topology of the available devices for example.com/deviceA and example.com/deviceB.
+
+As above, Topology Manager will use this information to store the best Topology for this container. Device Manager will then use this when assigning devices to the Pod.
+
 {{% /capture %}}
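
Note on the QoS classes quoted in the two Pod examples: Kubernetes derives the QoS class from `cpu` and `memory` only, so a Pod that requests nothing but extended resources lands in `BestEffort`. The sketch below illustrates that rule under simplifying assumptions. The function name `qos_class` and the dict shapes are hypothetical, not part of any Kubernetes API, and quantities are compared as plain strings rather than parsed (so `"1"` and `"1000m"` would be treated as different).

```python
# Simplified, illustrative sketch of Kubernetes QoS classification.
# Assumption: only "cpu" and "memory" count; extended resources such as
# example.com/deviceA are ignored. Quantities are compared as raw strings.

COMPUTE_RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a pod given a list of container resource dicts shaped like
    {"requests": {...}, "limits": {...}} (hypothetical shape)."""
    any_compute_set = False
    all_guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for name in COMPUTE_RESOURCES:
            req, lim = requests.get(name), limits.get(name)
            if req is not None or lim is not None:
                any_compute_set = True
            # Guaranteed needs a limit for both cpu and memory, and any
            # explicit request must equal that limit.
            if lim is None or (req is not None and req != lim):
                all_guaranteed = False
    if not any_compute_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# The device-only Pod from the second example.
device_only = [{
    "requests": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
    "limits": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
}]

# A Pod with equal cpu/memory requests and limits plus one device.
cpu_and_device = [{
    "requests": {"cpu": "2", "memory": "200Mi", "example.com/device": "1"},
    "limits": {"cpu": "2", "memory": "200Mi", "example.com/device": "1"},
}]

print(qos_class(device_only))     # BestEffort
print(qos_class(cpu_and_device))  # Guaranteed
```

The second result matches the first Pod example only if that Pod sets equal `cpu` and `memory` requests and limits, as the `Guaranteed` description implies. Since the new bullet drops the `Guaranteed` requirement, both Pods would be considered by Topology Manager.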