Added information on how device plugins can take advantage of Topology Manager.

Updated the Topology Manager documentation to include missing information pointed out by users as well as updating out of date sections.
lmdaly committed Nov 8, 2019
1 parent df8a83b commit c0ef48b
Showing 2 changed files with 46 additions and 3 deletions.
@@ -184,6 +184,25 @@ DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a

Support for the "PodResources service" requires `KubeletPodResources` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be enabled. It is enabled by default starting with Kubernetes 1.15.

## Device Plugin integration with Topology Manager

{{< feature-state for_k8s_version="v1.16" state="alpha" >}}

Topology Manager is a new Kubelet component that allows resources to be coordinated in a topology-aligned manner. To make this possible, the Device Plugin API was extended to include a `TopologyInfo` struct.

```gRPC
// Topology information reported for a device.
message TopologyInfo {
    repeated NUMANode nodes = 1;
}

// A NUMA node, identified by its ID.
message NUMANode {
    int64 ID = 1;
}
```
Device Plugins that wish to leverage the Topology Manager can send back a populated `TopologyInfo` struct as part of the device registration, along with the device IDs and the health of the device. The Device Manager will then use this information to consult with the Topology Manager and make resource assignment decisions based on topology alignment.
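
As a rough illustration of the registration data described above, the sketch below mirrors the two messages with plain Python dataclasses and builds a device list in which each device reports its NUMA node. The `Device` class, its field names, the `"Healthy"` string, and the `build_devices` helper are stand-ins assumed for this example; a real plugin would use the types generated from the Device Plugin API's protobuf definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NUMANode:
    # Mirrors `message NUMANode { int64 ID = 1; }`.
    ID: int


@dataclass
class TopologyInfo:
    # Mirrors `message TopologyInfo { repeated NUMANode nodes = 1; }`.
    nodes: List[NUMANode] = field(default_factory=list)


@dataclass
class Device:
    # Hypothetical stand-in for the per-device record a plugin reports:
    # a device ID, its health, and (optionally) its topology.
    ID: str
    health: str
    topology: Optional[TopologyInfo] = None


def build_devices(numa_by_device: Dict[str, int]) -> List[Device]:
    """Build one Device per ID, tagging each with the NUMA node it sits on."""
    return [
        Device(
            ID=dev_id,
            health="Healthy",  # assumed health value for the example
            topology=TopologyInfo(nodes=[NUMANode(ID=numa)]),
        )
        for dev_id, numa in sorted(numa_by_device.items())
    ]


# Example: two devices, one on each NUMA node.
devices = build_devices({"dev-0": 0, "dev-1": 1})
for d in devices:
    print(d.ID, d.health, [n.ID for n in d.topology.nodes])
```

A plugin that has no topology information to report could leave `topology` unset in this sketch.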

More information on Topology Manager is available [here](/docs/tasks/administer-cluster/topology-manager.md).

## Device plugin examples {#examples}

Here are some examples of device plugin implementations:
30 changes: 27 additions & 3 deletions content/en/docs/tasks/administer-cluster/topology-manager.md
@@ -50,9 +50,9 @@ The hint is then stored in the Topology Manager for use by the *Hint Providers*
The Topology Manager currently:

- Works on Nodes with the `static` CPU Manager Policy enabled. See [control CPU Management Policies](/docs/tasks/administer-cluster/cpu-management-policies/)
- Works on Pods in the `Guaranteed` {{< glossary_tooltip text="QoS class" term_id="qos-class" >}}
- Works on Pods making CPU or device requests via extended resources

If these conditions are met, Topology Manager will align CPU and device requests.
If these conditions are met, Topology Manager will align the requested resources.

Topology Manager supports four allocation policies. You can set a policy via a Kubelet flag, `--topology-manager-policy`.
There are four supported policies:
@@ -85,6 +85,9 @@ Using this information, the Topology Manager stores the
preferred NUMA Node affinity for that container. If the affinity is not preferred,
Topology Manager will reject this pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.

Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a ReplicaSet or Deployment to trigger a redeploy of the pod.
An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.

If the pod is admitted, the *Hint Providers* can then use this information when making the
resource allocation decision.

@@ -97,6 +100,8 @@ If it is, Topology Manager will store this and the *Hint Providers* can then use
resource allocation decision.
If, however, this is not possible then the Topology Manager will reject the pod from the node. This will result in a pod in a `Terminated` state with a pod admission failure.

Once the pod is in a `Terminated` state, the Kubernetes scheduler will **not** attempt to reschedule the pod. It is recommended to use a Deployment with replicas to trigger a redeploy of the pod.
An external control loop could also be implemented to trigger a redeployment of pods that have the `Topology Affinity` error.
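
The selection step of such an external control loop might look like the sketch below. It works on pod objects already loaded as dictionaries (for example, parsed from the API server's JSON output) and picks out the pods that failed admission for topology reasons. The reason string `TopologyAffinityError`, the assumption that a rejected pod reports `status.phase` as `Failed`, and the `pods_to_redeploy` helper are all assumptions made for illustration; check what your kubelet version actually reports before relying on them.

```python
from typing import Dict, List

# Assumed value: the exact reason string reported by the kubelet may differ.
TOPOLOGY_REASON = "TopologyAffinityError"


def pods_to_redeploy(pods: List[Dict], reason: str = TOPOLOGY_REASON) -> List[str]:
    """Return "namespace/name" for failed pods whose status reason matches."""
    selected = []
    for pod in pods:
        status = pod.get("status", {})
        # Assumption: a pod rejected at admission has phase "Failed".
        if status.get("phase") == "Failed" and status.get("reason") == reason:
            meta = pod.get("metadata", {})
            selected.append(f'{meta.get("namespace", "default")}/{meta.get("name")}')
    return selected


# Example input: one rejected pod, one running pod.
sample_pods = [
    {"metadata": {"namespace": "default", "name": "nginx-a"},
     "status": {"phase": "Failed", "reason": TOPOLOGY_REASON}},
    {"metadata": {"namespace": "default", "name": "nginx-b"},
     "status": {"phase": "Running"}},
]
print(pods_to_redeploy(sample_pods))
```

The loop itself (listing pods periodically, then deleting or recreating the selected ones) is left out, since how to redeploy depends on how the pods were created.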

### Pod Interactions with Topology Manager Policies

@@ -149,8 +154,27 @@ spec:
This pod runs in the `Guaranteed` QoS class because `requests` are equal to `limits`.

Topology Manager would consider this Pod. The Topology Manager consults the CPU Manager `static` policy, which returns the topology of available CPUs.
Topology Manager also consults Device Manager to discover the topology of available devices for example.com/device.
Topology Manager also consults the Device Manager to discover the topology of available devices for example.com/device.

Topology Manager will use this information to store the best Topology for this container. In the case of this Pod, CPU and Device Manager will use this stored information at the resource allocation stage.
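
To make the idea of combining CPU and device topology more concrete, here is a simplified sketch. It assumes each hint is a NUMA node bitmask plus a `preferred` flag, that hints from different providers are combined with a bitwise AND, and that the narrowest preferred result wins. The `Hint` class and the `merge`, `rank`, and `best_hint` functions are hypothetical names for this example, not the kubelet's implementation.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Hint:
    mask: int        # bit i set means NUMA node i is acceptable
    preferred: bool  # whether the provider considers this placement ideal


def merge(hints: Sequence[Hint]) -> Hint:
    """AND the masks together; the result is preferred only if every input is."""
    mask = hints[0].mask
    for h in hints[1:]:
        mask &= h.mask
    return Hint(mask, all(h.preferred for h in hints))


def rank(h: Hint):
    """Lower is better: preferred first, then fewer NUMA nodes."""
    return (not h.preferred, bin(h.mask).count("1"))


def best_hint(providers: List[List[Hint]]) -> Optional[Hint]:
    """Try one hint per provider and keep the best non-empty merged result."""
    if not providers:
        return None
    best = None
    for combo in product(*providers):
        merged = merge(combo)
        if merged.mask == 0:
            continue  # the providers share no NUMA node in this combination
        if best is None or rank(merged) < rank(best):
            best = merged
    return best


# Example: CPUs are free on either node; the device sits on node 1 only.
cpu_hints = [Hint(0b01, True), Hint(0b10, True), Hint(0b11, False)]
device_hints = [Hint(0b10, True)]
print(best_hint([cpu_hints, device_hints]))
```

In this example the only placement both providers prefer is NUMA node 1, so that is the hint that would be stored for the container.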

```yaml
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      limits:
        example.com/deviceA: "1"
        example.com/deviceB: "1"
      requests:
        example.com/deviceA: "1"
        example.com/deviceB: "1"
```
This pod runs in the `BestEffort` QoS class because there are no CPU or memory requests.

Topology Manager would consider this Pod. The Topology Manager consults the Device Manager to discover the topology of the available devices for example.com/deviceA and example.com/deviceB.

As above, Topology Manager will use this information to store the best Topology for this container. Device Manager will then use this when assigning devices to the Pod.
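
The QoS reasoning used in these examples can be approximated with a short function. This is a simplified sketch, not the kubelet's logic: it assumes only `cpu` and `memory` count toward the QoS class (so extended resources such as `example.com/deviceA` are ignored), that a missing request defaults to the limit, and that quantities are compared as plain strings. The `qos_class` helper and the sample containers are hypothetical.

```python
from typing import Dict, List

# Assumption: only these resources influence the QoS class.
COMPUTE = ("cpu", "memory")


def qos_class(containers: List[Dict]) -> str:
    """Classify a pod from its containers' `resources` stanzas."""
    any_compute = False
    guaranteed = True
    for container in containers:
        res = container.get("resources", {})
        limits = res.get("limits", {})
        # Assumption: a request that is not set falls back to the limit.
        requests = {**limits, **res.get("requests", {})}
        for name in COMPUTE:
            if name in limits or name in requests:
                any_compute = True
            # Guaranteed needs a limit, and a request equal to it
            # (string comparison only; "1" and "1000m" would not match).
            if name not in limits or requests[name] != limits[name]:
                guaranteed = False
    if not any_compute:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The device-only container from the example above.
device_only = [{
    "name": "nginx",
    "resources": {
        "limits": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
        "requests": {"example.com/deviceA": "1", "example.com/deviceB": "1"},
    },
}]

# A container whose CPU and memory requests equal its limits.
cpu_and_memory = [{
    "name": "nginx",
    "resources": {
        "limits": {"cpu": "2", "memory": "200Mi"},
        "requests": {"cpu": "2", "memory": "200Mi"},
    },
}]

print(qos_class(device_only))
print(qos_class(cpu_and_memory))
```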

{{% /capture %}}
