KEP: Support Device Monitoring #2454
Merged: k8s-ci-robot merged 1 commit into kubernetes:master from vikaschoudhary16:compute-devices on Nov 12, 2018
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
--- | ||
kep-number: 18 | ||
title: Kubelet endpoint for device assignment observation details | ||
authors: | ||
- "@dashpole" | ||
- "@vikaschoudhary16" | ||
owning-sig: sig-node | ||
reviewers: | ||
- "@thockin" | ||
- "@derekwaynecarr" | ||
- "@dchen1107" | ||
- "@vishh" | ||
approvers: | ||
- "@sig-node-leads" | ||
editors: | ||
- "@dashpole" | ||
- "@vikaschoudhary16" | ||
creation-date: "2018-07-19" | ||
last-updated: "2018-07-19" | ||
status: provisional | ||
--- | ||
# Kubelet endpoint for device assignment observation details | ||
|
||
Table of Contents | ||
================= | ||
* [Abstract](#abstract) | ||
* [Background](#background) | ||
* [Objectives](#objectives) | ||
* [User Journeys](#user-journeys) | ||
* [Device Monitoring Agents](#device-monitoring-agents) | ||
* [Changes](#changes) | ||
* [Potential Future Improvements](#potential-future-improvements) | ||
* [Alternatives Considered](#alternatives-considered) | ||
|
||
## Abstract | ||
This document describes the motivation for a kubelet endpoint that exposes device-to-container bindings, and the code changes required to introduce it. | ||
|
||
## Background | ||
[Device Monitoring](https://docs.google.com/document/d/1NYnqw-HDQ6Y3L_mk85Q3wkxDtGNWTxpsedsgw4NgWpg/edit?usp=sharing) requires external agents to be able to determine the set of devices in use by containers and to attach pod and container metadata to these devices. | ||
|
||
## Objectives | ||
|
||
* To remove current device-specific knowledge from the kubelet, such as [accelerator metrics](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/stats/v1alpha1/types.go#L229) | ||
* To enable future use-cases requiring device-specific knowledge to be out-of-tree | ||
|
||
## User Journeys | ||
|
||
### Device Monitoring Agents | ||
|
||
* As a _Cluster Administrator_, I provide a set of devices from various vendors in my cluster. Each vendor independently maintains their own agent, so I run monitoring agents only for devices I provide. Each agent adheres to the [node monitoring guidelines](https://docs.google.com/document/d/1_CdNWIjPBqVDMvu82aJICQsSCbh2BR-y9a8uXjQm4TI/edit?usp=sharing), so I can use a compatible monitoring pipeline to collect and analyze metrics from a variety of agents, even though they are maintained by different vendors. | ||
* As a _Device Vendor_, I manufacture devices and I have deep domain expertise in how to run and monitor them. Because I maintain my own Device Plugin implementation, as well as Device Monitoring Agent, I can provide consumers of my devices an easy way to consume and monitor my devices without requiring open-source contributions. The Device Monitoring Agent doesn't have any dependencies on the Device Plugin, so I can decouple monitoring from device lifecycle management. My Device Monitoring Agent works by periodically querying the `/devices/<ResourceName>` endpoint to discover which devices are being used, and to get the container/pod metadata associated with the metrics: | ||
|
||
![device monitoring architecture](https://user-images.githubusercontent.com/3262098/43926483-44331496-9bdf-11e8-82a0-14b47583b103.png) | ||
|
||
|
||
## Changes | ||
|
||
Add a v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns information about the kubelet's assignment of devices to containers. It obtains this information from the internal state of the kubelet's Device Manager. The service exposes a single `List` call that returns a `ListPodResourcesResponse`, shown in proto below: | ||
```protobuf | ||
// PodResources is a service provided by the kubelet that provides information about the | ||
// node resources consumed by pods and containers on the node | ||
service PodResources { | ||
rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {} | ||
} | ||
|
||
// ListPodResourcesRequest is the request made to the PodResources service | ||
message ListPodResourcesRequest {} | ||
|
||
// ListPodResourcesResponse is the response returned by List function | ||
message ListPodResourcesResponse { | ||
repeated PodResources pod_resources = 1; | ||
} | ||
|
||
// PodResources contains information about the node resources assigned to a pod | ||
message PodResources { | ||
string name = 1; | ||
string namespace = 2; | ||
repeated ContainerResources containers = 3; | ||
} | ||
|
||
// ContainerResources contains information about the resources assigned to a container | ||
message ContainerResources { | ||
string name = 1; | ||
repeated ContainerDevices devices = 2; | ||
} | ||
|
||
// ContainerDevices contains information about the devices assigned to a container | ||
message ContainerDevices { | ||
string resource_name = 1; | ||
repeated string device_ids = 2; | ||
} | ||
``` | ||
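As an illustration of how a monitoring agent might consume this response, the sketch below builds a lookup from each assigned device to the pod and container that holds it. It is a sketch under stated assumptions: the structs are hand-written stand-ins that mirror the proto messages above (not the generated gRPC types), and `DeviceKey`, `DeviceOwner`, `IndexDevices`, and the resource name `example.com/gpu` are hypothetical.

```go
package main

import "fmt"

// Hand-written stand-ins mirroring the proto messages; real code would use
// the types generated from the .proto definition instead.
type ContainerDevices struct {
	ResourceName string
	DeviceIDs    []string
}

type ContainerResources struct {
	Name    string
	Devices []ContainerDevices
}

type PodResources struct {
	Name       string
	Namespace  string
	Containers []ContainerResources
}

// DeviceKey identifies one device of one resource (hypothetical helper type).
type DeviceKey struct {
	ResourceName string
	DeviceID     string
}

// DeviceOwner is the metadata an agent attaches to a device's metrics
// (hypothetical helper type).
type DeviceOwner struct {
	Namespace string
	Pod       string
	Container string
}

// IndexDevices flattens the nested response into a device -> owner map.
func IndexDevices(pods []PodResources) map[DeviceKey]DeviceOwner {
	owners := make(map[DeviceKey]DeviceOwner)
	for _, pod := range pods {
		for _, c := range pod.Containers {
			for _, d := range c.Devices {
				for _, id := range d.DeviceIDs {
					owners[DeviceKey{d.ResourceName, id}] = DeviceOwner{pod.Namespace, pod.Name, c.Name}
				}
			}
		}
	}
	return owners
}

func main() {
	// Example data standing in for the pod_resources field of a response.
	pods := []PodResources{{
		Name:      "trainer",
		Namespace: "ml",
		Containers: []ContainerResources{{
			Name: "worker",
			Devices: []ContainerDevices{{
				ResourceName: "example.com/gpu",
				DeviceIDs:    []string{"gpu-0", "gpu-1"},
			}},
		}},
	}}

	for key, owner := range IndexDevices(pods) {
		fmt.Printf("%s %s -> %s/%s/%s\n",
			key.ResourceName, key.DeviceID, owner.Namespace, owner.Pod, owner.Container)
	}
}
```

Keying the map on both resource name and device ID avoids collisions when two vendors happen to use the same device ID scheme.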
|
||
### Potential Future Improvements | ||
|
||
* Add `ListAndWatch()` function to the GRPC endpoint so monitoring agents don't need to poll. | ||
* Add identifiers for other resources used by pods to the `PodResources` message. | ||
* For example, persistent volume location on disk | ||
|
||
## Alternatives Considered | ||
|
||
### Add v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns a list of [CreateContainerRequest](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L734)s used to create containers. | ||
* Pros: | ||
* Reuse an existing API for describing containers rather than inventing a new one | ||
* Cons: | ||
* It ties the endpoint to the CreateContainerRequest, and may prevent us from adding other information we want in the future | ||
* It does not contain any additional information that will be useful to monitoring agents other than device, and contains lots of irrelevant information for this use-case. | ||
* Notes: | ||
* Does not include any reference to resource names. Monitoring agents must identify devices by the device or environment variables passed to the pod or container. | ||
|
||
### Add a field to Pod Status. | ||
* Pros: | ||
* Allows for observation of container to device bindings local to the node through the `/pods` endpoint | ||
* Cons: | ||
* Only consumed locally, which doesn't justify an API change | ||
* Device Bindings are immutable after allocation, and are _debatably_ observable (they can be "observed" from the local checkpoint file). Device bindings are generally a poor fit for status. | ||
|
||
### Use the Kubelet Device Manager Checkpoint file | ||
* Allows for observability of device to container bindings through what exists in the checkpoint file | ||
* Requires adding additional metadata to the checkpoint file as required by the monitoring agent | ||
* Requires implementing versioning for the checkpoint file, and handling version skew between readers and the kubelet | ||
* Future modifications to the checkpoint file are more difficult. | ||
|
||
### Add a field to the Pod Spec: | ||
* A new object `ComputeDevice` will be defined and a new variable `ComputeDevices` will be added in the `Container` (Spec) object which will represent a list of `ComputeDevice` objects. | ||
```golang | ||
// ComputeDevice describes the devices assigned to this container for a given ResourceName | ||
type ComputeDevice struct { | ||
// DeviceIDs is the list of devices assigned to this container | ||
DeviceIDs []string | ||
// ResourceName is the name of the compute resource | ||
ResourceName string | ||
} | ||
|
||
// Container represents a single container that is expected to be run on the host. | ||
type Container struct { | ||
... | ||
// ComputeDevices contains the devices assigned to this container | ||
// This field is alpha-level and is only honored by servers that enable the ComputeDevices feature. | ||
// +optional | ||
ComputeDevices []ComputeDevice | ||
... | ||
} | ||
``` | ||
* During Kubelet pod admission, if `ComputeDevices` is non-empty, the specified devices are allocated; otherwise, behaviour remains the same as it is today. | ||
* Before starting the pod, the kubelet writes the assigned `ComputeDevices` back to the pod spec. | ||
* Note: Writing to the API server and waiting to observe the updated pod spec in the kubelet's pod watch may add significant latency to pod startup. | ||
* Allows devices to potentially be assigned by a custom scheduler. | ||
* Serves as a permanent record of device assignments for the kubelet, and eliminates the need for the kubelet to maintain this state locally. | ||
|
Is this a pattern we are following elsewhere - GRPC over UNIX socket via a named socket-file?
Is there an established pattern for "I want to talk to my local kubelet" ?
@dchen1107 @yujuhong @derekwaynecarr
The pattern thus far for components that talk to the kubelet is to use grpc over unix sockets, or to use the read-only port. The only examples I am aware of are all plugins (CRI, Device Plugin, CSI, etc), which are all grpc-based.
The concern is that this requires `hostPath`, which many would rather have disabled entirely. We should probably think about a first-class way of accessing "my kubelet" that doesn't rely on that. Maybe even a new volume type that exposes these UNIX files, or something. I would love a volunteer to help drive that brainstorm...