podresources: add Watch endpoint #1926

Closed
wants to merge 5 commits into from
25 changes: 23 additions & 2 deletions keps/sig-node/compute-device-assignment.md
@@ -60,12 +60,18 @@ In this document we will discuss the motivation and code changes required for in

## Changes

-Add a v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns information about the kubelet's assignment of devices to containers. It obtains this information from the internal state of the kubelet's Device Manager. The GRPC Service returns a single PodResourcesResponse, which is shown in proto below:
+Add a v1alpha1 Kubelet GRPC service, at `/var/lib/kubelet/pod-resources/kubelet.sock`, which returns information about the kubelet's assignment of devices to containers. It obtains this information from the internal state of the kubelet's Device Manager.
+The GRPC Service exposes two endpoints:
+- `List`, which returns a single PodResourcesResponse, enabling monitor applications to poll for resources allocated to pods and containers on the node.
+- `Watch`, which returns a stream of PodResourcesResponse, enabling monitor applications to be notified of new resource allocation, release or resource allocation updates, using the `action` field in the response.

+This is shown in proto below:
```protobuf
// PodResources is a service provided by the kubelet that provides information about the
// node resources consumed by pods and containers on the node
service PodResources {
rpc List(ListPodResourcesRequest) returns (ListPodResourcesResponse) {}
+rpc Watch(WatchPodResourcesRequest) returns (stream WatchPodResourcesResponse) {}
}

// ListPodResourcesRequest is the request made to the PodResources service
@@ -76,6 +82,21 @@ message ListPodResourcesResponse {
repeated PodResources pod_resources = 1;
}

+// WatchPodResourcesRequest is the request made to the Watch PodResourcesLister service
+message WatchPodResourcesRequest {}

+enum WatchPodAction {
Review comment (Member): When is each action emitted? Can you clarify when modified would be used in the life of a pod?

Reply (Contributor, Author) @ffromani, Sep 30, 2020:
Actions should be emitted:

  • ADDED: when resources are assigned to the pod (I'm thinking about HintProvider's Allocate())
  • DELETED: when resources are claimed back (I'm thinking about UpdateAllocatedDevices())

I'll document this better in the KEP text.

In hindsight, we most likely don't need MODIFIED; I will just remove it.

+UPDATED = 0;
+DELETED = 1;
+ADDED = 2;
+}

+// WatchPodResourcesResponse is the response returned by Watch function
+message WatchPodResourcesResponse {
+WatchPodAction action = 1;
Review comment (Contributor, Author): I wonder if just exposing the pod resourceVersion here is a good way forward.

+repeated PodResources pod_resources = 2;
+}

// PodResources contains information about the node resources assigned to a pod
message PodResources {
string name = 1;
@@ -98,7 +119,6 @@ message ContainerDevices {

### Potential Future Improvements

-* Add `ListAndWatch()` function to the GRPC endpoint so monitoring agents don't need to poll.
* Add identifiers for other resources used by pods to the `PodResources` message.
  * For example, persistent volume location on disk

@@ -164,6 +184,7 @@ Beta:

## Implementation History

+- 2020-08-XX: KEP extended with ListAndWatch function
ffromani marked this conversation as resolved.
- 2018-09-11: Final version of KEP (proposing pod-resources endpoint) published and presented to sig-node. [Slides](https://docs.google.com/presentation/u/1/d/1xz-iHs8Ec6PqtZGzsmG1e68aLGCX576j_WRptd2114g/edit?usp=sharing)
- 2018-10-30: Demo with example gpu monitoring daemonset
- 2018-11-10: KEP lgtm'd and approved
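
Editor's illustration, not part of the PR: the review thread describes `ADDED` as emitted when resources are assigned to a pod and `DELETED` when they are claimed back. A monitoring agent consuming the proposed `Watch` stream could fold those events into a local view roughly as sketched below. This is a minimal sketch under stated assumptions: the Go types are hand-written stand-ins for the generated proto code, the `Namespace` and `Devices` fields are assumed (only `name` is visible in the diff), the `Apply` helper and the `namespace/name` cache key are hypothetical, and the enum values follow the snapshot shown in this diff, which the author indicated may still change.

```go
package main

import "fmt"

// WatchPodAction mirrors the enum in the diff snapshot; the values are taken
// from that snapshot and are not guaranteed to match any released API.
type WatchPodAction int

const (
	Updated WatchPodAction = 0
	Deleted WatchPodAction = 1
	Added   WatchPodAction = 2
)

// PodResources is a simplified, hypothetical stand-in for the proto message.
// Namespace and Devices are assumptions made for this illustration.
type PodResources struct {
	Name      string
	Namespace string
	Devices   []string
}

// WatchEvent is a hypothetical stand-in for WatchPodResourcesResponse.
type WatchEvent struct {
	Action       WatchPodAction
	PodResources []PodResources
}

// Apply folds one watch event into the cache, keyed by "namespace/name".
// Added and Updated overwrite the entry; Deleted removes it.
func Apply(cache map[string]PodResources, ev WatchEvent) {
	for _, p := range ev.PodResources {
		key := p.Namespace + "/" + p.Name
		switch ev.Action {
		case Added, Updated:
			cache[key] = p
		case Deleted:
			delete(cache, key)
		}
	}
}

func main() {
	cache := map[string]PodResources{}
	Apply(cache, WatchEvent{Added, []PodResources{{"gpu-pod", "default", []string{"gpu-0"}}}})
	Apply(cache, WatchEvent{Added, []PodResources{{"nic-pod", "default", []string{"vf-3"}}}})
	Apply(cache, WatchEvent{Deleted, []PodResources{{Name: "gpu-pod", Namespace: "default"}}})
	fmt.Println(len(cache), cache["default/nic-pod"].Devices) // prints: 1 [vf-3]
}
```

With this shape, an agent that previously polled `List` would only need an initial snapshot plus the event stream to keep its view current, which is the polling cost the `Watch` endpoint is meant to avoid.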