add implementation details for configuration device plugin #76

Merged

johnsonshih
Contributor

Add implementation details for Configuration-level resource support. PR project-akri/akri#627 implements the design.

Signed-off-by: Johnson Shih <jshih@microsoft.com>
Contributor

@diconico07 diconico07 left a comment

LGTM, apart from one line/paragraph that is a bit hard to understand and could benefit from better phrasing.
I also noticed a few typos, but I may have missed others, so you may want to do a check pass for them.

The `ConfiguratinDevicePlugin` defines the behavior of the configuration-level resources, combined with `DevicePluginService`, it forms a device plugin that advertises the configuration-level resources to the kubelet.

Similar to the Agent creates a `DevicePluginService` with `InstanceDevicePlugin` behavior for each discovered Instance. The Agent creates a `DevicePluginService` with `ConfigurationDevicePlugin` behavior for each Configuration when the Configuration is applied to the cluster. All `DevicePluginService` share the same structure that has `list_and_watch` and `allocate` for kubelet to call. The actual behavior of `list_and_watch` and `allocate` is defined in `ConfigurationDevicePlugin` and `InstanceDevicePlugin` for Configuration and Instance respectively.
The CL and IL device plugin need to coordinate to sync up available resources so when a resource is claimed by CL (or IL) device plugin, the other device plugin is notified and re-caculate the available resources for itself. The `list_and_watch_message_sender` in `DevicePluginService` is used for notifying available resource change within a device plugin. For notifying resouce change across device plugins, a copy of `list_and_watch_message_sender` for each `DevicePluginService` are saved in `InstanceConfig`, where `usage_update_message_sender` holds the message sender of CL device plugin and IL device plugin's `list_and_watch_message_sender` is saved in each instance's `InstanceInfo`.
Contributor

Suggested change
- The CL and IL device plugin need to coordinate to sync up available resources so when a resource is claimed by CL (or IL) device plugin, the other device plugin is notified and re-caculate the available resources for itself. The `list_and_watch_message_sender` in `DevicePluginService` is used for notifying available resource change within a device plugin. For notifying resouce change across device plugins, a copy of `list_and_watch_message_sender` for each `DevicePluginService` are saved in `InstanceConfig`, where `usage_update_message_sender` holds the message sender of CL device plugin and IL device plugin's `list_and_watch_message_sender` is saved in each instance's `InstanceInfo`.
+ The CL and IL device plugin need to coordinate to sync up available resources so when a resource is claimed by CL (or IL) device plugin, the other device plugin is notified and re-caculate the available resources for itself. The `list_and_watch_message_sender` in `DevicePluginService` is used for notifying available resource change within a device plugin. For notifying resource change across device plugins, a copy of `list_and_watch_message_sender` for each `DevicePluginService` are saved in `InstanceConfig`, where `usage_update_message_sender` holds the message sender of CL device plugin and IL device plugin's `list_and_watch_message_sender` is saved in each instance's `InstanceInfo`.
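
For readers following the thread, the cross-plugin notification described in the paragraph above can be sketched roughly as follows. This is only an illustration, not the code from project-akri/akri#627: `InstanceConfig`, `InstanceInfo`, `usage_update_message_sender` and `list_and_watch_message_sender` are names taken from the paragraph, while `UsageMessage`, `notify_instance_plugin`, `notify_configuration_plugin` and `build_config` are hypothetical, and `std::sync::mpsc` is used as a stand-in for whatever channel type the Agent actually uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical message asking a device plugin to recalculate what it advertises.
#[derive(Debug, Clone, PartialEq)]
pub enum UsageMessage {
    Recalculate,
}

/// Per-instance bookkeeping: keeps a copy of the IL device plugin's sender.
pub struct InstanceInfo {
    pub list_and_watch_message_sender: Sender<UsageMessage>,
}

/// Shared state: keeps the CL device plugin's sender plus every instance's info.
pub struct InstanceConfig {
    pub usage_update_message_sender: Option<Sender<UsageMessage>>,
    pub instances: HashMap<String, InstanceInfo>,
}

impl InstanceConfig {
    /// After the CL plugin claims a slot on `instance_name`, wake that
    /// instance's IL plugin. Returns true if a message was delivered.
    pub fn notify_instance_plugin(&self, instance_name: &str) -> bool {
        self.instances
            .get(instance_name)
            .map(|info| {
                info.list_and_watch_message_sender
                    .send(UsageMessage::Recalculate)
                    .is_ok()
            })
            .unwrap_or(false)
    }

    /// After an IL plugin claims a slot, wake the CL plugin.
    pub fn notify_configuration_plugin(&self) -> bool {
        self.usage_update_message_sender
            .as_ref()
            .map(|tx| tx.send(UsageMessage::Recalculate).is_ok())
            .unwrap_or(false)
    }
}

/// Hypothetical helper: builds the shared state with one CL receiver and one
/// IL receiver per instance name, so both directions can be observed.
pub fn build_config(
    names: &[&str],
) -> (
    InstanceConfig,
    Receiver<UsageMessage>,
    HashMap<String, Receiver<UsageMessage>>,
) {
    let (cl_tx, cl_rx) = channel();
    let mut instances = HashMap::new();
    let mut il_rxs = HashMap::new();
    for name in names {
        let (tx, rx) = channel();
        instances.insert(
            name.to_string(),
            InstanceInfo { list_and_watch_message_sender: tx },
        );
        il_rxs.insert(name.to_string(), rx);
    }
    let config = InstanceConfig {
        usage_update_message_sender: Some(cl_tx),
        instances,
    };
    (config, cl_rx, il_rxs)
}
```

In this sketch a claim made through the CL device plugin only wakes the IL device plugin of the affected instance, while a claim made through any IL device plugin wakes the single CL device plugin. Whether the real implementation scopes notifications this way is an assumption.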


The `ConfiguratinDevicePlugin` defines the behavior of the configuration-level resources, combined with `DevicePluginService`, it forms a device plugin that advertises the configuration-level resources to the kubelet.

Similar to the Agent creates a `DevicePluginService` with `InstanceDevicePlugin` behavior for each discovered Instance. The Agent creates a `DevicePluginService` with `ConfigurationDevicePlugin` behavior for each Configuration when the Configuration is applied to the cluster. All `DevicePluginService` share the same structure that has `list_and_watch` and `allocate` for kubelet to call. The actual behavior of `list_and_watch` and `allocate` is defined in `ConfigurationDevicePlugin` and `InstanceDevicePlugin` for Configuration and Instance respectively.
Contributor

nit: This line is a bit hard to read/understand
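
One way to picture what the quoted paragraph is trying to say (a single `DevicePluginService` structure whose `list_and_watch` and `allocate` delegate to either `InstanceDevicePlugin` or `ConfigurationDevicePlugin`) is the sketch below. It is not the code from the PR: the `DevicePluginBehavior` trait, all field names, the error type, and the "take the first free slot" rule are hypothetical placeholders; only the struct names and the two method names come from the paragraph.

```rust
/// Hypothetical trait for the two calls kubelet makes on a device plugin.
pub trait DevicePluginBehavior {
    /// Device ids to advertise to kubelet.
    fn list_and_watch(&self) -> Vec<String>;
    /// Resolves a requested device id to the device slot that is handed out.
    fn allocate(&mut self, device_id: &str) -> Result<String, String>;
}

/// Shared structure: one per discovered Instance and one per Configuration.
pub struct DevicePluginService<B: DevicePluginBehavior> {
    pub resource_name: String,
    pub behavior: B,
}

impl<B: DevicePluginBehavior> DevicePluginService<B> {
    /// Same entry points for every service; the behavior decides the result.
    pub fn list_and_watch(&self) -> Vec<String> {
        self.behavior.list_and_watch()
    }

    pub fn allocate(&mut self, device_id: &str) -> Result<String, String> {
        self.behavior.allocate(device_id)
    }
}

/// Instance-level behavior: advertises the instance's own slots by name.
pub struct InstanceDevicePlugin {
    pub instance_name: String,
    pub capacity: usize,
}

impl DevicePluginBehavior for InstanceDevicePlugin {
    fn list_and_watch(&self) -> Vec<String> {
        (0..self.capacity)
            .map(|i| format!("{}-{}", self.instance_name, i))
            .collect()
    }

    fn allocate(&mut self, device_id: &str) -> Result<String, String> {
        if self.list_and_watch().iter().any(|id| id == device_id) {
            Ok(device_id.to_string())
        } else {
            Err(format!("unknown slot {}", device_id))
        }
    }
}

/// Configuration-level behavior: advertises virtual ids "0", "1", ... and
/// picks a real slot only when `allocate` is called.
pub struct ConfigurationDevicePlugin {
    pub free_slots: Vec<String>,
}

impl DevicePluginBehavior for ConfigurationDevicePlugin {
    fn list_and_watch(&self) -> Vec<String> {
        (0..self.free_slots.len()).map(|i| i.to_string()).collect()
    }

    fn allocate(&mut self, device_id: &str) -> Result<String, String> {
        // The virtual id must be numeric; which slot it maps to is a placeholder rule.
        device_id
            .parse::<usize>()
            .map_err(|_| format!("invalid virtual id {}", device_id))?;
        if self.free_slots.is_empty() {
            return Err("no free slot".to_string());
        }
        Ok(self.free_slots.remove(0))
    }
}
```

With this shape, the Agent would build a `DevicePluginService` around an `InstanceDevicePlugin` for each discovered Instance and around a `ConfigurationDevicePlugin` for each applied Configuration, and kubelet would call the same two methods on both.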

`akri-onvif-8120fe-1`, and `akri-onvif-a19705-0` to free. The Configuration device plugin then reduce the device availability to "0", "1" and "3".
If kubelet retry to claim "0" and "1", the Configuration device plugin will allow it by mapping "0" to `akri-onvif-8120fe-0` or `akri-onvif-8120fe-1`, "1" to `akri-onvif-a19705-0`.

The Configuration device plugin reports "0", "1", ... as virtual device ids in `list_and_watch` and determines the actual device slot to be used when `allocate` is called. The algorithm to map virtual device ids to actul device slot works on the allocation requests on a per-container basis that:
Contributor

Suggested change
- The Configuration device plugin reports "0", "1", ... as virtual device ids in `list_and_watch` and determines the actual device slot to be used when `allocate` is called. The algorithm to map virtual device ids to actul device slot works on the allocation requests on a per-container basis that:
+ The Configuration device plugin reports "0", "1", ... as virtual device ids in `list_and_watch` and determines the actual device slot to be used when `allocate` is called. The algorithm to map virtual device ids to actual device slot works on the allocation requests on a per-container basis that:

…ConfigurationDevicePlugin/InstanceDevicePlugin

Signed-off-by: Johnson Shih <jshih@microsoft.com>
@johnsonshih johnsonshih merged commit 2fcb2bb into project-akri:main Jul 20, 2023
@johnsonshih johnsonshih deleted the user/jshih/cl-implementation branch July 20, 2023 23:46