notification events from CRI runtime #39

ffromani · 2021-10-01T08:02:58Z

The major container runtimes, containerd and cri-o, both offer extensive hooking mechanisms we can leverage to get container lifecycle events while the podresources API catches up. It could work like this:

RTE opens a notification socket at a well-known location
RTE offers a client program to be called from the CRI runtime hooks
When a noteworthy container lifecycle event (create and remove; what else?) happens, the hook calls the client to notify RTE
Depending on further design decisions, RTE either trusts the data the client program sent over the notification socket, or performs a full GetAllocatable/List poll to update its status.
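
A minimal sketch of the flow above, assuming a Unix datagram socket as the notification socket. The socket file name, the event names (`create`, `remove`) and all helper names are hypothetical; nothing here is taken from the RTE code base.

```python
import os
import socket
import tempfile


def open_notification_socket(path):
    """RTE side: bind a datagram socket at a well-known path (hypothetical)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def notify(path, event):
    """Hook side: send one event name, e.g. "create" or "remove"."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.sendto(event.encode(), path)


def demo():
    """Bind the socket in a temp dir, send two events, read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "rte.sock")
        server = open_notification_socket(path)
        try:
            notify(path, "create")
            notify(path, "remove")
            # RTE could trust this payload, or treat it only as a trigger
            # for a full GetAllocatable/List poll.
            return [server.recv(64).decode() for _ in range(2)]
        finally:
            server.close()
```

Whether RTE acts on the received payload or only uses it as a trigger for a full poll is the open design decision mentioned in the last step.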

swatisehgal · 2021-10-01T09:16:50Z

This is a good idea and definitely worth exploring! Currently, I only see PreStart and PostStop hooks here, which should take care of the container lifecycle events you mentioned above (create and remove). In addition to this, I can think of an update lifecycle event, where resources are updated, but that could be phase 2 of this work.

AlexeyPerevalov · 2021-10-05T07:35:09Z

@swatisehgal, @fromanirh my colleague is interested in this, but for solving issues with PLEG.
He has a draft at https://github.com/ikeeip/containerd/tree/cri_subscribe_events

ffromani · 2021-10-05T07:36:43Z

@AlexeyPerevalov very nice! thanks for letting us know. I'll surely have a look ASAP.

ffromani · 2021-10-27T08:34:13Z

Brainstorming a bit more on implementation details.
Prerequisites:

1. RTE must be the listener (i.e. the one creating the endpoint and waiting for notifications)
2. The protocol must be as simple as possible
3. It should be possible to send notifications using plain shell scripts, to make the job of the hook writers as simple as possible
4. No pod or container details (e.g. the pod spec) should be passed alongside notifications
5. We should leave the option open to be forward compatible and to be able to send the container resources alongside the notification in the future

Hence the implementation could look like this:

1. RTE gains an option to enable this feature
2. If enabled, RTE creates a fifo (not a unix domain socket, see prerequisites 2 and 3), optionally at a user-supplied location (let's have a sane default)
3. RTE adds an event loop to read from the fifo
4. Messages in the fifo are fixed size; considering we target amd64, I'd say exactly 8 bytes
5. We don't define the actual content of the messages now. The content of each message is discarded; we only get the notification, and the notification triggers a poll as usual.
6. Because of the point above, each message can be just "0" x 8 (eight "0" chars)
7. Throttling (if ever a concern) will be done on the server side (i.e. in RTE), meaning clients can just try to write the message to the fifo, discarding (or maybe just logging) any error
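
To make the fifo idea concrete, here is a rough sketch under stated assumptions: the fifo file name, the helper names and the non-blocking open mode are all hypothetical choices for illustration, and the poll is replaced by a callback. Only the 8-byte, content-free message comes from the list above.

```python
import os
import tempfile

MSG_SIZE = 8            # fixed message size from the proposal
MSG = b"0" * MSG_SIZE   # eight "0" chars; the reader ignores the content


def drain(fd, on_notify):
    """RTE side: consume every complete message currently in the fifo."""
    count = 0
    while True:
        try:
            chunk = os.read(fd, MSG_SIZE)
        except BlockingIOError:
            break  # a writer is connected but nothing is pending
        if len(chunk) < MSG_SIZE:
            break  # EOF (no writers) or a truncated message
        on_notify()  # content is discarded; just trigger a poll
        count += 1
    return count


def send_notification(path):
    """Hook side: best-effort write; any error is swallowed."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError:
        return False  # e.g. fifo missing, or no reader on the other end
    try:
        os.write(fd, MSG)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def demo():
    """Create a fifo, send two notifications, count the triggered polls."""
    polls = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notify.fifo")
        os.mkfifo(path)
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            send_notification(path)
            send_notification(path)
            return drain(rfd, lambda: polls.append("poll")), polls
        finally:
            os.close(rfd)
```

Because each message is far smaller than `PIPE_BUF`, POSIX guarantees the writes are atomic, so concurrent hooks cannot interleave partial messages. A real event loop would wait on the descriptor (for example with `select`) instead of draining once.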

ffromani · 2021-10-27T10:26:11Z

Even simpler implementation discussed offline with @cynepco3hahue

Start the ds with some host directory /path/to/whatever/rte
the ds pod will create a new file under /path/to/whatever/rte, say /path/to/whatever/rte/notify
the hooks will touch the file each time a (guaranteed) pod is created or deleted
the fsnotifier under the ds pod will watch for the CHMOD event

I think this is actually better than my proposal because leaving room for future expansion is a double-edged sword. The real path forward is to make the podresources kubelet API watchable.

ffromani · 2021-10-27T10:53:18Z

Even simpler implementation discussed offline with @cynepco3hahue
1. Start the ds with some host directory `/path/to/whatever/rte`
2. the ds pod will create a new file under `/path/to/whatever/rte`, say `/path/to/whatever/rte/notify`
3. the hooks will touch the file each time a (guaranteed) pod is created or deleted
4. the fsnotifier under the ds pod will watch for the CHMOD event

I think this is actually better than my proposal because leaving room for future expansion is a double-edged sword. The real path forward is to make the podresources kubelet API watchable.

tentative implementation: #54
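
For illustration only, a Linux-only sketch of the touch-and-watch idea. This is not the code from #54: it uses raw inotify through `ctypes` as a stand-in for an fsnotify watcher, a temporary directory stands in for `/path/to/whatever/rte`, and the helper names are hypothetical. Updating the timestamps of an existing file raises `IN_ATTRIB`, which is the inotify event fsnotify reports as CHMOD.

```python
import ctypes
import os
import struct
import tempfile

IN_ATTRIB = 0x00000004                # metadata (e.g. timestamps) changed
EVENT_HEADER = struct.Struct("iIII")  # wd, mask, cookie, name length

_libc = ctypes.CDLL(None, use_errno=True)


def watch_attrib(path):
    """Return an inotify fd that reports attribute changes on path."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = _libc.inotify_add_watch(fd, os.fsencode(path), IN_ATTRIB)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def wait_for_touch(fd):
    """Block until one event arrives; True if it is an attribute change."""
    data = os.read(fd, 4096)
    _wd, mask, _cookie, _name_len = EVENT_HEADER.unpack_from(data)
    return bool(mask & IN_ATTRIB)


def demo():
    """Create the notify file, touch it, and check the event is seen."""
    with tempfile.TemporaryDirectory() as rte_dir:
        notify = os.path.join(rte_dir, "notify")
        open(notify, "w").close()     # step 2: the ds pod creates the file
        fd = watch_attrib(notify)     # step 4: watch for attribute changes
        try:
            os.utime(notify)          # step 3: what a hook's `touch` does
            return wait_for_touch(fd)
        finally:
            os.close(fd)
```

The event carries no payload, which matches the intent of the simpler design: the touch is only a trigger, and the watcher reacts by running its usual poll.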

ffromani · 2021-11-18T14:15:45Z

implemented in #54 (merged)

swatisehgal mentioned this issue Oct 26, 2021

Topology aware scheduler plugin in kube-scheduler kubernetes/enhancements#2044

Closed

ffromani closed this as completed Nov 18, 2021
