Add reset annotation and CRI command #13

helsaawy · 2021-11-23T00:08:34Z

This PR supports reusing a container's scratch space by allowing pods and containers to be reset.
Containerd allows containers to be started and stopped repeatedly, but CRI explicitly enforces a state machine for containers (pkg/store/container/status.go) and pods (pkg/store/sandbox/status.go). Resetting keeps as many of a pod's or container's existing resources allocated as possible (namely the scratch space VHD) and recreates closed resources as needed.

Two new gRPC commands were added to the CRI plugin: one resets a pod from the NOTREADY state back to READY, and the other resets a container from the EXITED state back to CREATED.
Additionally, a reset annotation (io.microsoft.cri.enablereset) was added to allow run (or start) to work on NOTREADY (or EXITED) pods (or containers).

Resetting a pod will restart its sandbox/pause container, recreate the HCN namespace and volatile directories, and reset the containers within it.
Resetting a container will change its state back to CREATED, recreate IO pipes and directories, and update the Linux namespaces (if applicable).
In all cases, the scratch VHD (sandbox.vhdx) will be reused, preserving the disk state of the containers.

It is possible to reset a container and start it again without having to stop the pod it is in.
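The container transitions described above can be illustrated with a minimal sketch. This is not the code in this PR: the `Container` type, its methods, and the annotation value `"true"` are hypothetical simplifications, and the annotation key is the one given in this description. It only shows the intended semantics: an explicit reset moves EXITED back to CREATED and is a no-op on CREATED, and a start on an EXITED container succeeds only when the annotation is present.

```go
package main

import "fmt"

// ContainerState is a simplified stand-in for the CRI container states.
type ContainerState int

const (
	Created ContainerState = iota
	Running
	Exited
)

func (s ContainerState) String() string {
	switch s {
	case Created:
		return "CREATED"
	case Running:
		return "RUNNING"
	case Exited:
		return "EXITED"
	default:
		return "UNKNOWN"
	}
}

// resetAnnotation is the annotation key named in the PR description.
// The value "true" checked below is an assumption made for this sketch.
const resetAnnotation = "io.microsoft.cri.enablereset"

// Container is a hypothetical, minimal model of a CRI container.
type Container struct {
	State       ContainerState
	Annotations map[string]string
}

// Reset moves an EXITED container back to CREATED.
// It is idempotent: a container already in CREATED is left untouched.
func (c *Container) Reset() error {
	switch c.State {
	case Created:
		return nil
	case Exited:
		// The real implementation would recreate IO pipes and
		// directories here while keeping the scratch VHD.
		c.State = Created
		return nil
	default:
		return fmt.Errorf("container is in invalid state %s", c.State)
	}
}

// Start runs a CREATED container. An EXITED container is implicitly
// reset first, but only if the reset annotation is set.
func (c *Container) Start() error {
	if c.State == Exited && c.Annotations[resetAnnotation] == "true" {
		if err := c.Reset(); err != nil {
			return err
		}
	}
	if c.State != Created {
		return fmt.Errorf("container is in invalid state %s", c.State)
	}
	c.State = Running
	return nil
}
```

With this model, calling `Start` on an EXITED container without the annotation returns an error, while the same call with the annotation set leaves the container in RUNNING.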

Resetting pod state back to READY was an easier approach than adding a CREATED state to pods to mirror the container state machine. The latter approach involved changing how the CRI plugin handles and tracks pods internally, and possibly risked corrupting the state it persists on disk.

Associated PRs:
Currently, after a container is stopped, CRI issues a task delete command to the shim, but runhcsshim does not delete the task ID from the underlying pod object's internal state, even after the task itself is removed. This causes the pod to track a phantom task that no longer exists, and prevents the pod from creating a task with the same ID as a prior task that was removed.
The PR to fix it is: microsoft/hcsshim#1271.

For LCOW, the Linux gcs prematurely deletes the underlying runc container when a stop task request is issued, which causes the subsequent delete task request to fail, since it cannot find the container. This prevents the above issue (the task not being removed from the shim's internal state) from occurring, as the delete task request errors out halfway through.
The PR to fix it is: microsoft/hcsshim#1272

Tests of end-to-end reset functionality and of persisting state across restarts are in a draft PR, waiting on this and the associated PRs to be merged: microsoft/hcsshim#1273

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

dcantah · 2021-12-29T22:58:42Z

Oh I didn't even know this existed haha. For PRs to Kevin's forks you'll have to @ us manually, as cplat doesn't get auto-tagged. I should've brought this up. @ambarve @kevpar @katiewasnothere @anmaxvl @msscotb

Adds tests for resetting tasks and containers explicitly with CRI plugin API, and implicitly using annotations and start/stop commands. PR relies on accompanying CRI PR (kevpar/cri#13) being merged. Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>

pkg/server/sandbox_reset_windows.go

pkg/server/container_start.go

pkg/server/sandbox_run_windows.go

anmaxvl

lgtm

This commit supports restarting containers and pods using CRI: kevpar/cri#13 This PR allows the service to remove tasks from a pod's workloadTasks map after the task and associated execs have been shut down during a delete task request, allowing for proper deletion of the task and freeing up associated resources when the request is received by the service. Namely, this frees up the deleted task's ID, so that new tasks can be created with that same ID after the original task has been deleted (i.e., so a task can be restarted within a running pod). A DeleteTask function was added to the shimPod interface to implement most of this functionality. Additionally, in deleteInternal, the service resets taskOrPod, its internal reference to the init task (shimPod or shimTask), if the delete is issued for the init task, as a marker that the service is no longer operational and to prevent future operations from occurring. Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com
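The effect of removing a deleted task from the pod's task map can be sketched as follows. This is an illustrative model only, not the hcsshim implementation: the `pod` and `task` types, the `CreateTask` and `StopTask` helpers, and the use of a mutex-guarded map are assumptions made for the example. Only the names `workloadTasks` and `DeleteTask` come from the commit message above.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical placeholder for a workload task tracked by a pod.
type task struct {
	id      string
	stopped bool
}

// pod is a simplified stand-in for the shim's pod object.
type pod struct {
	mu            sync.Mutex
	workloadTasks map[string]*task
}

func newPod() *pod {
	return &pod{workloadTasks: make(map[string]*task)}
}

// CreateTask registers a new task. It fails while the ID is still
// tracked, which is what a phantom entry would cause indefinitely.
func (p *pod) CreateTask(id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.workloadTasks[id]; ok {
		return fmt.Errorf("task %q already exists", id)
	}
	p.workloadTasks[id] = &task{id: id}
	return nil
}

// StopTask marks a tracked task as stopped.
func (p *pod) StopTask(id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	t, ok := p.workloadTasks[id]
	if !ok {
		return fmt.Errorf("task %q not found", id)
	}
	t.stopped = true
	return nil
}

// DeleteTask removes a stopped task from the map, freeing its ID so
// that a new task can later be created with the same ID.
func (p *pod) DeleteTask(id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	t, ok := p.workloadTasks[id]
	if !ok {
		return fmt.Errorf("task %q not found", id)
	}
	if !t.stopped {
		return fmt.Errorf("task %q is still running", id)
	}
	delete(p.workloadTasks, id)
	return nil
}
```

In this model, creating a task, stopping it, deleting it, and then creating it again with the same ID all succeed, whereas skipping the delete step makes the second create fail.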

hack/update-proto.sh

pkg/registrar/registrar_test.go

pkg/server/container_reset_windows.go

katiewasnothere · 2022-02-10T08:11:39Z

pkg/api/v1/api.proto

+ rpc ResetPodSandbox(ResetPodSandboxRequest) returns (ResetPodSandboxResponse) {}
+ // ResetContainer resets a stopped container back to the created state, keeping
+ // its scratch space untouched.
+ // This call is idempotent, and must not return an error if the container is already
+ // in the created state.
+ rpc ResetContainer(ResetContainerRequest) returns (ResetContainerResponse) {}


Why do we need these new API calls if we can accomplish what we want in a call to StartContainer or RunPodSandbox?

katiewasnothere · 2022-02-10T08:15:55Z

pkg/store/sandbox/status.go

+// | | +----+----+ | |
+// | | | | |
+// | | Reset | Stop/Exit | |
+// | | | | |
+// | | +----v----+ | |
+// | | | <---------+ +----v----+
+// | +------+ NOTREADY| | |


Not sure I follow what the change to this chart is doing?

I added a transition from the not ready state (stopped) back to ready (running).
It does look pretty confusing though, since the Ready --> Ready transition by way of PortForward convolutes the new arrow.

pkg/server/sandbox_reset_windows.go

Added annotation (`io.microsoft.cri.allowreset`) to allow run (for pods) and start (for containers) to reset previously created pods or containers (respectively). Additionally, there are CRI gRPC commands to reset the state of stopped pods and containers.

Reset will cause a stopped pod to run again and start its sandbox/pause container, and recreate its network namespace, as well as update the `IP`, `NetNSPath`, `CNIResult`, and `Container` fields. For containers, it will reset the state back to `CREATED`, recreating IO pipes and other resources as needed. In all cases, the scratch VHD (`sandbox.vhdx`) will be reused, preserving the disk state of the containers. Resetting a pod will reset its containers as well.

Resetting state was an easier approach than adding a `CREATED` state to pods so that they would have similar creation and start semantics to containers. The latter approach involved changing how most CRI pod commands functioned.

Added combination `LoadNetNS` and `Remove` function for HCN namespaces: the function skips the duplicate `hcn.GetNamespaceByID` call and error checking. Combined namespace removal with recreation, as they likely always happen together.

Rather than flushing uVM file buffers on shutdown to ensure VHD logs SCSI unmounting for container scratch space from the uVM and removes the corresponding junction points, code now removes old uVM scratch space/snapshot and creates a new one when resetting hypervisor WCOW pods.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>

katiewasnothere · 2022-03-02T20:24:02Z

pkg/server/sandbox_reset_windows.go

+ state := sandbox.Status.Get().State
+ switch state {
+ case sandboxstore.StateReady:
+ entity.Debugf("sandbox is already running")
+ return nil
+ case sandboxstore.StateNotReady:
+ default:
+ return errors.Errorf("sandbox %q is in invalid state %q", id, state)
+ }


So I know we don't have a corresponding "Created" state for pods, but does this mean we have no way of telling if a pod has actually run already and exited when we go to reset it? Could that be a problem for us?

Ish: on pod "creation" (start), the underlying task is automatically started and run, and the pod goes into the ready state.
So, if a pod exists and is not ready, the assumption is that either it was previously running (ready) and was stopped or was loaded (after a containerd/CRI restart) in a stopped or unknown state. From what I can tell, CRI guarantees that pods that fail to start their tasks when created are deleted automatically.

katiewasnothere

Couple comments but otherwise looks good to me


* Tests for task and sandbox reset/restart Adds tests for resetting tasks and containers explicitly with CRI plugin API, and implicitly using annotations and start/stop commands. PR relies on accompanying CRI PR (kevpar/cri#13) being merged. Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com> * PR: wrappers, annotation, comments Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>

helsaawy force-pushed the he/restart branch from d007502 to cfc19e8 Compare November 29, 2021 20:23

helsaawy force-pushed the he/restart branch 5 times, most recently from 73388ac to eff8a6f Compare December 16, 2021 20:28

helsaawy changed the title ~~added annotation to allow for restart~~ added reset annotation and CRI command Dec 16, 2021

helsaawy force-pushed the he/restart branch from eff8a6f to b3dbf2d Compare December 22, 2021 01:58

helsaawy marked this pull request as ready for review December 27, 2021 20:28

helsaawy mentioned this pull request Dec 27, 2021

Add proper deletion of workloads tasks in shim microsoft/hcsshim#1234

Closed

helsaawy changed the title ~~added reset annotation and CRI command~~ Add reset annotation and CRI command Dec 30, 2021

helsaawy mentioned this pull request Jan 10, 2022

Tests for task and sandbox reset/restart microsoft/hcsshim#1273

Merged

helsaawy force-pushed the he/restart branch from b3dbf2d to b87fa7b Compare January 18, 2022 00:56

anmaxvl reviewed Jan 20, 2022


anmaxvl self-assigned this Jan 20, 2022

helsaawy force-pushed the he/restart branch 2 times, most recently from 1e33171 to f63c437 Compare January 25, 2022 04:18

anmaxvl approved these changes Jan 26, 2022


helsaawy mentioned this pull request Feb 1, 2022

Delete shim workloads tasks in pod. microsoft/hcsshim#1271

Merged

msscotb assigned katiewasnothere Feb 9, 2022