Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reset annotation and CRI command #13

Merged
merged 1 commit into from
Mar 2, 2022

Conversation

helsaawy
Copy link
Collaborator

@helsaawy helsaawy commented Nov 23, 2021

This PR supports reusing a container's scratch space by allowing pods and containers to be reset.
Containerd allows containers to be started and stopped repeatedly, but CRI explicitly enforces a state machine for containers (pkg/store/container/status.go) and pods (pkg/store/sandbox/status.go). Resetting keeps as much of a pod's or container's existing resources allocated and reused (namely the scratch space VHD) and recreate closed resources, as needed.

Two new gPRC commands were added to the CRI plugin to reset a pod back to the READY state from NOTREADY and a container back to CREATED from the EXITED state, respectively.
Additionally, a reset annotation (io.microsoft.cri.enablereset) was added to allow run (or start) to work on NOTREADY (or EXITED) pods (or containers).

Resetting a pod will restart its sandbox/pause container, recreate the HCN namespace and volatile directories, and reset the containers within it.
Resetting a container will change its state back to CREATED, and recreate IO pipes and directories, and update the Linux namespaces (if applicable).
In all cases, the scratch vhd (sandbox.vhdx) will be reused, preserving the disk state of the containers.

It is possible to reset a container and start it again without having to the stop the pod it is in.

Resetting state pod state back to READY was an easier and approach than adding in a CREATED state to them to mirror the container state machine. The latter approach involved changing how the CRI plugin handled and tracked pods internally, and possibly risk corrupting state it persisted on disk.

Associated PRs:
Currently, after a container is stopped, CRI issues a task delete command to the shim, but runhcsshim does not delete the task ID from the underlying pod objects internal state, even after the task itself is removed. This causes the pod to track a phantom task that no longer exists, and prevents the pod from creating a task with the same ID as a prior task that was removed.
The PR to fix it is: microsoft/hcsshim#1271.

For LCOW, the Linux gcs prematurely deletes the underlying runc container when a stop task request is issued, which causes the subsequent delete task request to fail, since it cannot find the container. That prevents the above issue, with removing the task from the shim's internal state, from occuring, as the delete task request errors half-way through.
The PR to fix it is: microsoft/hcsshim#1272

Tests of end to end reset functionality and persisting state across restarts are in draft PR, waiting on this and associated PRs to merged: microsoft/hcsshim#1273

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com

@helsaawy helsaawy force-pushed the he/restart branch 5 times, most recently from 73388ac to eff8a6f Compare December 16, 2021 20:28
@helsaawy helsaawy changed the title added annotation to allow for restart added reset annotation and CRI command Dec 16, 2021
@helsaawy helsaawy marked this pull request as ready for review December 27, 2021 20:28
@dcantah
Copy link
Collaborator

dcantah commented Dec 29, 2021

Oh I didn't even know this existed haha. For PRs to Kevins forks you'll have to @ us manually as cplat doesn't get auto tagged, I should've brought this up. @ambarve @kevpar @katiewasnothere @anmaxvl @msscotb

@helsaawy helsaawy changed the title added reset annotation and CRI command Add reset annotation and CRI command Dec 30, 2021
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Jan 10, 2022
Adds tests for resetting tasks and containers explicitly with CRI
plugin API, and implicitly using annotations and start/stop commands.

PR relies on accompanying CRI PR (kevpar/cri#13) being merged.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
pkg/server/sandbox_reset_windows.go Outdated Show resolved Hide resolved
pkg/server/container_start.go Outdated Show resolved Hide resolved
pkg/server/sandbox_run_windows.go Show resolved Hide resolved
pkg/server/sandbox_run_windows.go Outdated Show resolved Hide resolved
@anmaxvl anmaxvl self-assigned this Jan 20, 2022
@helsaawy helsaawy force-pushed the he/restart branch 2 times, most recently from 1e33171 to f63c437 Compare January 25, 2022 04:18
Copy link
Collaborator

@anmaxvl anmaxvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

helsaawy added a commit to microsoft/hcsshim that referenced this pull request Feb 8, 2022
This commit supports restarting containers and pods using CRI:
kevpar/cri#13

This PR allows the service to remove tasks from a pods workloadTasks map after the task and associated execs have been shut down during in a delete task request, allowing for proper deletion of the task and freeing up associated resources when
received by the service. Namely, this frees up the deleted task's ID, so that new tasks can be created with that same ID after the original task has been deleted (ie, so a task can be restarted within a running pod).

A DeleteTask function was added to the shimPod interface to implement most of this functionality.

Additionally, the service, in deleteInternal, resets its internal reference to the init task (shimPod or shimTask) reference, taskOrPod, if the delete is issued for the init task, as a marker that the service is no longer operational and to prevent future operations from occurring.

Signed-off-by: Hamza El-Saawy hamzaelsaawy@microsoft.com
Comment on lines +24 to +29
rpc ResetPodSandbox(ResetPodSandboxRequest) returns (ResetPodSandboxResponse) {}
// ResetContainer resets a stopped container back to the created state, keeping
// its scratch space untouched.
// This call is idempotent, and must not return an error if the container is already
// in the created state.
rpc ResetContainer(ResetContainerRequest) returns (ResetContainerResponse) {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these new API calls if we can accomplish what we want in a call to StartContainer or RunPodSandbox?

Comment on lines +41 to +47
// | | +----+----+ | |
// | | | | |
// | | Reset | Stop/Exit | |
// | | | | |
// | | +----v----+ | |
// | | | <---------+ +----v----+
// | +------+ NOTREADY| | |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow what the change to this chart is doing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a transition from the not ready state (stopped) back to ready (running).
It does look pretty confusing though, since the Ready --> Ready transition by way of PortForward convolutes the new arrow.

Added annotation (`io.microsoft.cri.allowreset`) to allow run (for pods)
and start (for containers) to reset previously created pods or
containers (respectively).
Additionally, there is are CRI grpc commands to reset the state of
stopped pods and containers. Reset will cause a stopped pod to run
again and start its sandbox/pause container, and recreate its network
namespace, as well as update the `IP`, `NetNSPath`, `CNIResult`,
and `Container` fields.

For containers, it will
reset its state back to `CREATED`, recreating IO pipes and other
resources as needed. In all cases, the scratch VHD (`sandbox.vhdx`) will
be reused, preserving the disk state of the containers.
Resetting a pod will reset its containers as well.

Resetting state was an easier approach than adding in a `CREATED` state
to pods, so that they would have similar creation and start semantics to
containers. The latter approach involved changing how most CRI pods
commands functioned.

Added combination `LoadNetNS` and `Remove` function for HCN namespaces:
function skips duplicate `hcn.GetNamespaceByID` and error checking.
Combined namespace removal with recreation, as they are likely always
happen together.

Rather than flushing uVM file buffers on shutdown to ensure VHD logs
SCSI unmounting for container scratch space from the uVM and removes
the corresponding junction points, code now removes old uVM scratch
space/snapshot and creates a new one when resetting hypervisor WCOW
pods.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Comment on lines +45 to +53
state := sandbox.Status.Get().State
switch state {
case sandboxstore.StateReady:
entity.Debugf("sandbox is already running")
return nil
case sandboxstore.StateNotReady:
default:
return errors.Errorf("sandbox %q is in invalid state %q", id, state)
}
Copy link
Collaborator

@katiewasnothere katiewasnothere Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I know we don't have a corresponding "Created" state for pods, but does this mean we have no way of telling if a pod has actually run already and exited when we go to reset it? Could that be a problem for us?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ish: on pod "creation" (start), the underlying task is automatically started and run, and the pod goes into the ready state.
So, if a pod exists and is not ready, the assumption is that either it was previously running (ready) and was stopped or was loaded (after a containerd/CRI restart) in a stopped or unknown state. From what I can tell, CRI guarantees that pods that fail to start their tasks when created are deleted automatically.

Copy link
Collaborator

@katiewasnothere katiewasnothere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments but otherwise looks good to me

@helsaawy helsaawy merged commit 4c5c347 into kevpar:windows_port Mar 2, 2022
@helsaawy helsaawy deleted the he/restart branch March 2, 2022 21:07
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Mar 2, 2022
Adds tests for resetting tasks and containers explicitly with CRI
plugin API, and implicitly using annotations and start/stop commands.

PR relies on accompanying CRI PR (kevpar/cri#13) being merged.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to helsaawy/hcsshim that referenced this pull request Apr 19, 2022
Adds tests for resetting tasks and containers explicitly with CRI
plugin API, and implicitly using annotations and start/stop commands.

PR relies on accompanying CRI PR (kevpar/cri#13) being merged.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
helsaawy added a commit to microsoft/hcsshim that referenced this pull request May 2, 2022
* Tests for task and sandbox reset/restart

Adds tests for resetting tasks and containers explicitly with CRI
plugin API, and implicitly using annotations and start/stop commands.

PR relies on accompanying CRI PR (kevpar/cri#13) being merged.

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>

* PR: wrappers, annotation, comments

Signed-off-by: Hamza El-Saawy <hamzaelsaawy@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants