This repository has been archived by the owner on Aug 14, 2020. It is now read-only.

spec: add ExitPolicy type in pod manifest. #500

Closed
wants to merge 1 commit into from

Conversation

yifan-gu
Contributor

The optional ExitPolicy type defines the behavior of the pod when
the apps within it exit.

This PR adds 3 valid policies:

  • untilAll: The pod exits only when all the apps have exited (whether
    successfully or not).
  • onAny: The pod exits when any of the apps exits (whether successfully
    or not).
  • onAnyFailure: The pod exits when any of the apps exits unsuccessfully.
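
The three policies can be read as a single decision function over the apps' exit states. The following is only a sketch: `AppState` and `podShouldExit` are hypothetical names, not part of the spec or of rkt. It also assumes that under onAnyFailure the pod still ends once every app has exited, and it treats unknown values like untilAll.

```go
package main

import "fmt"

// ExitPolicy mirrors the three values proposed in this PR.
type ExitPolicy string

const (
	UntilAll     ExitPolicy = "untilAll"
	OnAny        ExitPolicy = "onAny"
	OnAnyFailure ExitPolicy = "onAnyFailure"
)

// AppState is a hypothetical per-app record; it is not defined by the spec.
type AppState struct {
	Exited   bool
	ExitCode int
}

// podShouldExit reports whether the pod should exit under the given policy.
func podShouldExit(policy ExitPolicy, apps []AppState) bool {
	exited, failed := 0, 0
	for _, a := range apps {
		if a.Exited {
			exited++
			if a.ExitCode != 0 {
				failed++
			}
		}
	}
	allExited := exited == len(apps)
	switch policy {
	case OnAny:
		return exited > 0
	case OnAnyFailure:
		// Assumption: successful exits do not stop the pod, but the pod
		// still ends once every app has exited.
		return failed > 0 || allExited
	default:
		// UntilAll; unknown values are treated the same way in this sketch.
		return allExited
	}
}

func main() {
	apps := []AppState{{Exited: true, ExitCode: 0}, {Exited: false}}
	for _, p := range []ExitPolicy{UntilAll, OnAny, OnAnyFailure} {
		fmt.Printf("%s: exit=%v\n", p, podShouldExit(p, apps))
	}
}
```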

@yifan-gu
Contributor Author

Let's iterate on this PR specifically for the pod's exit policy :)

#276 #rkt/rkt#1461

/cc @iaguis @alban for the rkt related issue. Also cc @thockin @jonboulle @vbatts @xiang90, who have discussed the original issue.

@@ -179,3 +180,4 @@ JSON Schema for the Pod Manifest, conforming to [RFC4627](https://tools.ietf.org
* **ports** (list of objects, optional) list of ports that SHOULD be exposed on the host.
* **name** (string, required, restricted to the [AC Name](#ac-name-type) formatting) name of the port to be exposed on the host. This field is a key referencing by name ports specified in the Image Manifest(s) of the app(s) within this Pod Manifest; consequently, port names MUST be unique among apps within a pod.
* **hostPort** (integer, required) port number on the host that will be mapped to the application port.
* **exitPolicy** (string, optional) a string that specifies the exit policy of the pod. Valid values are "untilAll" (the pod exits only when all the apps have exited, whether successfully or not), "onAny" (the pod exits when any of the apps exits, either successfully or unsuccessfully), and "onAnyFailure" (the pod exits when any of the apps exits unsuccessfully).
Contributor

We need to discuss a) default behaviour if this is omitted, b) whether it's required/optional by implementations. Perhaps we need a short lifecycle section in ace.md
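
To make the omitted-field question concrete, here is a rough sketch of how an implementation might decode the field and fall back to a default of its own choosing. The `podManifest` struct and `resolveExitPolicy` are hypothetical and cover only this one field; which default to pick is exactly the open question, so the caller supplies it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podManifest holds only the field discussed here; the real pod manifest
// has many more fields.
type podManifest struct {
	ExitPolicy string `json:"exitPolicy,omitempty"`
}

// resolveExitPolicy returns the policy named in the manifest, or fallback
// when the field is omitted or empty. Unknown values are rejected.
func resolveExitPolicy(raw []byte, fallback string) (string, error) {
	var m podManifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", err
	}
	switch m.ExitPolicy {
	case "":
		return fallback, nil
	case "untilAll", "onAny", "onAnyFailure":
		return m.ExitPolicy, nil
	default:
		return "", fmt.Errorf("invalid exitPolicy %q", m.ExitPolicy)
	}
}

func main() {
	// The fallback "untilAll" is an arbitrary choice for this example.
	p, err := resolveExitPolicy([]byte(`{}`), "untilAll")
	fmt.Println(p, err)
}
```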

Contributor

What is the current behaviour of Rocket? As far as I understand, it's defined by the systemd it invokes as stage1, but I tend to get lost in systemd docs.

FWIW, the current work-in-progress multi-app branch of Jetpack exits when all processes exit ("untilAll") or when Jetpack itself is explicitly killed (SIGTERM/SIGINT/SIGQUIT).

With "onAnyFailure", what happens when one of the apps exits successfully? What happens when all of them exit successfully? Is restarting the app a possibility?

With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.
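
The per-app idea above could look roughly like the sketch below. The type and function names are hypothetical, and it assumes that "onFailure" applies to non-zero exits while "onExit" applies to successful ones; the comment does not spell that out.

```go
package main

import "fmt"

// Action is what the runtime does when an app exits.
type Action string

const (
	Nothing Action = "nothing" // the suggested default
	Restart Action = "restart"
	StopPod Action = "stopPod"
)

// AppPolicy is a hypothetical per-app pair of fields.
type AppPolicy struct {
	OnExit    Action
	OnFailure Action
}

// actionFor picks the action for an app that exited with exitCode.
// Assumption: OnFailure covers non-zero exits, OnExit covers zero exits.
func actionFor(p AppPolicy, exitCode int) Action {
	a := p.OnExit
	if exitCode != 0 {
		a = p.OnFailure
	}
	if a == "" {
		return Nothing
	}
	return a
}

func main() {
	webapp := AppPolicy{OnExit: Restart, OnFailure: Restart}
	postgres := AppPolicy{OnFailure: StopPod}
	fmt.Println("webapp crash:", actionFor(webapp, 1))
	fmt.Println("postgres crash:", actionFor(postgres, 1))
	fmt.Println("postgres clean exit:", actionFor(postgres, 0))
}
```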

@jonboulle
Contributor

/cc @philips @thockin @vbatts

@yifan-gu
Contributor Author

@mpasternacki

What is the current behaviour of Rocket? As far as I understand, it's defined by the systemd it invokes as stage1, but I tend to get lost in systemd docs.

Currently rkt behaves roughly like onAnyFailure: if any app exits with a non-zero status, the pod exits. Otherwise it waits for all apps (if an app exits with zero, the other apps continue).

With "onAnyFailure", what happens when one of the apps exits successfully? What happens when all of them exit successfully? Is restarting the app a possibility?

With onAnyFailure, when one of the apps exits successfully, the other apps continue to run until all of them have exited successfully. Restarting is not defined in this scope.

With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.

This sounds like a valid use case, any thoughts @jonboulle ?

@jonboulle
Contributor

With all possible combinations, I'd rather see it on app level. Say, "onExit" and "onFailure" fields that could be "nothing" (default), "restart", or "stopPod". This allows me to easily say that if my flaky webapp dies, just bring it up, but if Postgres server exits with a failure, it's probably serious and we're better off shutting down everything and waiting for somebody to inspect.

This is not unreasonable, I'm just a little wary of the lifecycle complexity it implies. (Actually, it should be fine to implement in rkt, but thinking about the spec more abstractly..)

@yifan-gu
Contributor Author

/cc @iaguis @alban: if we go down this route, we will probably need another rework of the service files... However, while an app is being restarted, it is also considered stopped. I don't know if there is a way to distinguish an app that has really stopped from one that has stopped but will be restarted.

@@ -179,3 +180,7 @@ JSON Schema for the Pod Manifest, conforming to [RFC4627](https://tools.ietf.org
* **ports** (list of objects, optional) list of ports that SHOULD be exposed on the host.
* **name** (string, required, restricted to the [AC Name](#ac-name-type) formatting) name of the port to be exposed on the host. This field is a key referencing by name ports specified in the Image Manifest(s) of the app(s) within this Pod Manifest; consequently, port names MUST be unique among apps within a pod.
* **hostPort** (integer, required) port number on the host that will be mapped to the application port.
* **exitPolicy** (string, optional) a string that specifies the exit policy of the pod. If left empty, it is up to the ACE to choose the default behavior. Valid values are:
Contributor

First: Kubernetes is assuming "untilAll", and we resisted adding this for lack of really concrete use-cases. My instinct is that it SOUNDS cool, but isn't that useful IRL. As far as I know we have no such equivalent internally. If any app container exits with failure, we know the pod is doomed to fail, but we let the other containers finish.

But I'm going to assume you have a concrete set of use-cases that justify this (you should write them down in this PR description) or else you would not be adding hypothetical complexity.

I just went to refresh on the state of the spec, and I realize there is no (what kubernetes calls) restartPolicy. Is this supposed to be an analog of that? I think it's interesting to contrast the approaches.

Kubernetes defines:

  • RestartAlways: Always restart app containers, regardless of exit code. The pod can only terminate in Failure if the runtime decides that it is not viable (hardware failure, machine drain, etc).
  • RestartOnFailure: Restart containers if and only if they exited with a non-zero code. The pod's terminal state is the worst-of any container's terminal state.
  • RestartNever: Never intentionally restart containers. The pod's terminal state is the worst-of any container's terminal state.
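
As a rough illustration of the three policies as described in this comment (not taken from Kubernetes source code), the restart decision and the "worst-of" terminal state could be sketched like this. The names follow the comment's wording, and `shouldRestart` and `podFailed` are hypothetical helpers.

```go
package main

import "fmt"

// RestartPolicy uses the names given in the comment above.
type RestartPolicy string

const (
	RestartAlways    RestartPolicy = "RestartAlways"
	RestartOnFailure RestartPolicy = "RestartOnFailure"
	RestartNever     RestartPolicy = "RestartNever"
)

// shouldRestart reports whether a container that exited with exitCode
// is restarted under the given policy.
func shouldRestart(p RestartPolicy, exitCode int) bool {
	switch p {
	case RestartAlways:
		return true
	case RestartOnFailure:
		return exitCode != 0
	default: // RestartNever, and anything unknown in this sketch
		return false
	}
}

// podFailed applies the "worst-of" rule: the pod's terminal state is a
// failure if any container's terminal exit code is non-zero.
func podFailed(terminalExitCodes []int) bool {
	for _, c := range terminalExitCodes {
		if c != 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldRestart(RestartOnFailure, 1))
	fmt.Println(podFailed([]int{0, 0, 2}))
}
```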

Superficially kube's RestartAlways feels the same as untilAll here. But here's the rub - the definition of untilAll doesn't actually say anything about restart. Is that part of the policy here or is that governed somewhere else that I am not seeing?

I'll not write much more now, because I have asked enough questions that I am probably attacking a straw man.

From a functional POV I think the concepts that matter to a user are "when does my container get restarted?" and "what does that mean for the fate of my pod?", but this only answers the latter, and only partially.

From an API usability POV I think it might be clearer to express these things "in the positive". E.g. I think a "RunPolicy" would be clearer (RunForever, RunToCompletion, RunOnce), and I sort of wish Kubernetes had done it that way.

Contributor Author

@thockin Actually that's part of the plan to implement k8s' restart policy in rkt

Basically, as we are using systemd to launch rkt pods, my original plan was to use systemd's restart policy combined with this pod exit policy.

But I see your point, and we actually shouldn't add something just to ease the implementation... I am thinking of changing this to a restart policy and implementing it in the runtime itself. Thanks!

Contributor Author

Thinking a little more about this, I found that a RestartPolicy would imply that the runtime is long-running; otherwise, if the runtime gets killed, nothing can enforce the restart policy (e.g. if a pod is killed after the kubelet has been killed, the pod is not restarted).

Contributor

Any policy around exit/restart needs something to babysit, right? There's not a way (AFAIK) to tell the OS to kill process B when process A dies (short of SIGCHLD which is a stretch)

Contributor Author

You can do that with systemd service dependencies, though.

My point is that if the thing (or runtime) that launches the pod is not PID1, then the restart policy will not be enforced in some cases.
But maybe that's OK for now, as we can limit the scope of the restart policy, e.g. we assume the runtime is always there, and we don't consider what happens if the runtime gets killed.

People can just let PID1 monitor the runtime; when the runtime fails, we treat it like a machine crash and restart the runtime anyway (which will consequently restart the pod).

Contributor

We've made the argument that "userspace is unreliable" in many arguments with Kernel folks, but their pushback (and rightly so) is "make it more reliable". There will always be corner cases, but there has to be a turtle at the bottom, and that turtle can't always be the kernel. In this case, I think kernel includes systemd - it really does fancy itself as important as the kernel.

So define the behavior you think is correct, and engineer towards a good enough answer.

Contributor Author

Sure, I am happy with changing this to restart policy. Waiting for other maintainers' feedback.

Contributor

Please note that I am not saying you should change it to be like
Kubernetes. Consider it in fresh light. I think RestartPolicy is clearer
than ExitPolicy, but I think RunPolicy might be even better.

On Fri, Sep 25, 2015 at 4:40 PM, Yifan Gu notifications@github.com wrote:

Sure, I am happy with changing this to restart policy. Waiting for other
maintainers' feedback.

Contributor Author

I'd like to have something other than empty here :) Any thoughts/votes on
ExitPolicy vs RestartPolicy vs RunPolicy?
@jonboulle @vbatts @philips ?

Contributor

ExecPolicy ?
On Oct 15, 2015 5:59 PM, "Yifan Gu" notifications@github.com wrote:

I'd like to have something other than empty here :) Any thoughts/votes on
ExitPolicy vs RestartPolicy vs RunPolicy?
@jonboulle @vbatts @philips ?

@yifan-gu yifan-gu mentioned this pull request Sep 30, 2015
5 tasks
@jonboulle jonboulle added this to the v0.7.2 milestone Oct 6, 2015
@yifan-gu
Contributor Author

ExecPolicy sounds good to me. What do you think, @jonboulle @thockin @philips?

@yifan-gu
Contributor Author

Ping?

@yifan-gu
Contributor Author

To refresh everyone's memory:
Originally, this proposal was intended for implementing the kubelet's restart policy. If the exit policy were implemented in the runtime, we could add a Restart setting to the service files[1] to match the Kubernetes restart policy. However, there is no exponential back-off, and restarting one app would cause the other apps within the pod to be restarted.

So after today's discussion with @jonboulle @philips, we plan to pull the Kubernetes restart policy into the spec and let the runtime (e.g. rkt) handle how each app restarts. This also implies that restarting app A should not affect the running state of app B.

[1] The service files manage the pods started by kubelet, e.g. they all have ExecStart=/bin/rkt run ${pod_id}
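
For readers unfamiliar with the exponential back-off mentioned above, a minimal sketch of a capped, doubling restart delay follows. `backoffDelay` is a hypothetical helper, and the base and cap values in `main` are illustrative assumptions only; nothing here is taken from rkt or the kubelet.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns how long to wait before the next restart of an app
// that has already been restarted `restarts` times. The delay starts at
// base, doubles with each restart, and never exceeds max.
func backoffDelay(restarts int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < restarts; i++ {
		d *= 2
		if d >= max {
			// Returning early also keeps d from overflowing.
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Illustrative values: 10s base delay, capped at 5 minutes.
	for n := 0; n <= 6; n++ {
		fmt.Println(n, backoffDelay(n, 10*time.Second, 5*time.Minute))
	}
}
```

Tracking the restart count per app, rather than per pod, is one way to keep a restart of app A from affecting app B.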

@yifan-gu
Contributor Author

Closed for #547

@jonboulle jonboulle closed this Dec 1, 2015
5 participants