sandbox: don't exit if an app fails #3478

s-urbaniak · 2016-12-14T15:41:30Z

Currently the sandbox follows the same semantic rule as an immutable
pod. As soon as an app fails, it quits immediately.

This behavior is wrong for mutable sandboxes. When an app fails, the
sandbox needs to stay alive and track the exit code of the failed app
appropriately.

Also, in case of restarts, the /rkt/status files need to be readjusted.

This fixes it.

Also fixes #3472

TODOs:

- functional test which starts two apps
- functional test which starts a faulty app and an app
- functional test which starts only faulty apps

s-urbaniak · 2016-12-14T16:01:02Z

/cc @euank

euank · 2016-12-16T02:41:02Z

Add to the list of tests:

- adds, starts, and removes an app.

Although with this change `rkt app status ... --app=..` correctly shows exited, `rkt app rm .... --app=...` can't remove the application with this error. I didn't dig into it further, but my naive guess is that RemainAfterExit is turning the status of the app into stopped instead of inactive, or maybe inactive was never right.

s-urbaniak · 2016-12-19T15:08:23Z

gah, thanks for the follow-up, I need to implement more functional tests :-/

s-urbaniak · 2016-12-19T15:58:41Z

@euank while I am writing the tests, would you mind providing the sequence that reproduces the issue for you?

At least the following works for me:

$ sudo -E rkt app add --debug $pod docker://debian:sid --name=debian --exec=/bin/sh -- -c 'while true; do echo hallo; sleep .5; done'
stage0: locking pod manifest
stage0: Loading image sha512-e68319869a62c391b77dbc77ec0db84290c9803718fbe5df89a230d08ee5bbc0
stage0: Writing image manifest
stage0: adding app to sandbox
$ sudo -E rkt app start --debug $pod --app=debian
$ sudo -E rkt app stop --debug $pod --app=debian
$ sudo -E rkt app rm --debug $pod --app=debian
stage0: locking sandbox manifest

euank · 2016-12-19T19:57:19Z

It looks like it only happens if the app crashes, not if rkt stops it.

@s-urbaniak I can repro it with the same commands as #3472 for setup, and then the following:

$ rkt app status $(cat /tmp/uuid) --app=busybox           
name=busybox
state=exited
image_id=sha512-060ab846247313a0aa120aa60761a2beb2080e1392c2969631f8caa7aa7d1597
created_at=2016-12-19 11:51:45.373342304 -0800 PST
started_at=2016-12-19 11:51:55.273725672 -0800 PST
finished_at=2016-12-19 11:51:55.493734411 -0800 PST
exit_code=127
$ rkt app rm $(cat /tmp/uuid) --app=busybox    
rm: error removing app: error executing stage1 entrypoint: app-rm: cleanup error: app "busybox" is still running
$ rkt app stop $(cat /tmp/uuid) --app=busybox
$ echo $?
0
$ rkt app rm $(cat /tmp/uuid) --app=busybox    
rm: error removing app: error executing stage1 entrypoint: app-rm: cleanup error: app "busybox" is still running
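
For a functional test that asserts on this output, the `key=value` lines printed by `rkt app status` can be parsed into a map. The following is a minimal sketch only: `parseAppStatus` is a hypothetical helper, not part of rkt, and it assumes the output format stays exactly as shown in the transcript above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAppStatus is a hypothetical helper: it turns the key=value lines
// printed by `rkt app status` into a map. Lines without "=" are skipped.
func parseAppStatus(output string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		// Split on the first "=" only, so values may contain "=".
		parts := strings.SplitN(sc.Text(), "=", 2)
		if len(parts) != 2 {
			continue
		}
		fields[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return fields
}

func main() {
	// Sample taken from the transcript above (abbreviated).
	out := "name=busybox\nstate=exited\nexit_code=127\n"
	status := parseAppStatus(out)
	fmt.Println(status["state"], status["exit_code"])
}
```

A test could then check that `state` is `exited` and `exit_code` is `127` before attempting `rkt app rm`.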

s-urbaniak · 2017-01-09T08:53:44Z

@euank 👍 I can reproduce this now, thanks a lot. Current mode: investigating, fixing, writing tests.

s-urbaniak · 2017-01-09T16:25:23Z

@euank I am still writing the final functional tests, but would you mind quickly verifying whether removing failed apps is fixed now?

euank · 2017-01-09T23:04:57Z

Indeed, works as far as I can tell, thanks @s-urbaniak

squeed · 2017-01-10T17:04:51Z

This PR will expose some logic flaws in some status code. For example, the API service assumes that if a pod is running, all apps are running.
I'll file an issue to fix this.
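
To illustrate the flaw: once a sandbox stays alive after an app fails, a running pod no longer implies that every app in it is running, so status has to be derived per app. The sketch below uses hypothetical types (`AppState`, `App`) and a hypothetical helper (`exitedApps`); none of them are rkt's or the API service's actual definitions.

```go
package main

import "fmt"

// AppState and App are hypothetical types for this sketch, not rkt's API.
type AppState string

const (
	StateRunning AppState = "running"
	StateExited  AppState = "exited"
)

type App struct {
	Name     string
	State    AppState
	ExitCode int
}

// exitedApps returns the names of apps that are no longer running,
// regardless of whether the surrounding pod is still alive.
func exitedApps(apps []App) []string {
	var names []string
	for _, a := range apps {
		if a.State == StateExited {
			names = append(names, a.Name)
		}
	}
	return names
}

func main() {
	podRunning := true // the sandbox itself stays alive
	apps := []App{
		{Name: "debian", State: StateRunning},
		{Name: "busybox", State: StateExited, ExitCode: 127},
	}
	fmt.Println("pod running:", podRunning)
	fmt.Println("exited apps:", exitedApps(apps))
}
```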

squeed · 2017-01-10T19:21:18Z

I like that you cleaned up some of the stage0 files. Did you change any of the behavior, or just shuffle code?
Just wondering if we should review the new files more carefully.

s-urbaniak · 2017-01-10T19:30:21Z

@squeed yes, the first two commits are behavioral changes (b308a80 and 0282d80), the rest is code cleanup + adding tests.

Admittedly, I could split this into separate PRs.

lucab

LGTM (one question inline, but really just a nit and I'm fine to land it as is).

lucab · 2017-01-11T11:15:32Z

stage1/app-rm/app-rm.go

@@ -107,7 +107,10 @@ func cleanupStage0(appName *types.ACName, enterCmd []string) error {
 	// rely only on the output, since is-active returns non-zero for inactive units
 	out, _ := cmd.Output()

-	if string(out) != "inactive\n" {
+	switch string(out) {
+	case "failed\n":


Is there a reason why we don't TrimRight here?

s-urbaniak · 2017-01-11

No, just pure sloppiness; I will address this in a follow-up.
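
A sketch of what the suggested follow-up could look like: trim the trailing newline once, then compare against bare state names. The diff above shows only the `failed` case of the new switch, so treating `inactive` as removable too is an assumption based on the comparison it replaced. `removable` is a hypothetical helper, not the code in `app-rm.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// removable is a hypothetical helper. It reports whether a unit whose
// `systemctl is-active` output is isActiveOutput can be cleaned up.
// Assumption: both "inactive" and "failed" units are safe to remove.
func removable(isActiveOutput string) bool {
	// Trim the trailing newline so the cases need no "\n" suffix.
	switch strings.TrimRight(isActiveOutput, "\n") {
	case "inactive", "failed":
		return true
	default:
		return false
	}
}

func main() {
	for _, out := range []string{"inactive\n", "failed\n", "active\n"} {
		fmt.Printf("%q removable: %v\n", out, removable(out))
	}
}
```

Trimming first keeps the case labels readable and also tolerates output that lacks the final newline.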

s-urbaniak added the area/cri label Dec 14, 2016

s-urbaniak added this to the v1.22.0 milestone Dec 14, 2016

s-urbaniak added component/stage1 do not merge labels Dec 14, 2016

s-urbaniak force-pushed the sandbox-stay-alive-3472 branch 2 times, most recently from ebb3714 to b32937c Compare December 14, 2016 15:59

lucab modified the milestones: v1.23.0, v1.22.0 Dec 19, 2016

lucab mentioned this pull request Jan 5, 2017

CRI: unit failing leads to undesirable sandbox failure #3472

Closed

s-urbaniak added 4 commits January 9, 2017 10:04

app-rm: remove app if it failed

0282d80

Currently if a sandbox app failed, it cannot be removed because it is assumed to be still running. This fixes it.

stage0/common: print stage1 debug output

3116615

Currently, if `rkt app --debug` is enabled, we get stage1 debug output only if an error occurs. This fixes it by always printing debug output when it is enabled.

stage0/stage1 app: unify filenames

b4d2b92

Currently we have a mixture of file and directory conventions for app subcommands in stage0 and stage1. This fixes it by consolidating them following Go standards (underscore delimiters in filenames).

s-urbaniak force-pushed the sandbox-stay-alive-3472 branch from b32937c to 12f92f4 Compare January 9, 2017 15:58

tests/sandbox: simplify test

1a72e2a

This simplifies the sandbox tests.

s-urbaniak force-pushed the sandbox-stay-alive-3472 branch from 12f92f4 to 1a72e2a Compare January 9, 2017 16:26

tests/sandbox: add tests for start/remove and multiple apps

a11d199

This adds more functional tests for the app sandbox. It adds a test which starts/stops an app within a running sandbox and another test which adds three different kinds of apps and evaluates their respective status in rkt.

s-urbaniak force-pushed the sandbox-stay-alive-3472 branch from 869189b to a11d199 Compare January 10, 2017 15:22

s-urbaniak removed the do not merge label Jan 10, 2017

s-urbaniak requested review from lucab and squeed January 10, 2017 15:46

squeed mentioned this pull request Jan 10, 2017

api service: update status logic for sandbox #3524

Open

lucab approved these changes Jan 11, 2017

squeed approved these changes Jan 11, 2017

s-urbaniak merged commit 9f7cc9c into rkt:master Jan 11, 2017
