[bug] forcedly remove container failed issue #815

Closed
Letty5411 opened this issue Mar 6, 2018 · 5 comments
Labels
containerd-related kind/bug This is bug report for project

@Letty5411
Contributor

Letty5411 commented Mar 6, 2018

Ⅰ. Issue Description

In the Travis CI build https://travis-ci.org/alibaba/pouch/builds/349283089 there is a failure when force-removing a container. This is the first time I have seen this error, and it did not reproduce in the next CI run.
The error message is as follows:

Command:  /usr/local/bin/pouch rm -f test-run-with-directory-device
ExitCode: 1
Error:    exit status 1
Stdout:   
Stderr:   Error: failed to remove container: {"message":"failed to destory container: af8caff67a34e82e9e9b7a266c6a27506f969ceb2493ac6ed72da90720185fcb: failed to delete container: cannot delete running task af8caff67a34e82e9e9b7a266c6a27506f969ceb2493ac6ed72da90720185fcb: failed precondition"}

The same error also appears in:

FAIL: /go/src/github.com/alibaba/pouch/test/cli_run_test.go:54: PouchRunSuite.TestRunPrintHi
/go/src/github.com/alibaba/pouch/test/cli_run_test.go:63:
    command.PouchRun("rm", "-f", name).Assert(c, icmd.Success)
/go/src/github.com/alibaba/pouch/vendor/github.com/gotestyourself/gotestyourself/icmd/command.go:61:
    t.Fatalf("at %s:%d - %s\n", filepath.Base(file), line, err.Error())
... Error: at cli_run_test.go:63 - 
Command:  /usr/local/bin/pouch rm -f test-run-print-hi
ExitCode: 1
Error:    exit status 1
Stdout:   
Stderr:   Error: failed to remove container: {"message":"failed to destory container: 216f7d188620ac9f54c312e16e71238f6e34b0994570fdd2bedc26ce5ce0da74: failed to delete container: cannot delete running task 216f7d188620ac9f54c312e16e71238f6e34b0994570fdd2bedc26ce5ce0da74: failed precondition"}
FAIL: /go/src/github.com/alibaba/pouch/test/cli_run_test.go:364: PouchRunSuite.TestRunWithLocalVolume
/go/src/github.com/alibaba/pouch/test/cli_run_test.go:380:
    command.PouchRun("rm", "-f", name).Assert(c, icmd.Success)
/go/src/github.com/alibaba/pouch/vendor/github.com/gotestyourself/gotestyourself/icmd/command.go:61:
    t.Fatalf("at %s:%d - %s\n", filepath.Base(file), line, err.Error())
... Error: at cli_run_test.go:380 - 
Command:  /usr/local/bin/pouch rm -f TestRunWithLocalVolume
ExitCode: 1
Error:    exit status 1
Stdout:   
Stderr:   Error: failed to remove container: {"message":"failed to destory container: 7fcea8f4b12eb1fd336153b2f65a1552b164bdf3223f3c0c206ce66741021b1a: failed to delete container: cannot delete running task 7fcea8f4b12eb1fd336153b2f65a1552b164bdf3223f3c0c206ce66741021b1a: failed precondition"}
----------------------------------------------------------------------
FAIL: /go/src/github.com/alibaba/pouch/test/cli_run_test.go:335: PouchRunSuite.TestRunWithoutCapability
/go/src/github.com/alibaba/pouch/test/cli_run_test.go:342:
    command.PouchRun("rm", "-f", name).Assert(c, icmd.Success)
/go/src/github.com/alibaba/pouch/vendor/github.com/gotestyourself/gotestyourself/icmd/command.go:61:
    t.Fatalf("at %s:%d - %s\n", filepath.Base(file), line, err.Error())
... Error: at cli_run_test.go:342 - 
Command:  /usr/local/bin/pouch rm -f run-capability
ExitCode: 1
Error:    exit status 1
Stdout:   
Stderr:   Error: failed to remove container: {"message":"failed to destory container: 82ec9e2f279f6d0e5a6434d522847794c8cb6678c2fb7439901e051aede11176: failed to delete container: cannot delete running task 82ec9e2f279f6d0e5a6434d522847794c8cb6678c2fb7439901e051aede11176: failed precondition"}

I think this is a flaky test. It happens almost every day in Travis CI.

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?

Ⅵ. Environment:

  • pouch version (use pouch version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@pouchrobot pouchrobot added the kind/bug This is bug report for project label Mar 6, 2018
@allencloud
Collaborator

allencloud commented Mar 6, 2018

I found that the failure is due to a failed precondition, which is documented as:

	// FailedPrecondition indicates operation was rejected because the
	// system is not in a state required for the operation's execution.
	// For example, directory to be deleted may be non-empty, an rmdir
	// operation is applied to a non-directory, etc.
	//
	// A litmus test that may help a service implementor in deciding
	// between FailedPrecondition, Aborted, and Unavailable:
	//  (a) Use Unavailable if the client can retry just the failing call.
	//  (b) Use Aborted if the client should retry at a higher-level
	//      (e.g., restarting a read-modify-write sequence).
	//  (c) Use FailedPrecondition if the client should not retry until
	//      the system state has been explicitly fixed. E.g., if an "rmdir"
	//      fails because the directory is non-empty, FailedPrecondition
	//      should be returned since the client should not retry unless
	//      they have first fixed up the directory by deleting files from it.
	//  (d) Use FailedPrecondition if the client performs conditional
	//      REST Get/Update/Delete on a resource and the resource on the
	//      server does not match the condition. E.g., conflicting
	//      read-modify-write on the same resource.
	FailedPrecondition Code = 9

I am afraid this is an intermittent issue caused by an odd state in the underlying component. @Letty5411 https://github.com/grpc/grpc-go/blob/master/codes/codes.go#L77-L96
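
Rule (c) in the quoted comment says the caller should fix the system state before retrying. A minimal sketch of that pattern follows. The `Code` type here is a local stand-in whose numeric values match `google.golang.org/grpc/codes`, and `codedError` and `deleteWithFix` are hypothetical names used only for illustration; this is not pouch's actual cleanup code.

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a local stand-in for the gRPC status code type. The numeric
// values match google.golang.org/grpc/codes.
type Code uint32

const (
	FailedPrecondition Code = 9
	Aborted            Code = 10
	Unavailable        Code = 14
)

// codedError is a hypothetical error type that carries a Code.
type codedError struct {
	code Code
	msg  string
}

func (e *codedError) Error() string { return e.msg }

// deleteWithFix calls del once. If del fails with FailedPrecondition, it
// calls fix to repair the state (for example, stop the running task) and
// then calls del one more time. Any other result is returned unchanged.
func deleteWithFix(del, fix func() error) error {
	err := del()
	var ce *codedError
	if !errors.As(err, &ce) || ce.code != FailedPrecondition {
		return err
	}
	if fixErr := fix(); fixErr != nil {
		return fmt.Errorf("fixing state after %q: %w", err, fixErr)
	}
	return del()
}

func main() {
	// Simulated task state: delete is rejected while the task is running.
	running := true
	del := func() error {
		if running {
			return &codedError{code: FailedPrecondition, msg: "cannot delete running task"}
		}
		return nil
	}
	fix := func() error {
		running = false
		return nil
	}
	fmt.Println("delete result:", deleteWithFix(del, fix))
}
```

The second delete attempt only happens after `fix` succeeds, so the sketch does not blindly retry a call that was rejected for a state reason.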

@allencloud
Collaborator

Maybe we can use a script to run the test 1000 times and see how often the failure happens.

If it happens within those 1000 runs, we can compare against another pouch build that uses containerd 1.0.2 rather than 1.0.0.

If it still happens, we can run the same test 1000 times with the containerd CLI, ctr, to see whether this is a bug in containerd. If the answer is yes, we could try to fix it upstream. @Letty5411 @HusterWan

@HusterWan
Contributor

Recording the error logs:

FAIL: /go/src/github.com/alibaba/pouch/test/cli_run_test.go:54: PouchRunSuite.TestRunPrintHi
/go/src/github.com/alibaba/pouch/test/cli_run_test.go:63:
    command.PouchRun("rm", "-f", name).Assert(c, icmd.Success)
/go/src/github.com/alibaba/pouch/vendor/github.com/gotestyourself/gotestyourself/icmd/command.go:61:
    t.Fatalf("at %s:%d - %s\n", filepath.Base(file), line, err.Error())
... Error: at cli_run_test.go:63 - 
Command:  /usr/local/bin/pouch rm -f test-run-print-hi
ExitCode: 1
Error:    exit status 1
Stdout:   
Stderr:   Error: failed to remove container: {"message":"failed to destory container: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef: failed to delete container: cannot delete running task da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef: failed precondition"}
INFO[2018-03-15 01:25:49.260058430] Calling POST /v1.24/containers/create?name=test-run-print-hi, client @ 
INFO[2018-03-15 01:25:49.281635845] Calling POST /v1.24/containers/test-run-print-hi/attach?stdin=0, client @ 
INFO[2018-03-15 01:25:49.281742923] start to subscribe io, backend: hijack, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:49.281945272] Calling POST /v1.24/containers/test-run-print-hi/start, client @ 
INFO[2018-03-15 01:25:49.314161904] success to get image: registry.hub.docker.com/library/busybox:latest, container id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:49.322579961] success to new container: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
time="2018-03-15T01:25:49Z" level=info msg="shim containerd-shim started" address="/containerd-shim/default/da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef/shim.sock" debug=false module="containerd/tasks" pid=12347 
time="2018-03-15T01:25:49Z" level=info msg="Firewalld running: false" 
INFO[2018-03-15 01:25:49.803343091] success to new task, container id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef, pid: 12362 
INFO[2018-03-15 01:25:49.836467039] success to start task, container id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:49.836505542] success to add container, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:49.838821962] End of Calling POST /v1.24/containers/test-run-print-hi/start, costs 556 ms. client @ 
INFO[2018-03-15 01:25:49.939727055] Calling DELETE /v1.24/containers/test-run-print-hi?force=true, client @ 
INFO[2018-03-15 01:25:49.945887382] the task has quit, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef, err: <nil>, exitcode: 0, time: 2018-03-15 01:25:49.94444564 +0000 UTC 
time="2018-03-15T01:25:50Z" level=info msg="shim reaped" id=da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef module="containerd/tasks" 
INFO[2018-03-15 01:25:50.073210318] close containerio backend: hijack, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:50.073244322] close containerio backend: hijack, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:50.073656138] finished to subscribe io, backend: hijack, id: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef 
INFO[2018-03-15 01:25:50.210981896] handle event: da3dc4722d6cf529df2e2b79133274edc4e92af1cf6991fcd0e5bcb06ab52fef exit 

@Letty5411
Contributor Author

I ran into this failure in Travis CI:

FAIL: /go/src/github.com/alibaba/pouch/test/api_container_logs_test.go:52: APIContainerLogsSuite.TestNoShowStdoutAndShowStderr
/go/src/github.com/alibaba/pouch/test/api_container_logs_test.go:60:
    ...
/go/src/github.com/alibaba/pouch/test/util_api.go:21:
    c.Assert(resp.StatusCode, check.Equals, status, check.Commentf("Error:%s", got.Message))
... obtained int = 500
... expected int = 204
... Error:failed to destroy container: d6ef416c7e5c65decee39822b81684f7c7e6e19022ae38d3e33e89021eafbde5: failed to delete container: cannot delete running task d6ef416c7e5c65decee39822b81684f7c7e6e19022ae38d3e33e89021eafbde5: failed precondition

@fuweid
Contributor

fuweid commented Sep 20, 2018

We have improved the cleanup function and will keep an eye on this.

@fuweid fuweid closed this as completed Sep 20, 2018