Sidecar does not seem to receive kill signal after main container finishes (exit code 126) #8745

Closed
3 tasks done
maxott opened this issue May 12, 2022 · 4 comments

Comments

maxott commented May 12, 2022

Checklist

  • Double-checked my configuration.
  • Tested using the latest version.
  • Used the Emissary executor.

Summary

What happened/what you expected to happen?

I installed Argo on my local minikube cluster using the 'getting started with postgres' quick-start manifest. I then submitted a workflow whose template defines a sidecar. When the `main` container finishes, the signal 15 (SIGTERM) that the controller sends does not terminate the sidecar. However, sending the same signal with `docker kill --signal=15 ...` does make the sidecar terminate.
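The observation that the sidecar exits when it receives signal 15 directly can be mimicked locally. The Python sketch below uses a hypothetical stand-in process (not the reporter's `data-proxy` image) that installs a SIGTERM handler. `Popen.send_signal(signal.SIGTERM)` plays the role of `docker kill --signal=15`. The sketch runs outside Kubernetes and says nothing about the controller's exec path.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a sidecar such as data-proxy: it installs a SIGTERM
# handler, reports that it is ready, then idles until it is signalled.
SIDECAR_CODE = r"""
import signal, sys, time

def on_sigterm(signum, frame):
    # Exit cleanly when SIGTERM (signal 15) arrives.
    sys.exit(0)

signal.signal(signal.SIGTERM, on_sigterm)
print("ready", flush=True)
while True:
    time.sleep(1)
"""


def run_and_terminate() -> int:
    """Start the stand-in sidecar, send it SIGTERM and return its exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SIDECAR_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the handler is installed so the signal is not delivered too early.
    proc.stdout.readline()
    # Roughly what `docker kill --signal=15` does to the container's main process.
    proc.send_signal(signal.SIGTERM)
    status = proc.wait(timeout=10)
    proc.stdout.close()
    return status


if __name__ == "__main__":
    print(run_and_terminate())  # 0, because the handler called sys.exit(0)
```

Without the handler, the process would be killed by the signal and `returncode` would be `-15`. A process running as PID 1 inside a container is different: it ignores signals for which it has not installed a handler. The controller targets PID 1 (`kill -15 1`), so this handler matters there.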

I assume I'm missing some permissions, because the controller logs show:

msg=".../exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+1&container=data-proxy&stderr=true&stdout=true&tty=false"
....
msg="signaled container" container=data-proxy error="command terminated with exit code 126" 

where `data-proxy` is the sidecar defined in the workflow.
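The exec URL in the log encodes the exact command the controller runs in the sidecar. Decoding its query string with Python's `urllib.parse.parse_qs` shows that the command is `/bin/sh -c "kill -15 1"`. It therefore depends on the sidecar image providing `/bin/sh`, and it sends SIGTERM to PID 1. The query string below is taken from the controller log (host and path omitted). `decode_exec_query` is a hypothetical helper used only for illustration.

```python
from urllib.parse import parse_qs

# Query string of the pods/exec request from the controller log.
EXEC_QUERY = (
    "command=%2Fbin%2Fsh&command=-c&command=kill+-15+1"
    "&container=data-proxy&stderr=true&stdout=true&tty=false"
)


def decode_exec_query(query: str) -> tuple[list[str], str]:
    """Return the argv and target container encoded in a pods/exec query string."""
    params = parse_qs(query)
    # Each repeated `command` parameter is one argv element, in order;
    # parse_qs also decodes %2F to '/' and '+' to a space.
    return params["command"], params["container"][0]


if __name__ == "__main__":
    argv, container = decode_exec_query(EXEC_QUERY)
    print(container, argv)  # data-proxy ['/bin/sh', '-c', 'kill -15 1']
```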

What version are you running?

Presumably the latest; I installed it with:

kubectl create ns argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/master/manifests/quick-start-postgres.yaml

Diagnostics

Paste the smallest workflow that reproduces the bug. We must be able to run the workflow.

{
	"metadata": {
		"name": "order-7f0289b2-886c-44f5-a177-fa58505f6e7b",
		"generateName": "cayp.service.d939b74d-0070-59a4-a832-36c5c07e657d.gradient-image-",
		"namespace": "argo",
		"uid": "1b597734-3701-47f7-a096-a0eca7cc96ce",
		"resourceVersion": "251939",
		"generation": 1,
		"creationTimestamp": "2022-05-12T07:34:29Z",
		"managedFields": [{
			"manager": "__debug_bin",
			"operation": "Update",
			"apiVersion": "argoproj.io/v1alpha1",
			"time": "2022-05-12T07:34:29Z",
			"fieldsType": "FieldsV1",
			"fieldsV1": {
				"f:metadata": {
					"f:generateName": {}
				},
				"f:spec": {},
				"f:status": {}
			}
		}]
	},
	"spec": {
		"templates": [{
			"name": "processA",
			"inputs": {
				"parameters": [{
					"name": "msg"
				}]
			},
			"outputs": {},
			"metadata": {},
			"container": {
				"name": "cayp.service.d939b74d-0070-59a4-a832-36c5c07e657d.gradient-image",
				"image": "172.17.0.1:5000/testing.com/gradient_image:latest",
				"args": ["--msg", "{{inputs.parameters.msg}}"],
				"env": [],
				"resources": {}
			},
			"sidecars": [{
				"name": "data-proxy",
				"image": "172.17.0.1:5000/cayp_data_proxy:latest",
				"env": [],
				"resources": {}
			}]
		}, {
			"name": "exit-handler",
			"inputs": {},
			"outputs": {},
			"metadata": {},
			"container": {
				"name": "exit-handler",
				"image": "172.17.0.1:5000/cayp_exit_handler:latest",
				"args": ["--name", "{{workflow.name}}", "--status", "{{workflow.status}}", "--failures", "{{workflow.failures}}"],
				"env": [],
				"resources": {}
			}
		}],
		"entrypoint": "processA",
		"arguments": {
			"parameters": [{
				"name": "cayp-version",
				"value": "0.5"
			}, {
				"name": "cayp-order-id",
				"value": "cayp:order:7f0289b2-886c-44f5-a177-fa58505f6e7b"
			}, {
				"name": "cayp-service-id",
				"value": "cayp:service:d939b74d-0070-59a4-a832-36c5c07e657d:gradient-image"
			}, {
				"name": "msg",
				"value": "Hello Sydney"
			}]
		},
		"serviceAccountName": "argo",
		"volumes": [{
			"name": "dp-config",
			"configMap": {
				"name": "data-proxy-config"
			}
		}, {
			"name": "minio-cert",
			"configMap": {
				"name": "minio-tls",
				"items": [{
					"key": "public.crt",
					"path": "public.crt"
				}]
			}
		}],
		"onExit": "exit-handler"
	},
	"status": {
		"startedAt": null,
		"finishedAt": null
	}
}
# Logs from the workflow controller:
kubectl logs -n argo deploy/workflow-controller | grep ${workflow} 

time="2022-05-12T06:45:52.034Z" level=info msg="cleaning up pod" action=terminateContainers key=argo/order-101c5317-31d2-4070-8bfb-dea46a33de6f/terminateContainers
time="2022-05-12T06:45:52.035Z" level=info msg="https://10.96.0.1:443/api/v1/namespaces/argo/pods/order-101c5317-31d2-4070-8bfb-dea46a33de6f/exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+1&container=data-proxy&stderr=true&stdout=true&tty=false"
time="2022-05-12T06:45:52.079Z" level=info msg="Create pods/exec 101"
time="2022-05-12T06:45:52.198Z" level=info msg="signaled container" container=data-proxy error="command terminated with exit code 126" namespace=argo pod=order-101c5317-31d2-4070-8bfb-dea46a33de6f stderr="<nil>" stdout="<nil>"
time="2022-05-12T06:45:52.199Z" level=warning msg="failed to clean-up pod" action=terminateContainers error="command terminated with exit code 126" key=argo/order-101c5317-31d2-4070-8bfb-dea46a33de6f/terminateContainers
time="2022-05-12T06:45:52.199Z" level=warning msg="Non-transient error: command terminated with exit code 126"
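For context, 126 is the status a POSIX shell returns when a command is found but cannot be executed. It is distinct from 127, which means the command was not found. The Python sketch below reproduces both statuses locally through `/bin/sh -c`, the same invocation style as the controller's exec request. It only illustrates what the number means and does not establish why the exec into `data-proxy` failed here. A container runtime may also report 126 when it cannot start the exec'd process at all, for example if the image lacks `/bin/sh`. `shell_status` and `demo` are hypothetical helpers.

```python
import os
import stat
import subprocess
import tempfile


def shell_status(command: str) -> int:
    """Run `command` via `/bin/sh -c` and return the shell's exit status."""
    return subprocess.run(["/bin/sh", "-c", command], capture_output=True).returncode


def demo() -> tuple[int, int]:
    """Return (status for a non-executable file, status for a missing command)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "not-executable.sh")
        with open(script, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        # Owner read/write only: the file exists but has no execute bit.
        os.chmod(script, stat.S_IRUSR | stat.S_IWUSR)
        not_executable = shell_status(script)
    missing = shell_status("/nonexistent-dir/does-not-exist")
    return not_executable, missing


if __name__ == "__main__":
    print(demo())  # (126, 127) with a POSIX shell such as dash or bash
```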

# If the workflow's pods have not been created, you can skip the rest of the diagnostics.

# The workflow's pods that are problematic:
kubectl get pod -o yaml -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

apiVersion: v1
items: []
kind: List
metadata:
  resourceVersion: ""


# Logs from your workflow's wait container, something like:
kubectl logs -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

time="2022-05-12T12:48:04.924Z" level=info msg="Starting Workflow Executor" version=untagged
time="2022-05-12T12:48:04.942Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2022-05-12T12:48:04.942Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo podName=order-7f0289b2-886c-44f5-a177-fa58505f6e7b template="{"name":"processA","inputs":{"parameters":[{"name":"msg","value":"Hello Sydney"}]},"outputs":{},"metadata":{},"container":{"name":"cayp.service.d939b74d-0070-59a4-a832-36c5c07e657d.gradient-image","image":"172.17.0.1:5000/testing.com/gradient_image:latest","args":["--msg","Hello Sydney"],"env":[{"name":"CAYP_ORDER_ID","value":"cayp:order:7f0289b2-886c-44f5-a177-fa58505f6e7b"},{"name":"http_proxy","value":"http://localhost"},{"name":"CAYP_STORAGE_URL","value":"http://storage.local"}],"resources":{}},"sidecars":[{"name":"data-proxy","image":"172.17.0.1:5000/cayp_data_proxy:latest","env":[{"name":"CAYP_ORDER_ID","value":"cayp:order:7f0289b2-886c-44f5-a177-fa58505f6e7b"},{"name":"CAYP_ACCOUNT_ID","value":"cayp:account:58d8e161-9a2b-513a-bd32-28d7e8af1658:testing.com"},{"name":"CAYP_APP_METADATA_PROVIDER","value":"sql"},{"name":"CAYP_SQL_HOST","value":"postgresql-dev.ivcap.svc.cluster.local"},{"name":"CAYP_SQL_PORT","value":"5432"},{"name":"CAYP_SQL_USER","value":"ivcap"},{"name":"CAYP_SQL_PASSWORD","value":"ivcap123"},{"name":"CAYP_SQL_DBNAME","value":"ivcap"},{"name":"CAYP_SQL_SSLMODE","value":"disable"},{"name":"S3_BUCKET_NAME","value":"cayp-artifacts"},{"name":"CAYP_MINIO_ENDPOINT","value":"minio.argo.svc.cluster.local:9000"},{"name":"CAYP_MINIO_ACCESS_KEY","valueFrom":{"secretKeyRef":{"name":"my-minio-cred","key":"accesskey"}}},{"name":"CAYP_MINIO_SECRET_KEY","valueFrom":{"secretKeyRef":{"name":"my-minio-cred","key":"secretkey"}}},{"name":"CAYP_MINIO_USE_TLS","value":"false"}],"resources":{}}],"archiveLocation":{"archiveLogs":true,"s3":{"endpoint":"minio:9000","bucket":"my-bucket","insecure":true,"accessKeySecret":{"name":"my-minio-cred","key":"accesskey"},"secretKeySecret":{"name":"my-minio-cred","key":"secretkey"},"key":"order-7f0289b2-886c-44f5-a177-fa58505f6e7b/order-7f0289b2-886c-44f5-a177-fa58505f6e7b"}}}" version="&Version{Version:untagged,BuildDate:2022-05-12T03:24:34Z,GitCommit:1f2417e30937399e96fd4dfcd3fcc2ed7333291a,GitTag:untagged,GitTreeState:clean,GoVersion:go1.18.2,Compiler:gc,Platform:linux/amd64,}"
time="2022-05-12T12:48:04.943Z" level=info msg="Starting deadline monitor"
time="2022-05-12T12:48:16.052Z" level=info msg="Main container completed" error=""
time="2022-05-12T12:48:16.052Z" level=info msg="Deadline monitor stopped"
time="2022-05-12T12:48:16.053Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2022-05-12T12:48:16.053Z" level=info msg="No output parameters"
time="2022-05-12T12:48:16.053Z" level=info msg="No output artifacts"
time="2022-05-12T12:48:16.054Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: order-7f0289b2-886c-44f5-a177-fa58505f6e7b/order-7f0289b2-886c-44f5-a177-fa58505f6e7b/main.log"
time="2022-05-12T12:48:16.054Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2022-05-12T12:48:16.054Z" level=info msg="Saving file to s3" bucket=my-bucket endpoint="minio:9000" key=order-7f0289b2-886c-44f5-a177-fa58505f6e7b/order-7f0289b2-886c-44f5-a177-fa58505f6e7b/main.log path=/tmp/argo/outputs/logs/main.log
time="2022-05-12T12:48:16.084Z" level=info msg="Save artifact" artifactName=main-logs duration=30.336824ms error="" key=order-7f0289b2-886c-44f5-a177-fa58505f6e7b/order-7f0289b2-886c-44f5-a177-fa58505f6e7b/main.log
time="2022-05-12T12:48:16.084Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2022-05-12T12:48:16.084Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2022-05-12T12:48:16.109Z" level=info msg="Create workflowtaskresults 201"
time="2022-05-12T12:48:16.110Z" level=info msg="Alloc=7159 TotalAlloc=11883 Sys=18642 NumGC=3 Goroutines=9"

alexec linked a pull request May 12, 2022 that will close this issue
alexec (Contributor) commented May 24, 2022

@maxott I've created a new version which I hope fixes this issue. Please can you use the `:dev-kill` image tag for both the controller and the executor?

stale bot commented Jun 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

stale bot added the problem/stale label ("This has not had a response in some time") Jun 12, 2022
stale bot commented Jul 10, 2022

This issue has been closed due to inactivity. Feel free to re-open if you still encounter this issue.

stale bot closed this as completed Jul 10, 2022
agilgur5 removed the problem/stale label Sep 13, 2023
agilgur5 (Contributor) commented:

I believe this was fixed in #8908.
