Check whether State.Waiting object exists #619

Merged 1 commit, Jun 8, 2022
@knw257 commented Jun 3, 2022

Issue: When running AWX-Operator 0.16.1 with AWX version 19.5.1, the AWX-EE container panicked whenever a job was run, causing the job to fail:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11baab8]

goroutine 185 [running]:
github.com/ansible/receptor/pkg/workceptor.podRunningAndReady.func1({{0x146846c, 0xc0005536b8}, {0x162e980, 0xc00030b800}})
/source/pkg/workceptor/kubernetes.go:97 +0x258
k8s.io/client-go/tools/watch.UntilWithoutRetry({0x16414c8, 0xc000380300}, {0x1630028, 0xc00049e360}, {0xc000553958, 0x1, 0x8})
/root/go/pkg/mod/k8s.io/client-go@v0.18.6/tools/watch/until.go:82 +0x397
k8s.io/client-go/tools/watch.UntilWithSync({0x16414c8, 0xc000380300}, {0x16304d8, 0xc0004c8198}, {0x162e980, 0xc00050e800}, 0x0, {0xc00012d958, 0x1, 0x1})
/root/go/pkg/mod/k8s.io/client-go@v0.18.6/tools/watch/until.go:153 +0x245
github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).createPod(0xc000344c60, 0x0)
/source/pkg/workceptor/kubernetes.go:231 +0xabb
github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).runWorkUsingLogger(0xc000344c60)
/source/pkg/workceptor/kubernetes.go:272 +0x85
created by github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).startOrRestart
/source/pkg/workceptor/kubernetes.go:823 +0xdb

I confirmed that the awx-ee image was based on the latest one available from quay.io, which, according to the action history in the ansible/awx-ee repository, was built using the latest devel image of Receptor.

Within the podRunningAndReady function, a loop checks the status of each non-ready container within a non-ready pod, but the code assumes that every such container is in the Waiting state. Per the k8s API documentation at https://pkg.go.dev/k8s.io/api/core/v1#ContainerState, only one of the ContainerState members (Waiting, Running, or Terminated) may be set at a time. In my case, the container was failing the readiness check due to networking issues, although Receptor communication over a unix socket worked correctly. The container was therefore not ready, but its state was Running, not Waiting, so State.Waiting was nil and the attempt to read ContainerStatus.State.Waiting.Reason failed with a nil pointer dereference.

In this PR, I have wrapped the faulting code in a check that the ContainerStatus.State.Waiting object is not nil, so that this code is skipped when the container state is not Waiting.
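
For illustration, here is a minimal sketch of the guard described above. It is not the Receptor code itself: the types are simplified, hypothetical stand-ins that only mirror the pointer-valued fields of the k8s ContainerState type, and the helper name waitingReason and the container names are made up for this example.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the k8s.io/api/core/v1 types.
// Only the fields needed to show the nil check are included.
type ContainerStateWaiting struct{ Reason string }
type ContainerStateRunning struct{}
type ContainerStateTerminated struct{ Reason string }

// ContainerState mirrors the real type's shape: each member is a pointer,
// and at most one of them is expected to be non-nil.
type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Running    *ContainerStateRunning
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name  string
	Ready bool
	State ContainerState
}

// waitingReason (hypothetical helper) returns the Waiting reason and true
// only when the container is actually in the Waiting state. Checking for nil
// first avoids dereferencing State.Waiting for Running or Terminated containers.
func waitingReason(cs ContainerStatus) (string, bool) {
	if cs.State.Waiting == nil {
		return "", false
	}
	return cs.State.Waiting.Reason, true
}

func main() {
	statuses := []ContainerStatus{
		// Not ready, but Running: the case that triggered the panic.
		{Name: "worker", State: ContainerState{Running: &ContainerStateRunning{}}},
		// Not ready and Waiting: the case the original code assumed.
		{Name: "puller", State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "ImagePullBackOff"}}},
	}
	for _, cs := range statuses {
		if cs.Ready {
			continue
		}
		if reason, ok := waitingReason(cs); ok {
			fmt.Printf("%s: waiting (%s)\n", cs.Name, reason)
		} else {
			fmt.Printf("%s: not ready, but not in the Waiting state\n", cs.Name)
		}
	}
}
```

Without the nil check, reading cs.State.Waiting.Reason for the first container would panic in the same way as the trace above, because only the Running member is set.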

@fosterseth fosterseth merged commit ce6c620 into ansible:devel Jun 8, 2022