Liveness probe fails with 500 and workflow never reconciled when using semaphores #6351

Closed
atombender opened this issue Jul 16, 2021 · 23 comments · Fixed by #6356

Comments

@atombender

Version 3.1.1 is failing a lot for us:

$ kubectl get pod
NAME                                   READY   STATUS             RESTARTS   AGE
argo-server-895985bb9-fz2hf            1/1     Running            0          13h
workflow-controller-6c79bf6684-8xlbx   0/1     CrashLoopBackOff   83         9h

$ kubectl describe pod workflow-controller-6c79bf6684-8xlbx
[...]
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  4m54s (x251 over 9h)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    17s (x952 over 8h)    kubelet  Back-off restarting failed container

$ kubectl -n argo logs workflow-controller-6c79bf6684-8xlbx --tail=10
time="2021-07-16T06:50:23.197Z" level=info msg="Get leases 200"
time="2021-07-16T06:50:23.206Z" level=info msg="Update leases 200"
time="2021-07-16T06:50:28.211Z" level=info msg="Get leases 200"
time="2021-07-16T06:50:28.219Z" level=info msg="Update leases 200"
time="2021-07-16T06:50:33.224Z" level=info msg="Get leases 200"
time="2021-07-16T06:50:33.233Z" level=info msg="Update leases 200"
time="2021-07-16T06:50:38.238Z" level=info msg="Get leases 200"
time="2021-07-16T06:50:38.245Z" level=info msg="Update leases 200"
time="2021-07-16T06:50:42.410Z" level=info msg="List workflows 200"
time="2021-07-16T06:50:42.413Z" level=info msg=healthz age=5m0s err="workflow never reconciled: webhook-test-2h4zr" instanceID= labelSelector="!workflows.argoproj.io/phase,!workflows.argoproj.io/controller-instanceid" managedNamespace=

$ curl -si http://localhost:6060/healthz
HTTP/1.1 500 Internal Server Error
Date: Thu, 15 Jul 2021 21:24:52 GMT
Content-Length: 45
Content-Type: text/plain; charset=utf-8

workflow never reconciled: webhook-test-2h4zr

$ kubectl -n e2e get wf webhook-test-2h4zr
NAME                 STATUS   AGE
webhook-test-2h4zr

$ kubectl -n e2e get wf webhook-test-2h4zr -oyaml
[...]
status:
  artifactRepositoryRef:
    default: true
  finishedAt: null
  message: 'Waiting for e2e/ConfigMap/semaphore/all-workflows lock. Lock status: 1/1 '
  progress: 0/0
  startedAt: null
  storedTemplates: [...]
  synchronization:
    semaphore:
      waiting:
      - semaphore: e2e/ConfigMap/semaphore/all-workflows

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@alexec
Contributor

alexec commented Jul 16, 2021

Interesting... this workflow does not have status.phase set, so the health check thinks it has never been reconciled, and therefore reports that the workflow controller has had a general error.
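
To make that failure mode concrete, here is a minimal Python sketch of the kind of check the log output above suggests. It is not the controller's Go code: the function name `healthz`, the dictionary layout, and the five-minute threshold (read off `age=5m0s` in the log) are assumptions made for illustration. A workflow that carries no phase label and is older than the threshold is reported as never reconciled, which corresponds to the HTTP 500 the liveness probe sees.

```python
from datetime import datetime, timedelta, timezone

# Label name taken from the labelSelector printed in the controller log.
PHASE_LABEL = "workflows.argoproj.io/phase"


def healthz(workflows, now, max_age=timedelta(minutes=5)):
    """Return (status_code, body) for a hypothetical health endpoint.

    `workflows` is a list of dicts with "name", "labels" and "created" keys;
    this layout is an assumption made for the sketch.
    """
    for wf in workflows:
        has_phase = PHASE_LABEL in wf.get("labels", {})
        too_old = now - wf["created"] > max_age
        if not has_phase and too_old:
            # One stuck workflow is enough to fail the whole check.
            return 500, f"workflow never reconciled: {wf['name']}"
    return 200, "ok"


now = datetime(2021, 7, 16, 6, 50, 42, tzinfo=timezone.utc)
stuck = {"name": "webhook-test-2h4zr", "labels": {}, "created": now - timedelta(hours=9)}
print(healthz([stuck], now))
```

Under these assumptions, a single workflow that never receives a phase is enough to keep the probe failing, which would explain the restart loop even though the controller itself is otherwise working.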

@sarabala1979
Member

@atombender can you provide the full workflow YAML (a sample we can use to reproduce the problem) and the controller log for that workflow?

@alexec
Contributor

alexec commented Jul 16, 2021

I've committed a change, and our CI will push engineering builds argoproj/workflow-controller:dev-6351 and argoproj/argoexec:dev-6351 in 1h. Can you please test to see if it fixes your issue?

@alexec alexec self-assigned this Jul 16, 2021
@alexec alexec added this to the v3.1 milestone Jul 16, 2021
@alexec
Contributor

alexec commented Jul 16, 2021

@sarabala1979 I jumped on this issue because (a) it will prevent workflows from running, (b) I wrote the code, and (c) I could see a clear fix, so I knew it would be quick.

@atombender
Author

Thanks! We haven't seen the issue today, actually, but we've been heavily developing our workflows and things haven't really had a chance to "settle". I will see if I can put this fix into testing soon.

@alexec
Contributor

alexec commented Jul 16, 2021

Thank you. The problem will only occur if you are using semaphores.
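
Judging only from the title of the linked fix (#6356, "Mark workflows wait for semaphore as pending"), the idea appears to be to give a workflow a phase even while it is still waiting for a lock. A rough Python sketch of that idea follows. The function `try_acquire` and the dictionary fields are hypothetical, and this is not the controller's actual Go implementation.

```python
def try_acquire(workflow, semaphore):
    """Hypothetical reconcile step for a workflow that needs a semaphore.

    Returns True if the lock was acquired. Either way the workflow ends up
    with a phase, so a health check keyed on "has a phase" no longer sees it
    as never reconciled.
    """
    if semaphore["held"] < semaphore["limit"]:
        semaphore["held"] += 1
        workflow["phase"] = "Running"
        return True
    # The lock is full: record a phase and a message instead of leaving
    # the status empty while the workflow waits.
    workflow["phase"] = "Pending"
    workflow["message"] = (
        f"Waiting for {semaphore['name']} lock. "
        f"Lock status: {semaphore['held']}/{semaphore['limit']}"
    )
    return False


semaphore = {"name": "e2e/ConfigMap/semaphore/all-workflows", "limit": 1, "held": 1}
workflow = {"name": "webhook-test-2h4zr", "phase": None}
try_acquire(workflow, semaphore)
print(workflow["phase"], "-", workflow["message"])
```

The message format mirrors the `status.message` shown in the original report; everything else in the sketch is an assumption.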

@atombender
Author

We haven't tried the build yet, as I need some support from our SRE staff. In the meantime, we got a panic relating to one of these workflows that don't have a status. Does this log help?

@atombender
Author

New log from the dev-6351 build: https://gist.github.com/atombender/03520de681ca7a09c9535065469ddc22

@alexec
Contributor

alexec commented Jul 19, 2021

I've fixed the panic and pushed the change. It should be ready to test in < 1h. Can you please test again?

@atombender
Author

atombender commented Jul 20, 2021

Deployed it and testing. I'm not seeing any status-less workflow, but it is apparently buggy in a different way: My workflow is stuck progressing on a daemon step that has successfully started.

Eventually, it fails with "Pod was active on the node longer than the specified deadline", and the workflow fails without any apparent retries, even though my workflow has:

  templateDefaults:
    retryStrategy:
      retryPolicy: onError
      limit: 10

When I roll back to the release version, it works.

@atombender
Author

I see this with multiple workflows. The initial pods start successfully, but no new steps are scheduled.

@alexec
Contributor

alexec commented Jul 20, 2021

Can you try using :latest too?

@atombender
Author

Same behaviour. In case it's relevant, the wait container just says:

time="2021-07-20T15:38:55.097Z" level=info msg="Starting Workflow Executor" executorType=k8sapi version=untagged
time="2021-07-20T15:38:55.102Z" level=info msg="Creating a K8sAPI executor"
time="2021-07-20T15:38:55.103Z" level=info msg="Executor initialized" [...]
time="2021-07-20T15:38:55.104Z" level=info msg="Starting deadline monitor"
time="2021-07-20T15:38:55.111Z" level=info msg="Watch pods 200"

@alexec
Contributor

alexec commented Jul 20, 2021

Looking at the kubectl describe output, your main container is still running; it has not terminated.

Can you double-check that this runs on v3.1?

@atombender
Author

It's a daemon: true step that's part of a DAG. Reverting to 3.1.2 works.

@alexec
Contributor

alexec commented Jul 20, 2021

@atombender can you please raise a separate bug?

@atombender
Author

Sure.

@atombender
Author

#6379

@alexec
Contributor

alexec commented Jul 20, 2021

So we can close this issue, can you test with a workflow that does not use daemon steps?

@atombender
Author

We haven't confirmed whether your fix solves this issue. Unfortunately, the problem doesn't always show up, and I have no idea what provokes it, but I will try to reproduce it. We don't have any workflows that don't use daemon steps.

alexec added a commit that referenced this issue Jul 22, 2021
…6351 (#6356)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
uturunku1 pushed a commit to newrelic-forks/argo-workflows that referenced this issue Jul 22, 2021
…rgoproj#6351 (argoproj#6356)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>
alexec added a commit that referenced this issue Jul 26, 2021
* Update events.md (#6119)

Trying to use the argo workflows events and I noticed that some crucial explanations are missing here. I would like to add:
- A simple WorkflowTemplate bound to the WorkflowEventBinding, to show what is triggered by the curl that send the event
- Some infos about the process that bind the event to the workflow template:
   - template creation
   - event binding apply
   - api call to trigger the workflow template creation
Plus: there is a little mistake in the selector:  metadata["x-argo"] instead of metadata["X-Argo-E2E"] I would like to correct it in order to avoid mistakes during the curl.

Hope this is appreciated! ;)

Denis

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add note on the requirements of resource templates. Fixes #5566 (#6125)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6127)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* add troubleshooting notes section for running-locally docs (#6132)

Co-authored-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Check whether any errors within checkResourceState() are transient. Fixes #6118. (#6134)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* build: Remove PNS_PRIVILEGED=true (#6138)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Document the extraction of data from a k8s resource (#6102)

* Document the extraction of data from a k8s resource

* remove reference to lines of a file that can be outdated

Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>

* Remove yaml snippet and only keep the link to the example

Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* build image output to docker (#6128)

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: Update stress rig and docs. Fixes #6136 (#6141)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: Upgrade Alibaba OSS to use more secure ListObjectsV2() (#6142)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Allow setting workflow input parameters in UI. Fixes #4234 (#5319)

* fix: Allow setting workflow input parameters in UI. Fixes #4234

Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>

* fix: Allow setting workflow input parameters in UI. Fixes #4234

 - Allow workflow input parameters as well as entrypoint parameters.

Signed-off-by: Kenny Trytek <kenneth.g.trytek@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Performance improvement for Sprig. Fixes #6135 (#6140)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* update from v0.19.6 to v0.20.4 and indirect dependencies

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* exec.GetAuthenticator takes two arguments in the k8s-client-go v0.20.4

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* update makefile to use code-generator@v0.20.4

Signed-off-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix release-notes.md

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Update Graviti's website link (#6148)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(ui): Fix-up local storage namespaces. Fixes #6109 (#6144)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Capture emissary main-logs. Fixes #6145 (#6146)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(ui): Fix event-flow scrolling. Fixes #6133 (#6147)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* test: Fix logging test (#6159)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(ui): Add checkbox to check all workflows in list. Fixes #6069 (#6158)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Use 'depends' instead of 'dependencies' in examples (#6166)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(server): Allow redirect_uri to be automatically resolved when using sso (#6167)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Allow retry on transient errors when validating workflow spec. Fixes #6163 (#6178)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): dehydrate workflow before deleting offloaded node status (#6112)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6160)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Remove RBAC for SSO from Roadmap (Already implemented) (#6174)

It looks like RBAC for SSO is already implemented by #4198 so hopefully it can be removed from the roadmap as it is also documented? https://argoproj.github.io/argo-workflows/argo-server-sso/#sso-rbac

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6187)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix changelog order for .0 tags (#6188)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Wrong validate order when validate DAG task's argument (#6190)

Signed-off-by: BOOK <book78987book@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix rebase conflict

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* refactor: Remove the need for pod annotations to be mounted as a volume (#6022)

Signed-off-by: Antony Chazapis <chazapis@ics.forth.gr>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: ContainerSets do not have 'depends' (#6199)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Fix security issues related to file closing and paths (G307 & G304) (#6200)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add links to Python examples to description annotations (#6202)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(executor): document k8s executor behaviour with program warnings (#6212)

* docs(executor): document k8s executor behaviour with program warnings

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>

* docs(executor): fix typo

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Fix certain sibling tasks not connected to parent (#6193)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(ui): Add copy to clipboard shortcut (#6217)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6220)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add KarrotPay in USERS.md (#6221)

Signed-off-by: Byungjin Park <posquit0.bj@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add workflow-count-resourcequota.yaml example (#6225)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Reduce argoexec image size (#6197)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(conttroller): Always set finishedAt dote. Fixes #6135 (#6139)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat: Add support for deletion delay when using PodGC (#6168)

Signed-off-by: Stefan Sedich <stefan.sedich@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: update bug report template (#6236)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6242)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: alexec <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): emissary - make argoexec executable from non-root containers. Fixes #6238 (#6247)

Signed-off-by: Yuan Gong <gongyuan94@gmail.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat: Introduce when condition to retryStrategy (#6114)

Signed-off-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* ci: Add Go code security scanner via gosec. Fixes #6203 (#6232)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: fix end of files, new lines and remove multiple lines (#6240)

Signed-off-by: NikeNano <niklas.sven.hansson@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: add json destructuring example (#6250)


Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): Tolerate docker re-creating containers. Fixes #6244 (#6252)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(executor): emissary - make /var/run/argo files readable from non-root users. Fixes #6238 (#6304)

Signed-off-by: Yuan Gong <gongyuan94@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(controller): add missing emissary executor (#6291)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: docs and hacks improvements (#6310)

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): Only list needed fields. Fixes #6000 (#6298)

* fix(cli): Only list needed fields

Signed-off-by: Alex Collins <alex_collins@intuit.com>

* ok

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Fix typo (#6311)

Signed-off-by: Byungjin Park <posquit0.bj@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* require sso redirect url to be an argo url (#6211)

Signed-off-by: Brandon Goode <brandon.goode@cox.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: code format (#6269)

- Add yaml rendering
- Add bash rendering

Co-authored-by: Simon Behar <simbeh7@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Store artifact repository in workflow status. Fixes #6255 (#6299)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: document using ingress with TLS enabled (#6324)

Signed-off-by: valorl <11498571+valorl@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: document how to access hyphenated steps in expression templates (#6318)

Signed-off-by: Michael Crenshaw <michael@crenshaw.dev>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Differentiate CronWorkflow submission vs invalid spec error metrics (#6309)

* feat(controller): Differentiate CronWorkflow submission vs invalid spec error metrics

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Address feedback

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: deleted wft.yaml

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* ci: only run Snyk once a day on master

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Not updating StoredWorkflowSpec when WFT changed during workflow running (#6342)

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): v3.1 Argo Auth Token (#6344)

* fix(cli): v3.1 Argo Auth Token

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>

* update

Signed-off-by: Saravanan Balasubramanian <sarabala1979@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add Alibaba Group to USERS.md (#6353)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(crd): temp fix 34s timeout bug for k8s 1.20+ (#6350)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: updated CHANGELOG.md (#6348)

Signed-off-by: GitHub <noreply@github.com>

Co-authored-by: sarabala1979 <sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs(users): Add WooliesX (#6358)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(cli): Overridding name/generateName when creating CronWorkflows if specified (#6308)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): sortDAGTasks supports sort by field Depends (#6307)

Signed-off-by: BOOK <book78987book@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(fields): handle nexted fields when excluding (#6359)

Signed-off-by: AntoineDao <antoinedao1@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* feat(controller): Allow configurable host name label key when retrying different hosts (#6341)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* pull argo-events changes
update versions in go.mod and go.sum

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run go mod tidy

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): allow workflow.duration to pass validator (#6376)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): fix retry on transient errors when validating workflow spec (#6370)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: examples/ci.yaml indent (#6328)

Signed-off-by: kungho.back <kungho.back@naverlabs.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* chore: import grafana dashboard (#6365)

Signed-off-by: GitHub <noreply@github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(gcs): throw argo not found error if key not exist (#6393)

Signed-off-by: AntoineDao <antoinedao1@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* Revert "fix: examples/ci.yaml indent (#6328)"

This reverts commit 3f72fe5.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix: Server crash when opening timeline tab for big workflows (#6369)

Signed-off-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>

Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add 4intelligence (#6400)

Signed-off-by: Thiago Gil <t.gil@4intelligence.com.br>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Add note on additional required permission for createBucketIfNotPresent for OSS driver (#6378)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): allow initial duration to be 0 instead of current_time-0 (#6389)


Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix(controller): Mark workflows wait for semaphore as pending. Fixes #6351 (#6356)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* docs: Updating upgrading.md. Closes #6314

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* not need to convert to unstructured.unstructured

I was getting this error controller_test.go: pkg/mod/k8s.io/client-go@v0.20.4/tools/cache/reflector.go:167: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: item[0]: can't assign or convert unstructured.Unstructured into v1alpha1.Workflow

Based on this comment, it seems like the conversion is not needed: kubernetes-sigs/controller-runtime#524 (comment)

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* run make pre-commit -B

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix potential file inclusion via variable lint error

there is a risk that an unintended file path will be specified. So uuse filepath.Clean() to clean up possible bad paths

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

* fix format issue

Signed-off-by: uturunku1 <luces.huayhuaca@gmail.com>

Co-authored-by: Denis Bellotti <denis.bellotti.android@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: alexec <alexec@users.noreply.github.com>
Co-authored-by: uturunku1 <“21225410+uturunku1@users.noreply.github.com”>
Co-authored-by: Christophe Blin <christophe.blin@free.fr>
Co-authored-by: meijin <859037421@qq.com>
Co-authored-by: kennytrytek <kenneth.g.trytek@gmail.com>
Co-authored-by: Caden <32856921+CadenOf@users.noreply.github.com>
Co-authored-by: Simon Behar <simbeh7@gmail.com>
Co-authored-by: Stefan Sedich <stefan.sedich@gmail.com>
Co-authored-by: Reijer Copier <copierrj@users.noreply.github.com>
Co-authored-by: Brandon High <highb@users.noreply.github.com>
Co-authored-by: BOOK <book78987book@gmail.com>
Co-authored-by: Antony Chazapis <chazapis@ics.forth.gr>
Co-authored-by: Tianchu Zhao <evantczhao@gmail.com>
Co-authored-by: Byungjin Park (Claud) <posquit0.bj@gmail.com>
Co-authored-by: Yuan (Bob) Gong <4957653+Bobgy@users.noreply.github.com>
Co-authored-by: Niklas Hansson <niklas.sven.hansson@gmail.com>
Co-authored-by: Michael Crenshaw <michael@crenshaw.dev>
Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Co-authored-by: brgoode <86316314+brgoode@users.noreply.github.com>
Co-authored-by: Valér Orlovský <11498571+valorl@users.noreply.github.com>
Co-authored-by: Alex Collins <alex_collins@intuit.com>
Co-authored-by: sarabala1979 <sarabala1979@users.noreply.github.com>
Co-authored-by: Antoine Dao <antoinedao1@gmail.com>
Co-authored-by: KUNG HO BACK <bkh751@gmail.com>
Co-authored-by: Zadkiel <zadkiel.aharonian@gmail.com>
Co-authored-by: Alexander Matyushentsev <Alexander_Matyushentsev@intuit.com>
Co-authored-by: Thiago Bittencourt Gil <79285506+thiago4int@users.noreply.github.com>
@sarabala1979 sarabala1979 mentioned this issue Jul 27, 2021
39 tasks
sarabala1979 pushed a commit that referenced this issue Jul 27, 2021
…6351 (#6356)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
@sarabala1979 sarabala1979 mentioned this issue Aug 2, 2021
49 tasks
@agilgur5 agilgur5 changed the title Liveness probe fails with 500 and "workflow never reconciled" Liveness probe fails with 500 and "workflow never reconciled" (using semaphores) Apr 22, 2024
@omerlh
Contributor

omerlh commented Jun 23, 2024

I am experiencing similar issue but without semaphor - saw this error in the logs:

argo-workflows-workflow-controller-7786c48df-zkwrj controller time="2024-06-23T09:48:19.772Z" level=error msg="recovered from panic &{%!q(*logrus.Logger=&{0xc000098030 map[2:[0xc0007242d0] 3:[0xc0007242d0] 4:[0xc0007242d0]] 0xc0000b9c20 false 4 {{0 0} false} {{} 0xc088e8b000 16 0xc000186000 0 <nil>} 0x4f4480 <nil>}) map[\"fromPhase\":\"Succeeded\" \"namespace\":\"workflows\" \"toPhase\":\"Error\" \"workflow\":\"recon-0394134a-f3ea-43f2-85c5-22fe67c0e06c-rspl8\"] \"2024-06-23 09:48:19.56050614 +0000 UTC m=+368.315614323\" \"panic\" %!q(*runtime.Frame=<nil>) \"workflow is already fulfilled\" \"<nil>\" <nil> \"\"}. Call stack:\ngoroutine 390 [running]:\ngithub.com/argoproj/argo-workflows/v3/util/runtime.RecoverFromPanic(0xc081379da0?)\n\t/go/src/github.com/argoproj/argo-workflows/util/runtime/panic.go:15 +0x65\npanic({0x23d9b00?, 0xc0020ab0a0?})\n\t/usr/local/go/src/runtime/panic.go:920 +0x270\ngithub.com/sirupsen/logrus.(*Entry).log(0xc0020ab030, 0x0, {0xc0311e9ec0, 0x1d})\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491\ngithub.com/sirupsen/logrus.(*Entry).Log(0xc0020ab030, 0x0, {0xc002783bf0?, 0x7?, 0xc002783a18?})\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48\ngithub.com/sirupsen/logrus.(*Entry).Panic(...)\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:342\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markWorkflowPhase(0xc0874423c0, {0x27be158, 0x3a5b2c0}, {0x2414c81, 0x5}, {0xc001a04500, 0xf2})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2291 +0x325\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markWorkflowError(0x24135ec?, {0x27be158, 0x3a5b2c0}, {0x2794340?, 0xc05d12c100?})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2424 
+0x56\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate.func2()\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:204 +0x234\npanic({0x23d9b00?, 0xc0020aae00?})\n\t/usr/local/go/src/runtime/panic.go:920 +0x270\ngithub.com/sirupsen/logrus.(*Entry).log(0xc0020aad90, 0x0, {0xc0311e9e40, 0x1d})\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:260 +0x491\ngithub.com/sirupsen/logrus.(*Entry).Log(0xc0020aad90, 0x0, {0xc0027849e8?, 0x7?, 0x30?})\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:304 +0x48\ngithub.com/sirupsen/logrus.(*Entry).Panic(...)\n\t/go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/entry.go:342\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markWorkflowPhase(0xc0874423c0, {0x27be158, 0x3a5b2c0}, {0x2416255, 0x6}, {0xc089c145b0, 0x64})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2291 +0x325\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).markWorkflowFailed(...)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:2420\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate(0xc0874423c0, {0x27be158?, 0x3a5b2c0})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:261 +0xaf4\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc000158000, {0x27be158, 0x3a5b2c0})\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:811 +0x627\ngithub.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0xc000015ea0?)\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:734 +0x88\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x33\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x2796540, 0xc083347290}, 0x1, 
0xc0002d12c0)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xaf\nk8s.io/apimachinery/pkg/util/wait.JitterUntil(0x17ba8a0?, 0x3b9aca00, 0x0, 0xe0?, 0x17ba9a0?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:133 +0x7f\nk8s.io/apimachinery/pkg/util/wait.Until(0xc000015f80?, 0xa005aa001?, 0xc000015fa0?)\n\t/go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:90 +0x1e\ncreated by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run in goroutine 84\n\t/go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:344 +0x1885\n" namespace=workflows workflow=recon-0394134a-f3ea-43f2-85c5-22fe67c0e06c-rspl8

Once I deleted the workflow named in the log, the controller became live again, but it was a very old workflow, so I am not sure what happened.

@ospiegel91

We're still experiencing this issue; memory utilization of the controller is at 30%.

@agilgur5 agilgur5 changed the title Liveness probe fails with 500 and "workflow never reconciled" (using semaphores) Liveness probe fails with 500 and workflow never reconciled"(using semaphores) Sep 1, 2024
@agilgur5 agilgur5 changed the title Liveness probe fails with 500 and workflow never reconciled"(using semaphores) Liveness probe fails with 500 and workflow never reconciled(using semaphores) Sep 1, 2024
@agilgur5 agilgur5 changed the title Liveness probe fails with 500 and workflow never reconciled(using semaphores) Liveness probe fails with 500 and workflow never reconciled when using semaphores Sep 1, 2024
@agilgur5
Contributor

agilgur5 commented Sep 1, 2024

I am experiencing similar issue but without semaphor [sic]

That's a different issue then. Same error message but different cause. Please open a new issue with a reproduction.

This specific issue was with semaphores and was resolved 3 years ago with no further comments.

@argoproj argoproj locked as resolved and limited conversation to collaborators Sep 1, 2024