fork/exec <command> operation not permitted error in Workflows with 'tty' option enabled on v3.5.2+ #12829

Closed
3 of 4 tasks
z63d opened this issue Mar 21, 2024 · 1 comment
Labels
P3 Low priority type/bug type/dependencies PRs and issues specific to updating dependencies type/regression Regression from previous behavior (a specific type of bug)

z63d commented Mar 21, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

We use Argo Workflows in our production environment. Last month, we upgraded Argo Workflows from v3.4.6 to v3.5.4 and began encountering an intermittent, low-frequency error. The affected Workflow step ended in the Failed phase.

Pod failed: Error (exit code 64): failed to start command: fork/exec /opt/docker/bin/example_command: operation not permitted

To identify the version causing the error, we gradually increased the version of Argo Workflows and determined that it was caused by Argo Workflows v3.5.2. After reverting the suspicious changes and determining the cause of the error, we found that the error was caused by a change in argoproj/argo-workflows#12139.

In PR argoproj/argo-workflows#12139, the creack/pty dependency was upgraded to v1.1.20, which affects workflows that enable the tty option. (source code)

Our Workflow had the tty option set to true. The following simplified version of our workflow illustrates this setup:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: sample-workflow
spec:
  entrypoint: sleep
  templates:
  - name: sleep
    container:
      image: alpine:3.19.1
      imagePullPolicy: IfNotPresent
      command: ["sleep", "30"]
      tty: true
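
To find which templates in a manifest take the tty code path, something like the following sketch may help. It assumes the manifest has already been parsed into a Python dict (for example by a YAML loader), and it only inspects the `container` field of each template, as in the sample above. The function name `templates_with_tty` is hypothetical and is not part of Argo Workflows.

```python
# Hypothetical helper, not part of Argo Workflows: list templates that set tty: true.
# Assumes the Workflow manifest was already parsed into a dict.

def templates_with_tty(workflow: dict) -> list:
    """Return the names of templates whose container enables the tty option."""
    names = []
    for template in workflow.get("spec", {}).get("templates", []):
        container = template.get("container") or {}
        if container.get("tty") is True:
            names.append(template.get("name", "<unnamed>"))
    return names


# The sample workflow from this report, expressed as a dict.
sample = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"name": "sample-workflow"},
    "spec": {
        "entrypoint": "sleep",
        "templates": [
            {
                "name": "sleep",
                "container": {
                    "image": "alpine:3.19.1",
                    "imagePullPolicy": "IfNotPresent",
                    "command": ["sleep", "30"],
                    "tty": True,
                },
            }
        ],
    },
}

print(templates_with_tty(sample))
```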

We investigated the releases in creack/pty and discovered that creack/pty v1.1.21 reverted the change introduced in v1.1.20 due to race conditions on Linux. For more information, refer to the revert PR. The main branch of Argo Workflows has already incorporated PR argoproj/argo-workflows#12312 to upgrade creack/pty to v1.1.21.

After reverting the changes made in argoproj/argo-workflows#12139, we confirmed that the error no longer occurred in Argo Workflows v3.5.2. Additionally, we upgraded creack/pty to v1.1.21 in Argo Workflows v3.5.2 and confirmed the same result.
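
To check whether a given source tree pins the affected library version, a rough sketch like the one below could be run against the contents of its go.mod. It assumes the dependency appears on a standard `github.com/creack/pty vX.Y.Z` line. The helper names are hypothetical, and the only version treated as affected is v1.1.20, as described in this report.

```python
import re

# Version identified in this report as introducing the problem.
AFFECTED_VERSION = "v1.1.20"


def creack_pty_version(go_mod_text: str):
    """Return the creack/pty version pinned in go.mod text, or None if absent."""
    match = re.search(
        r"^\s*(?:require\s+)?github\.com/creack/pty\s+(v\S+)",
        go_mod_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def is_affected(go_mod_text: str) -> bool:
    """True if go.mod pins the creack/pty version this report identifies."""
    return creack_pty_version(go_mod_text) == AFFECTED_VERSION


# Illustrative go.mod fragments (not copied from the real repository).
before_fix = "require (\n\tgithub.com/creack/pty v1.1.20\n)\n"
after_fix = "require (\n\tgithub.com/creack/pty v1.1.21\n)\n"

print(is_affected(before_fix), is_affected(after_fix))
```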

Currently, it seems that the PR argoproj/argo-workflows#12312 hasn't been cherry-picked into the v3.5 release branch. Would you consider cherry-picking it to the v3.5 release branch?

Version

v3.5.2

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

listed above

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
@z63d z63d added the type/bug label Mar 21, 2024
@agilgur5 agilgur5 added type/dependencies PRs and issues specific to updating dependencies P3 Low priority labels Mar 21, 2024
@agilgur5 agilgur5 changed the title Encountering 'fork/exec operation not permitted' error in Workflows with 'tty' option enabled on Argo Workflows v3.5.2 and above fork/exec <command> operation not permitted error in Workflows with 'tty' option enabled on v3.5.2+ Mar 21, 2024
@agilgur5 agilgur5 self-assigned this Mar 21, 2024
Contributor

agilgur5 commented Mar 21, 2024

Thanks for root-causing this and filing a detailed issue on it!

Completed the cherry-pick to the release-3.5 branch in #12312 (comment)

@agilgur5 agilgur5 added the type/regression Regression from previous behavior (a specific type of bug) label Mar 21, 2024
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 3, 2024
@argoproj argoproj locked as resolved and limited conversation to collaborators Jul 12, 2024