Fail jobs without starting pods #363

Merged: DrJosh9000 merged 1 commit into main from podless-failure on Aug 8, 2024

DrJosh9000 (Contributor) commented Aug 6, 2024

So a job is nonsense, a pod can't be constructed, or Kubernetes won't accept it. Currently, we build a whole 'nother pod containing an agent, a command container, an init container, etc., using the default agent image, and tell it to...

echo "the error message" && exit 1

which smacks of cracking a walnut with the Mesta 50000.

This PR changes the scheduler to use the new agent functionality introduced in buildkite/agent#2915 (core.Controller and core.JobController) to acquire and fail the job directly within the agent-stack-k8s controller, without starting a pod.
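
As a rough illustration of that control flow, here is a minimal Go sketch. It is not the code from this PR: the jobFailer interface, the buildPod and createPod types, and the schedule, recordingFailer, and demoFailures names are all hypothetical stand-ins, and the real core.Controller and core.JobController APIs are not shown and may look quite different. The only point is that a job whose pod can't be built or created gets failed directly, with no extra pod involved.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// jobFailer is a HYPOTHETICAL interface standing in for the agent-side
// functionality that acquires a job and marks it failed. It is not the
// real core.Controller / core.JobController API.
type jobFailer interface {
	AcquireAndFail(ctx context.Context, jobUUID, message string) error
}

// buildPod and createPod are HYPOTHETICAL stand-ins for the scheduler's
// pod construction and the Kubernetes create call.
type (
	buildPod  func(jobUUID string) (spec string, err error)
	createPod func(ctx context.Context, spec string) error
)

// schedule tries to start a pod for the job. If the pod cannot be built or
// Kubernetes rejects it, the job is failed directly through failer, and no
// extra "failure" pod is started.
func schedule(ctx context.Context, jobUUID string, build buildPod, create createPod, failer jobFailer) error {
	spec, err := build(jobUUID)
	if err != nil {
		return failer.AcquireAndFail(ctx, jobUUID, fmt.Sprintf("building pod: %v", err))
	}
	if err := create(ctx, spec); err != nil {
		return failer.AcquireAndFail(ctx, jobUUID, fmt.Sprintf("creating pod: %v", err))
	}
	return nil
}

// recordingFailer is an in-memory jobFailer used only for this illustration.
type recordingFailer struct {
	failed map[string]string // job UUID -> failure message
}

func (r *recordingFailer) AcquireAndFail(_ context.Context, jobUUID, message string) error {
	if r.failed == nil {
		r.failed = make(map[string]string)
	}
	r.failed[jobUUID] = message
	return nil
}

// demoFailures schedules one job whose pod cannot be built and one that
// Kubernetes rejects, then returns the recorded failure messages.
func demoFailures() map[string]string {
	ctx := context.Background()
	failer := &recordingFailer{}
	badBuild := func(string) (string, error) { return "", errors.New("job is nonsense") }
	okBuild := func(jobUUID string) (string, error) { return "pod-for-" + jobUUID, nil }
	reject := func(context.Context, string) error { return errors.New("rejected by Kubernetes") }

	_ = schedule(ctx, "job-1", badBuild, reject, failer)
	_ = schedule(ctx, "job-2", okBuild, reject, failer)
	return failer.failed
}
```

Calling demoFailures() from a main function or a test exercises both failure paths. Keeping the failer behind an interface is just a choice for this sketch, so the flow can be run without a Buildkite or Kubernetes connection.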

Fixes #273

DrJosh9000 force-pushed the podless-failure branch 9 times, most recently from faa59e7 to b8e175b, August 8, 2024 02:04
DrJosh9000 changed the title from "[WIP] Fail jobs without starting pods" to "Fail jobs without starting pods", Aug 8, 2024
DrJosh9000 requested a review from wolfeidau, August 8, 2024 03:46

wolfeidau (Contributor) left a comment

👍🏻 🚀

DrJosh9000 merged commit 17ae5f3 into main, Aug 8, 2024
1 check passed
DrJosh9000 deleted the podless-failure branch, August 8, 2024 05:01
Successfully merging this pull request may close these issues.

"failure" job uses private image causing imagePullBackoff