Enforce timeout and notify when job is waiting for a runner to pick up the job #50926
-
Hi Rajendra! You can enforce a timeout for jobs that are waiting for a runner to pick up the job by configuring an idle timeout for the runner. When a runner has been idle for the specified amount of time, it will be automatically removed from the queue and the job will be marked as failed.

As for monitoring the queue of jobs waiting for runners, you can use the GitHub API to retrieve a list of pending jobs. The API endpoint for this is:

`GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`

This will return a list of jobs for the specified run, including their current status (e.g. queued, in progress, completed). You can use this information to monitor the queue of jobs waiting for runners and take appropriate action if necessary.
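If it helps, here is a minimal sketch of querying that endpoint with the `gh` CLI to see which jobs of a run are still waiting for a runner; `OWNER`, `REPO`, and `RUN_ID` are placeholders you would substitute:

```bash
# List the jobs of a workflow run and keep only those still queued
# (i.e. waiting for a runner), printing their id and name
gh api repos/OWNER/REPO/actions/runs/RUN_ID/jobs \
  --jq '.jobs[] | select(.status == "queued") | {id, name}'
```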
-
Hi @radhikari-arch, I have been facing a similar problem and created a workaround which has been working fine for me so far. It might not be the optimal way, but this is what I did: create another workflow alongside deploy.yml, say:

```yaml
name: Cancel deploy on timeout

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  timeout:
    timeout-minutes: 2
    runs-on: ubuntu-latest
    permissions:
      actions: write
    steps:
      - name: Monitor deployment for timeout and cancel if crossed the threshold
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          sleep 1m
          gh -R {owner}/{repo} run list -w deploy.yml -s queued --json databaseId -q .[].databaseId | xargs gh -R {owner}/{repo} run cancel
```

How it works: the workflow is triggered on the same push that triggers deploy.yml, sleeps for a minute, then lists any runs of deploy.yml that are still queued (i.e. waiting for a runner) and cancels them. The `timeout-minutes: 2` on the job keeps the watchdog itself from hanging around.
You can probably use this simple workflow as a base and modify it as per your needs by applying checks on different statuses and varying timeouts per status. Hope it helps!
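As one variation on the run step above, a different threshold per status can be applied by looking at each queued run's age via the `createdAt` field; this sketch cancels only runs of deploy.yml that have been queued for more than 15 minutes (the threshold and `OWNER/REPO` are assumptions):

```bash
# Compute the cutoff timestamp (GNU date, as available on ubuntu-latest runners)
cutoff=$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)

# Cancel runs of deploy.yml that are still queued and older than the cutoff
gh -R OWNER/REPO run list -w deploy.yml -s queued --json databaseId,createdAt \
  --jq ".[] | select(.createdAt < \"$cutoff\") | .databaseId" |
  xargs -r -n1 gh -R OWNER/REPO run cancel
```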
-
Another missing feature that makes me wonder how Actions shipped, let alone has stayed in such a similar state for years. It's like observability of Actions is actually an anti-feature in Microsoft's eyes. Just like the ridiculous way failed-job notifications are sent and are completely unconfigurable, it's outlandish that there's no way to time out STALLED JOBS that can't run.
-
FYI I adapted @develop-at-github's suggestion into a scheduled cleanup job that runs hourly. The workflow kills any outstanding queued runs of the target workflow, not including runs that are already in progress (via the status filter on `gh run list`).

I do agree with @colemickens that this is functionality that should be built into GHA.
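For anyone who wants a starting point, a sketch of such an hourly cleanup workflow might look like the following; the file name, the hourly cron, and the targeted deploy.yml workflow are assumptions rather than the poster's actual setup:

```yaml
# e.g. .github/workflows/cancel-stale-queued-runs.yml (hypothetical name)
name: Cancel stale queued runs

on:
  schedule:
    - cron: '0 * * * *'   # once an hour
  workflow_dispatch:

jobs:
  cleanup:
    runs-on: ubuntu-latest
    permissions:
      actions: write
    steps:
      - name: Cancel queued runs of deploy.yml
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Runs that are still queued are the ones waiting for a runner
          gh -R ${{ github.repository }} run list -w deploy.yml -s queued \
            --json databaseId -q .[].databaseId |
            xargs -r -n1 gh -R ${{ github.repository }} run cancel
```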
-
🕒 Discussion Activity Reminder 🕒

This Discussion has been labeled as dormant by an automated system for having no activity in the last 60 days. Please consider one of the following actions:

1️⃣ Close as Out of Date: If the topic is no longer relevant, close the Discussion as outdated.

2️⃣ Provide More Information: Share additional details or context, or let the community know if you've found a solution on your own.

3️⃣ Mark a Reply as Answer: If your question has been answered by a reply, mark the most helpful reply as the solution.

Note: This dormant notification will only apply to Discussions with the dormant label.

Thank you for helping bring this Discussion to a resolution! 💬
-
This timeout probably shouldn't be a workflow yml parameter, but a repository setting. In a repository I am working on, we have a job that has been stuck for over 2 days, which we are even unable to cancel using the force-cancel API. And it is just waiting for a runner to come online, but the runners are online and other jobs are able to run on them. The job itself also just states "Waiting for a runner to pick up this job...".
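For reference, this is roughly how the cancel and force-cancel calls look with the `gh` CLI; `OWNER`, `REPO`, and `RUN_ID` are placeholders, and as noted above even the force-cancel may not help when a run is stuck waiting for a runner:

```bash
# Normal cancellation request for a workflow run
gh api -X POST repos/OWNER/REPO/actions/runs/RUN_ID/cancel

# Force-cancel, which bypasses conditions that block a normal cancel
gh api -X POST repos/OWNER/REPO/actions/runs/RUN_ID/force-cancel
```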
-
Actually, I wrote a job that gets the status of the self-hosted runner before the next job is executed on the expected runner.
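A sketch of that idea might look like this, assuming a repository-level self-hosted runner named `my-runner` and a token stored as `RUNNER_STATUS_TOKEN` with permission to list the repo's runners (all of these names are hypothetical):

```yaml
jobs:
  check-runner:
    runs-on: ubuntu-latest
    steps:
      - name: Fail fast if the self-hosted runner is offline
        env:
          GH_TOKEN: ${{ secrets.RUNNER_STATUS_TOKEN }}   # hypothetical secret
        run: |
          # List the repository's self-hosted runners and check the expected one
          status=$(gh api repos/${{ github.repository }}/actions/runners \
            --jq '.runners[] | select(.name == "my-runner") | .status')
          echo "my-runner status: $status"
          [ "$status" = "online" ]   # non-zero exit fails the job if offline

  deploy:
    needs: check-runner            # only starts if the runner was reported online
    runs-on: [self-hosted, dev]    # placeholder labels
    steps:
      - run: echo "runner is up, deploying..."
```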
-
Adding my two cents here. Below is a job definition I use to check a named runner's status. In my case, we only have two self-hosted runners, one for dev and one for prod. If you have more, maybe consider using a matrix to check that all the runners are up, or move this into a reusable action that allows an input param specifying the runner. This does require an org admin to grant a GitHub token with the correct fine-grained access control; I believe it's an org read workflow permission. You could add this to the front of all your workflows, then have any subsequent jobs in the workflow use `needs:` to depend on the check.
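A rough sketch of that kind of check at the org level, with a matrix over the two runner names, might look like the following; `ORG_NAME`, the runner names, and the `ORG_RUNNER_TOKEN` secret are placeholders, and the token needs read access to the org's self-hosted runners:

```yaml
jobs:
  check-runners:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        runner: [dev-runner, prod-runner]   # placeholder runner names
    steps:
      - name: Check that ${{ matrix.runner }} is online
        env:
          GH_TOKEN: ${{ secrets.ORG_RUNNER_TOKEN }}   # placeholder org-scoped token
        run: |
          # Org-level listing of self-hosted runners; fail if this one is not online
          status=$(gh api orgs/ORG_NAME/actions/runners \
            --jq '.runners[] | select(.name == "${{ matrix.runner }}") | .status')
          test "$status" = "online"
```

Downstream jobs would then declare `needs: check-runners` so they only start once the check has passed.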
-
Select Topic Area
Question
Body
We have situations where, for various reasons, the self-hosted runners run into issues and jobs wait for several hours for a runner, with the message "Waiting for a runner to pick up this job...". Is there a way we can enforce a timeout for this? The timeout on a job only works once the job has actually started to run, but I am looking for a solution where jobs that have been waiting for a runner for some time would time out and notify us.
And also, is there a way we can monitor the queue of jobs that are waiting for runners?