Skip to content

Commit

Permalink
fix(runners): increase the log level to WARN when using the enable_jo…
Browse files Browse the repository at this point in the history
…b_queued_check parameter (#3046)

The readme section detailing the `enable_job_queued_check` makes it an
appealing function, but it was unclear to me that it could cause some
issues when used with ephemeral runners and matrix workflows.

It is particularly challenging to notice for organizations with a very
large number of repositories, at times with low activity the issue is
not present, but when matrix jobs are started or when multiple
repositories are triggering jobs around the same time window (such as
cron'd workflow), this issue is more likely to be faced.

For example, on the execution that allowed me to find the cause, when
starting a matrix of 9 jobs, 4 of them were actually triggering this log
message:
```
2023-03-10 15:29:09.728  INFO  [scale-up:a0502c68-f9b9-5d2d-afd6-70edb7486316 index.js:135249  isJobQueued] Job not queued 
```
This log message, aside from being one amongst many other `INFO`
messages, does not provide many details about the consequences.

I suggest bumping the level to `WARN` and adding more details in the
message itself.

Related discussion:
#2931
  • Loading branch information
Fgerthoffert authored Mar 17, 2023
1 parent a3eb81e commit 1de73bf
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -324,7 +324,7 @@ You can configure runners to be ephemeral, runners will be used only for one job

- The scale down lambda is still active, and should only remove orphan instances. But there is no strict check in place. So ensure you configure the `minimum_running_time_in_minutes` to a value that is high enough to got your runner booted and connected to avoid it got terminated before executing a job.
- The messages sent from the webhook lambda to scale-up lambda are by default delayed delayed by SQS, to give available runners to option to start the job before the decision is made to scale more runners. For ephemeral runners there is no need to wait. Set `delay_webhook_event` to `0`.
- All events on the queue will lead to a new runner crated by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example whn jobs got cancelled before a runner is created. We suggest to use this in combination with a pool.
- All events on the queue will lead to a new runner created by the lambda. By setting `enable_job_queued_check` to `true` you can enforce only create a runner if the event has a correlated queued job. Setting this can avoid creating useless runners, for example when jobs got cancelled before a runner is created or if the job was already picked up by another runner. We suggest to use this in combination with a pool.
- To ensure runners are created in the same order GitHub sends the events we use by default a FIFO queue, this is mainly relevant for repo level runners. For ephemeral runners you can set `enable_fifo_build_queue` to `false`.
- Error related to scaling should be retried via SQS. You can configure `job_queue_retention_in_seconds` `redrive_build_queue` to tune the behavior. We have no mechanism to avoid events will never processed, which means potential no runner could be created and the job in GitHub can time out in 6 hours.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ async function isJobQueued(githubInstallationClient: Octokit, payload: ActionReq
throw Error(`Event ${payload.eventType} is not supported`);
}
if (!isQueued) {
logger.info(`Job not queued`, LogFields.print());
logger.warn(`Job not queued in GitHub. A new runner instance will NOT be created for this job.`, LogFields.print());
}
return isQueued;
}
Expand Down

0 comments on commit 1de73bf

Please sign in to comment.