Rather than provisioning a specific, long-lived EC2 instance for this, I imagine using GitHub Actions itself to launch and register instances on demand. GitHub Actions supports ephemeral runners which deregister themselves and exit (or maybe vice versa?) after some period of inactivity. This seems perfect for a workflow job which launches a new instance when necessary to handle a job and lets the runner exit, deregister, and the instance terminate after some time. This sort of system could use a pool where up to N new instances are launched but after that the auto-launch is a no-op, or it could always launch a new instance specifically for use by the current workflow run.
In terms of implementation, the workflow would look something like this pseudo-code:
jobs:
launch-runner:
id: launch-runner
uses: nextstrain/.github/.github/workflow/launch-runner.yaml
with:
os: linux
machine: arm64
outputs:
labels: ...
build:
needs: launch-runner
runs-on: ${{ fromJSON(needs.launch-runner.outputs.labels) }}
steps:
- ...
The labels
output would either be the bare minimum required by GitHub like
["self-hosted", "linux", "ARM64"]
for a pool model or ["self-hosted", "linux", "ARM64", "28e892c3-674b-4fc9-b7cb-813467d58a5c"]
for a
one-instance-per-workflow model.
The launch-runner shared workflow could also be a custom action. Not sure what's easier and a better fit.
It seems like ephemeral runners are only assigned a single job, so they'd only be appropriate for the one-instance-per-workflow model.
There are, of course, other auto-scaling solutions out there, but I wonder if they're good fits or not for our needs. If they are, great!
Using the webhooks mentioned there instead of launching explicitly in the workflow could be a useful abstraction barrier, but it might also be harder to manage than the more explicit coupling. In particular, it means having another logging/monitoring system for the actions taken on receiving a webhook, when we could instead have that integrated into GitHub Actions.
As a sort of middle ground between webhooks and the approach described above,
we might be able to use workflow_run
events
to accomplish a similar thing while staying within the GitHub Actions
ecosystem.