Github workflow failed to start due to an out-of-date github runner image #37115

@shunping

What happened?

We have recently seen some flakiness in a few workflows (e.g. https://github.com/apache/beam/actions/runs/20245962422/job/58125988587, #30525 (comment), https://github.com/apache/beam/actions/runs/19710817239).

When such a workflow fails, the log shows the following error:

Current runner version: '2.318.0'
...
Error: System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter ''using: node24' is not supported, use 'docker', 'node12', 'node16' or 'node20' instead.')
   at GitHub.Runner.Worker.ActionManifestManager.ConvertRuns(IExecutionContext executionContext, TemplateContext templateContext, TemplateToken inputsToken, String fileRelativePath, MappingToken outputs)
   at GitHub.Runner.Worker.ActionManifestManager.Load(IExecutionContext executionContext, String manifestFile)
Error: Failed to load gradle/actions/4d9f0ba0025fe599b4ebab900eb7f3a1d93ef4c2/setup-gradle/action.yml

The reason is that node24 support was added in runner version 2.327.1 (https://github.com/actions/runner/releases/tag/v2.327.1), so the action fails to load when an older GitHub runner is used.
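To make the version gap concrete, here is a minimal sketch of the check, assuming the log contains the `Current runner version` line shown above. The helper names `parse_runner_version` and `supports_node24` are hypothetical and not part of any GitHub tooling; the 2.327.1 threshold is taken from the release notes linked above.

```python
import re

# First runner release that accepts `using: node24`, per the linked release notes.
NODE24_MIN_RUNNER = (2, 327, 1)


def parse_runner_version(log_text):
    """Return the runner version from a 'Current runner version' log line
    as a tuple of ints, or None if the line is not present."""
    match = re.search(r"Current runner version: '(\d+)\.(\d+)\.(\d+)'", log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def supports_node24(version):
    """Hypothetical helper: True if the runner version is at least 2.327.1."""
    return version >= NODE24_MIN_RUNNER


log = "Current runner version: '2.318.0'"
version = parse_runner_version(log)
print(version, supports_node24(version))  # prints: (2, 318, 0) False
```

Comparing tuples of integers avoids the string-comparison trap where "2.99.0" would sort after "2.327.1".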

Looking at the log, I see that there is actually an update step when the runner starts. If it succeeds, we see:

Image

However, when the GitHub workflow runs before the runner has finished updating, we see:

Image

After some digging, I see that the ARC images for apache-beam-testing are built via a GitHub workflow (https://github.com/apache/beam/actions/workflows/build_runner_image.yml) and stored at https://pantheon.corp.google.com/artifacts/docker/apache-beam-testing/us-central1/beam-github-actions/beam-arc-runner.

However, we are still pinning an old image version (3063b55757509dad1c14751c9f2aa5905826d9a0), which was created on Aug 14, 2024. I verified that the runner version inside this Docker image is 2.318.0.

runner_image = "us-central1-docker.pkg.dev/apache-beam-testing/beam-github-actions/beam-arc-runner:3063b55757509dad1c14751c9f2aa5905826d9a0"
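As a rough illustration of how the pinned tag could be read out of that config line (for example, to compare it against the latest image built by the workflow), here is a small sketch. The helper names `pinned_tag` and `looks_like_commit_sha` are hypothetical, and the sketch assumes the tag is the part after the last colon in the quoted image reference.

```python
import re


def pinned_tag(config_line):
    """Return the image tag from a `runner_image = "<repo>:<tag>"` line,
    or None if the line does not match that shape."""
    match = re.search(r'runner_image\s*=\s*"[^"]+:([^":/]+)"', config_line)
    return match.group(1) if match else None


def looks_like_commit_sha(tag):
    """True if the tag is a 40-character lowercase hex string (a full git SHA)."""
    return re.fullmatch(r"[0-9a-f]{40}", tag) is not None


line = (
    'runner_image = "us-central1-docker.pkg.dev/apache-beam-testing/'
    'beam-github-actions/beam-arc-runner:3063b55757509dad1c14751c9f2aa5905826d9a0"'
)
tag = pinned_tag(line)
print(tag, looks_like_commit_sha(tag))
```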

I think we should update our config to use a newer version of the image. Thoughts?

Issue Failure

Failure: Test is flaky

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
