Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unusable for container jobs with host network MTU < 1500 #2966

Closed
petertiedemann opened this issue Nov 1, 2023 · 7 comments
Closed

Unusable for container jobs with host network MTU < 1500 #2966

petertiedemann opened this issue Nov 1, 2023 · 7 comments
Labels
bug Something isn't working

Comments

@petertiedemann
Copy link

petertiedemann commented Nov 1, 2023

Background:
It is a very common scenario for the environment in which the runner executes to have a network MTU lower than the "standard" 1500.

In general it happens due to various virtualization layers / VLAN overhead:

Problem:
The runner always creates a randomly named docker network for the job containers to run in, this network will always have MTU 1500.

The current experience with the runner and its use in the Github Actions Runner Controller in these scenarios can be extremely poor, and requires an extreme amount of research to address. Typical symptoms includes problems with network connectivity from within the containerized job. (A quick google search will reveal just how common docker related MTU problems are ).

It is possible to configure the mtu for dockerd, but this only affects the default bridge. The same problem is also described in #775. The conclusion there is that you don't want the runner to become a pass through for settings (which is fair), but the proposed solutions are not clear (and not documented) or only possible in some environments (e.g. changing to MTU 1500).

Possible solutions:

  1. Change the runner to create the network with the same MTU as the default bridge (note that this does not require exposing new settings)
  2. Detect (or have an option) that the runner itself is containerized and use the default bridge (the dedicated network is created to avoid port conflict)
  3. Do what docker should have done and look at the host networks and pick a safe MTU (e.g. minimum among the interfaces)
  4. Document how to use the job hooks to correct the MTU
  5. Make MTU 1500 a system requirement (please don't :)

1 is probably the "safe" approach as it solves the problem by default for those people that have already managed to configure the default bridge to mitigate this, and it changes nothing for those who haven't. 3 is great in that it would make the problem magically go away for most people, but might cause confusion as to the source of the MTU value.

As a minimum I think 4 should be addressed, if only to stop issues from these from being created :)

@petertiedemann petertiedemann added the bug Something isn't working label Nov 1, 2023
@petertiedemann
Copy link
Author

Having looked in further details at the hook customization options, I can see that users would have to commit to maintaining their own version of this whole package: https://github.com/actions/runner-container-hooks/tree/main/packages/docker. It seems excessive for something like this. Frankly, it is simpler to fork the runner itself with a one-line C# fix.

@petertiedemann
Copy link
Author

I also just want to add my work-around here, for anyone else finding this. I ended up shiming the docker executable and creating my own runner docker image:

Custom runner Dockerfile:

FROM ghcr.io/actions/actions-runner:latest
RUN sudo mv /bin/docker /bin/docker-wrapped
COPY docker /bin/docker
ENV DOCKER_MTU 1400

Shim:

#!/bin/bash
# Check if the first two arguments are 'network create'
if [ "$1" = "network" ] && [ "$2" = "create" ]; then
    # If they are, append the additional argument and pass all arguments to the original executable
    "docker-wrapped" "$@" -o "com.docker.network.driver.mtu"=$DOCKER_MTU
else
    # If not, just pass all arguments to the original executable
    "docker-wrapped" "$@"
fi

Terrible hack ;(

@johnnyhuy
Copy link

johnnyhuy commented Dec 22, 2023

I did some digging, and it's correct. We'll need to fork out container hooks to add a condition to add options like configuring MTUs.

Add a PR actions/runner-container-hooks#126

network.ts

export async function networkCreate(networkName): Promise<void> {
  const dockerArgs: string[] = ['network', 'create']
  dockerArgs.push('--label')
  dockerArgs.push(getRunnerLabel())
  dockerArgs.push(networkName)

  if (process.env.DOCKER_MTU) {
    dockerArgs.push('--opt')
    dockerArgs.push(`com.docker.network.driver.mtu=${process.env.DOCKER_MTU}`)
  }

  await runDockerCommand(dockerArgs)
}

I followed https://github.com/actions/runner-container-hooks/blob/main/CONTRIBUTING.md#e2e by rebuilding the project and baking it into the Actions runner container.

Dockerfile

FROM node:20.8.0 AS builder

WORKDIR /opt/runner-container-hooks

COPY runner-container-hooks/package.json /opt/runner-container-hooks/package.json
COPY runner-container-hooks/package-lock.json /opt/runner-container-hooks/package-lock.json
COPY runner-container-hooks/packages/docker/package.json /opt/runner-container-hooks/packages/docker/package.json
COPY runner-container-hooks/packages/docker/package-lock.json /opt/runner-container-hooks/packages/docker/package-lock.json
COPY runner-container-hooks/packages/hooklib/package.json /opt/runner-container-hooks/packages/hooklib/package.json
COPY runner-container-hooks/packages/hooklib/package-lock.json /opt/runner-container-hooks/packages/hooklib/package-lock.json
COPY runner-container-hooks/packages/k8s/package.json /opt/runner-container-hooks/packages/k8s/package.json
COPY runner-container-hooks/packages/k8s/package-lock.json /opt/runner-container-hooks/packages/k8s/package-lock.json

RUN npm ci
RUN npm run bootstrap

COPY runner-container-hooks /opt/runner-container-hooks
RUN npm run build-all

FROM ghcr.io/actions/actions-runner:latest

ENV DOCKER_MTU 1450
ENV ACTIONS_RUNNER_CONTAINER_HOOKS /home/runner/docker/index.js

COPY --from=builder /opt/runner-container-hooks/packages/docker/dist/index.js /home/runner/docker/index.js

Looks good! Made it so that the MTU can be configured as an environment variable.

image

@jgreat
Copy link

jgreat commented Jan 4, 2024

I recently learned docker has added the ability to set a default MTU for all created bridged networks, no more need for janky override scripts.

The new options were added in: moby/moby#43197

CLI daemon options.
dockerd --default-network-opts=bridge=com.docker.network.driver.mtu=1350

Example docker daemon.json:

{
  "default-network-opts": {
    "bridge": {
     "com.docker.network.driver.mtu": "1350"
    }
  }
}

@bo0tzz
Copy link

bo0tzz commented Jan 9, 2024

I can confirm that adding that default-network-opts config to daemon.json works, thank you @jgreat!

@petertiedemann
Copy link
Author

Just want to confirm that this is also working for me. Very nice to be rid of the shim and custom docker image. I will close this now, but I would suggest highlighting this setting in the troubleshooting docs.

@mustafakapasi-cbng
Copy link

Just putting a solution I found elsewhere. For self-hosted github runners on openshift,

  1. find out the mtu of your runner pod using ip link show
  2. Then, add the same mtu value to your RunnerDeployment Instance Yaml under spec>template>spec
    eg : dockerMTU: 1400
    Not my solution, but worked for me. Borrowed from containers create by Github workflow have wrong dockerMTU actions-runner-controller#1046
    Hope it helps !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants