Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runner Entrypoint: fix daemon.json #1409

Merged
merged 2 commits into from
Jun 30, 2022

Conversation

Vladyslav-Miletskyi
Copy link
Contributor

@Vladyslav-Miletskyi Vladyslav-Miletskyi commented May 2, 2022

Do not overwrite daemon.json if it already exists.
Usage: custom images, which are using a public image as source.

runner/startup.sh Outdated Show resolved Hide resolved
@UrosCvijan
Copy link

Having the same issue... Is there a way to provide daemon.json that is custom to regular summerwind/actions-runner-dind:latest
As our datacenter has the same network as docker default network, so we want to change the docker network using daemon.json
I wanted to create my own image using FROM the the above and to add my daemon.json but it gets overwritten.
Is there some other way of doing it?

@mumoshu
Copy link
Collaborator

mumoshu commented Jun 15, 2022

@UrosCvijan How about modifying startup.sh in your custom runer image?

mumoshu pushed a commit that referenced this pull request Jun 23, 2022
This overhaul turns it into a shellcheck valid script with explicit error handling for all possible situations I could think of. This change takes #1409 into account and things can be merged in any order. There are a few important changes here to the logic:

- The wait logic for checking if docker comes up was fundamentally flawed because it checks for the PID. Docker will always come up and thus become visible in the process list, just to immediately die when it encounters an issue, after which supervisor starts it again. This means that our check so far is flaky due to the `sleep 1` it might encounter a PID, or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this as it checks for a feature of docker and not a PID. I thus entirely removed the PID check, and instead I am handing things over to our `entrypoint.sh` script by setting the environment variables correctly.
- This change has an influence on the `docker0` interface MTU configuration, because the interface might or might not exist after we started docker. Hence, I changed this to a time boxed loop that tries for one minute to set up the interface's MTU. In case the command fails we log an error and continue with the run.
- I changed the entire MTU handling by validating its value before configuring it, logging an error and continuing without if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt.
- The way we started supervisord did not make much sense to me. It sends itself into the background automatically, there is no need for us to do so with Bash.

The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.
@UrosCvijan
Copy link

@mumoshu Aha, sorry for late reply. My idea was not to touch the "core" of everything that was used to build the docker image. As later something might be added to the startup.sh so I would not touch any of those but just add my daemon, basically do the seamless upgrades. But if there wont be a change like this, I guess this will have to be the way.

@mumoshu
Copy link
Collaborator

mumoshu commented Jun 28, 2022

@UrosCvijan Thanks for clarifying! I hear you- probably we should merge this and after that you'd be able to provide your own daemon.json in your custom runner image's Dockerfile. Does that sound good to you?

@UrosCvijan @Vladyslav-Miletskyi

The only reason I was a bit hesitant to improve startup.sh like this is that now you have another vector to break ARC so we'd need to improve our already very lengthy bug report form to also include a question like "Paste your custom docker daemon.json if any". If the only way you can customize it was by modifying start.sh, almost certainly the user is advanced enough so that they don't need basic user support so I won't include such question in the bug report form.

Perhaps you'd think it's better to merge this and enhance the bug report form, rather than not merging this at all? If so, I'd go that way anyway!

@UrosCvijan
Copy link

@mumoshu Yeah, that is what i currently did with the base docker:dind image. As I don't want to "break" the future upgrades and flow, this way, I don't touch any of the mechanics of the startup.sh or entrypoint.sh, I can just use your image and add stuff I need, like packages new deamon.json etc..
So from all of this, as you said, i could change startup.sh but then i "interfere" with the things you are doing in the setup of the main docker image.
The thing is that it worked for us when we just added it in original dind image, but as we started burning up a lot of runners, i was asked if i can use only one container, did the same steps but deamon.json was {}, didn't even think that it could be in startup.sh script :)
I would not touch the Bug report form, as people that make changes and are adding their own deamon.json, know the reason behind it.

Vladyslav-Miletskyi and others added 2 commits June 30, 2022 01:37
Do not owerwrite daemon.json if it already exists.
Usage: custom images, which are using public image as source.
Co-authored-by: Callum Tait <15716903+toast-gear@users.noreply.github.com>
@mumoshu
Copy link
Collaborator

mumoshu commented Jun 30, 2022

@UrosCvijan Thanks for confirming! Got it. We'll do our best to maintain it :) But beware that we may or may not unexpectedly break this as it currently lacks a unit test. It would be great if you could submit a PR to add that. I remember we had a few tests for entrypoint.sh in https://github.com/actions-runner-controller/actions-runner-controller/tree/master/test/entrypoint. It would be great if you could write something similar for startup.sh.

@mumoshu
Copy link
Collaborator

mumoshu commented Jun 30, 2022

At least this seems to not break existing behavior. I've verified that by running a few tests with the updated image locally.

Copy link
Collaborator

@mumoshu mumoshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your contribution @Vladyslav-Miletskyi!

@mumoshu mumoshu merged commit 2fe6adf into actions:master Jun 30, 2022
@Vladyslav-Miletskyi Vladyslav-Miletskyi deleted the fix/daemon_json branch June 30, 2022 08:15
neggles pushed a commit to neggles/actions-runners that referenced this pull request Nov 10, 2022
This overhaul turns it into a shellcheck valid script with explicit error handling for all possible situations I could think of. This change takes actions/actions-runner-controller#1409 into account and things can be merged in any order. There are a few important changes here to the logic:

- The wait logic for checking if docker comes up was fundamentally flawed because it checks for the PID. Docker will always come up and thus become visible in the process list, just to immediately die when it encounters an issue, after which supervisor starts it again. This means that our check so far is flaky due to the `sleep 1` it might encounter a PID, or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this as it checks for a feature of docker and not a PID. I thus entirely removed the PID check, and instead I am handing things over to our `entrypoint.sh` script by setting the environment variables correctly.
- This change has an influence on the `docker0` interface MTU configuration, because the interface might or might not exist after we started docker. Hence, I changed this to a time boxed loop that tries for one minute to set up the interface's MTU. In case the command fails we log an error and continue with the run.
- I changed the entire MTU handling by validating its value before configuring it, logging an error and continuing without if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt.
- The way we started supervisord did not make much sense to me. It sends itself into the background automatically, there is no need for us to do so with Bash.

The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.
neggles pushed a commit to neggles/actions-runners that referenced this pull request Nov 10, 2022
This overhaul turns it into a shellcheck valid script with explicit error handling for all possible situations I could think of. This change takes actions/actions-runner-controller#1409 into account and things can be merged in any order. There are a few important changes here to the logic:

- The wait logic for checking if docker comes up was fundamentally flawed because it checks for the PID. Docker will always come up and thus become visible in the process list, just to immediately die when it encounters an issue, after which supervisor starts it again. This means that our check so far is flaky due to the `sleep 1` it might encounter a PID, or it might not, and the existence of the PID does not mean anything. The `docker ps` check we have in the `entrypoint.sh` script does not suffer from this as it checks for a feature of docker and not a PID. I thus entirely removed the PID check, and instead I am handing things over to our `entrypoint.sh` script by setting the environment variables correctly.
- This change has an influence on the `docker0` interface MTU configuration, because the interface might or might not exist after we started docker. Hence, I changed this to a time boxed loop that tries for one minute to set up the interface's MTU. In case the command fails we log an error and continue with the run.
- I changed the entire MTU handling by validating its value before configuring it, logging an error and continuing without if it is set incorrectly. This ensures that we are not going to send our users on a bug hunt.
- The way we started supervisord did not make much sense to me. It sends itself into the background automatically, there is no need for us to do so with Bash.

The decision to not fail on errors but continue is a deliberate choice, because I believe that running a build is more important than having a perfectly configured system. However, this strategy might also hide issues for all users who are not properly checking their logs. It also makes testing harder. Hence, we could change all error conditions from graceful to panicking. We should then align the exit codes across `startup.sh` and `entrypoint.sh` to ensure that every possible error condition has its own unique error code for easy debugging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants