Bug: Connectivity is lost once gluetun container is restarted #641
Comments
Hey there! Thanks for the detailed issue! It is a well-known Docker problem I need to work around. Let's keep this open for now, although there is at least one duplicate issue about this problem somewhere in the issues. Note this only happens if gluetun is updated and uses a different image (afaik). For now, you might want to have gluetun and all its connected containers in a single docker-compose.yml. I'm also developing https://github.com/qdm12/deunhealth and should add a feature tailored for this problem soon (give it 1-5 days); feel free to subscribe to releases on that side repo. That way it would watch your containers and restart your connected containers if gluetun gets updated & restarted. |
Thank you for the answer @qdm12. It does indeed seem to be a Docker problem, just as you said, and unfortunately they seem a bit reluctant to discuss possible solutions for the issue. :( For the time being, there's a temporary, ugly, brutal, but 100% working fix. Maybe it would be worth mentioning it in the wiki/docker-compose.yml example? There are some gotchas, though, since it completely replaces the original healthcheck command, and some images include neither curl nor wget. Currently I'm probing example.com every minute on child containers attached to gluetun's network stack, and so far so good. I just subscribed to deunhealth; it seems promising and probably even better than things like autoheal due to the network fix thing. I'll make sure to check it out in a week (or earlier, as you deem appropriate) and provide feedback/do some testing. |
Similar conversation in #504 to be concluded. |
I have the same thing: when I restart Gluetun, it doesn't want to start the containers within the same network_mode. The only difference is that I configured it with: network_mode: 'container:VPN'. I think when I restart or recreate the Gluetun container it gets a different ID. What would be the solution to this problem? |
Stumbled across this issue while researching ways to restart dependent containers once gluetun is recreated with a new image (via Watchtower). https://github.com/qdm12/deunhealth seems like it might work, but I wanted to make sure I understand the use case. If I have a number of services with: However, when the gluetun container restarts, the dependent containers don't actually end up getting marked unhealthy, they just lose connectivity. I'm wondering if you've updated deunhealth yet to include this function. |
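One way to check the "different ID" suspicion is to compare the ID recorded in the child's NetworkMode with the parent's current ID. This is only a sketch: the function name is made up, and it assumes Docker records the attachment as container:<id> (as in the docker inspect output quoted in the issue description). If your setup stores a container name there instead, this check cannot tell you anything.

```shell
#!/bin/sh
# attached_to_stale_parent CHILD PARENT   (hypothetical helper name)
# Exit 0: CHILD is attached to a container ID that is not PARENT's current ID.
# Exit 1: CHILD still points at PARENT (by current ID, or by name).
# Exit 2: inspect failed, or CHILD does not use container networking.
attached_to_stale_parent() {
  child="$1"
  parent="$2"
  mode=$(docker inspect --format '{{.HostConfig.NetworkMode}}' "$child") || return 2
  parent_id=$(docker inspect --format '{{.Id}}' "$parent") || return 2
  case "$mode" in
    "container:$parent_id") return 1 ;;  # attached to the live parent
    "container:$parent")    return 1 ;;  # stored by name: staleness not detectable here
    container:*)            return 0 ;;  # attached to some other (old) ID
    *)                      return 2 ;;  # bridge, host, etc.
  esac
}
```

For example, `attached_to_stale_parent myapp VPN && echo "myapp needs to be recreated"`, where myapp is a placeholder container name.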
No sorry, but I'll get to it soon. Ideally, there is a way to re-attach the disconnected containers to gluetun without restarting them (I guess with Docker's Go API since I doubt the docker cli supports such thing). That would work by marking each connected container with a label to indicate this network re-attachment. If there isn't, I'll setup something to cascade the restart from gluetun to connected containers, probably using labels to avoid any surprise (mark gluetun as a parent container with a unique id, and mark all connected containers as child containers with that same id). |
For the time being, if anyone wants a dirty, cheap solution, here's my current setup:
This will only work with containers where curl is already preinstalled. There are docker images that include wget but not curl, in which case you can replace test command with |
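As a rough sketch of the curl-or-wget idea above (not the poster's original command, which was lost from this page), a probe can pick whichever client the image ships. The function name probe and the 10-second timeout are assumptions; example.com is the target mentioned earlier in the thread.

```shell
#!/bin/sh
# probe URL   (hypothetical helper)
# Exit 0 if URL answers, non-zero otherwise. Uses curl when present,
# falls back to wget, and returns 2 when the image has neither.
probe() {
  url="$1"
  if command -v curl >/dev/null 2>&1; then
    # --fail turns HTTP errors (4xx/5xx) into a non-zero exit status.
    curl --fail --silent --max-time 10 --output /dev/null "$url"
  elif command -v wget >/dev/null 2>&1; then
    # --spider checks the URL without downloading the body.
    wget -q -T 10 --spider "$url"
  else
    echo "neither curl nor wget found" >&2
    return 2
  fi
}
```

`probe https://example.com` exits non-zero on failure, which is the only thing a Docker healthcheck command needs to report.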
Any progress or resolution to this, either in gluetun or deunhealth? |
I have bits and pieces for it, but I am moving country + visiting family + starting a new job right now, so it might take at least 2 weeks for me to finish it up, sorry about that. But it's at the top of my OSS things-to-do list, so it won't be forgotten 😉 |
I'd also like to thank you for creating gluetun and to say this is a very good project. |
Any update on this by any chance? |
I'm not really sure. I turned off the Watchtower container and since then
my setup worked flawlessly. It's a workaround, but it's all I know so far.
|
Any news or progress on this issue? |
following |
Since I also have this problem, I would like to report it here and find out if and how it continues. Thank you! |
Having the same behavior. |
I built myself a systemd service which runs 30 seconds after the docker service has started and starts all containers with the |
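The linked service itself is not reproduced here. The following is only a guess at the general idea, using the error text quoted later in this thread; the function name is hypothetical, and whether the failed containers show up as created or exited may depend on your Docker version.

```shell
#!/bin/sh
# Error text quoted in this thread for containers whose network parent was down.
JOIN_ERROR="cannot join network of a non running container"

# start_join_failed   (hypothetical helper)
# Starts every created/exited container whose last start attempt failed with
# JOIN_ERROR and prints the IDs it starts.
start_join_failed() {
  for id in $(docker ps -a -q --filter status=created --filter status=exited); do
    err=$(docker inspect --format '{{.State.Error}}' "$id")
    case "$err" in
      *"$JOIN_ERROR"*)
        echo "$id"
        docker start "$id" >/dev/null
        ;;
    esac
  done
}
```

Run it once gluetun itself is up, for instance from a delayed unit or a cron @reboot entry.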
@ioqy This seems like a fundamentally better approach to me, thanks for the link. |
I tried this but the containers didn't restart. Am I missing something? Do all the different services dependent on gluetun have to be in the same docker compose file for this to work? I can confirm that "curl" is working in this container. This is an example of one of my containers' compose files using this "workaround":
|
You're missing some spaces in your healthcheck. Indentation must be exact. I had the same issue until I fixed those indents on the healthcheck. |
Thanks for the help/feedback @vdrover. I ran all my containers through vs code as docker compose files and ran "format document". Hopefully that should fix it. I'll restart gluetun and see if that makes a difference. |
@qdm12. I would recommend adding the results of this thread to the wiki. It seems like a pretty important issue that should be flagged to users when setting up this project. |
I do not think it is a good idea to curl a public website every minute or so. This causes unnecessary traffic for you and for the website (even if they have the bandwidth, as Google surely does) - especially considering that there may be multiple containers connected to gluetun, all doing the same checks every minute. I solved this by simply checking the healthcheck address of gluetun itself.
|
Could you provide an example healthcheck? |
To use the example from above:
This requires netcat ( It can be tested from the Docker host by running the following command inside the container you have connected to gluetun: |
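A minimal sketch of such a check, assuming gluetun's health server listens on port 9999 as mentioned elsewhere in this thread, and that the image's nc supports -z and -w (some minimal builds may not). The function name is made up.

```shell
#!/bin/sh
# gluetun_reachable [HOST] [PORT]   (hypothetical helper)
# Containers sharing gluetun's network stack see its health server on
# localhost, so no external website has to be contacted.
gluetun_reachable() {
  host="${1:-127.0.0.1}"
  port="${2:-9999}"
  # -z: only test that the port accepts a connection; -w 3: give up after 3 s.
  nc -z -w 3 "$host" "$port"
}
```

The exit status of `gluetun_reachable` can be used directly as a healthcheck result.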
@Babadabupi: excellent idea, thank you |
@Babadabupi or someone else here. Can you please provide some guidance on how to add linux commands/tools so they are accessible via commands in Docker? curl works, but netcat isn't found and I can't seem to see how to get it accessible. I'm running Docker (and Portainer) on a Synology server running the latest DSM (7.2 I think). Thanks! |
@begunfx: If you can't use the command inside the container, then nc isn't installed inside. You can add it (e.g. debian based container, if you have another Linux image then change bash and apt if necessary): |
Thanks for the response, insight and suggestions @cybermcm. I need to have netcat available to all my containers that connect to gluetun permanently. So what would you recommend for this? Is there a way to have a docker compose file execute a bash script? I found this link that talks about it a bit, but I'm not too clear if this is the best way to go: https://stackoverflow.com/questions/57840820/run-a-shell-script-from-docker-compose-command-inside-the-container Or is there a way I can just install a container that has netcat in it and have other docker containers use it to run netcat? I am able to install Ubuntu as a docker container, but I'm not clear on how to share resources. I did set the Ubuntu container to use the Gluetun container as its network so all my containers that need access to it are in the same network, from what I understand (but this didn't work). Update: I did find the following docker compose command that seems to do the trick to execute a shell script: I found it at this post: Something like that? |
Of course, if curl (or wget) is available, you can still use it to achieve the same end result. For example, with the commands mentioned by @rakbladsvalsen: I would advise against installing anything inside the container. It is possible, but they are meant to be ephemeral. |
Thanks @Babadabupi. using curl or wget when available makes things a lot easier than trying to add netcat. I added @qdm12 deunhealth container as well. Really appreciate the feedback and suggestions. Thanks to everyone on this thread! |
Okay. So I added the deunhealth container and the healthcheck suggestions here. For some reason if I update the gluetun container the other containers in the gluetun network stop but don't start again. Is it because the gluetun stack is restarting from an update and not an unhealthy state? If so, is there a way to correct this? Or does it make more sense to leave as is if the exit status is normal? To add a little more to this: After the dependent containers stop, I have to re-run their stacks (2x) to get them to start up again and deploy. I'm assuming it's because something changed with gluetun when it restarted - possibly a network IP address etc. This is my gluetun container setup:
This is one of my dependent containers setup:
Just want to make sure I'm not missing something in my current setup. Thanks! |
Is there a way to work around this that doesn't require restarting containers? |
Wouldn't it be possible to reconnect gluetun without restarting the container to fix the problem? Anyway, I have yet another workaround (because I don't like having custom health checks or having another container that restarts the other ones). Here's the script:
#!/bin/bash
# Delay to let everything start, to prevent restarting everything right after boot (because after boot the container state will also change to "healthy").
sleep 120
# Listen for the docker "healthy" event of the gluetun container. If the state changes to healthy (which means gluetun has a connection again), restart the dependent containers based on the label filter.
docker events --filter 'event=health_status' | while read -r line; do
  if [[ ${line} = *"container health_status: healthy"* ]] && [[ ${line} = *"com.docker.compose.service=gluetun"* ]]; then
    docker restart $(docker ps -q --filter "label=com.docker.compose.depends_on=gluetun:service_started:true")
  fi
done
|
What would be the best way to use a timeout before kill, the way docker stop -t does it? docker stop has a default timeout of 10s, which means that it sends SIGTERM, waits 10s, then sends SIGKILL. My qbittorrent usually takes more than 10s to save state, which results in hours of restoring on the next start. If I stop it with -t 120, it takes about 20-40s to stop (it doesn't wait the whole 120 seconds, which is convenient). How would I achieve the same behavior in bash? UPD: Looks like kill already sends only SIGTERM by default? Does that mean it has an infinite timeout, and the restart will happen once it is fully done with exiting? UPD2: Found this:
|
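The snippet the poster found is missing from this page. As a stand-in, here is a sketch of the sequence described in the question (SIGTERM, wait up to a timeout, then SIGKILL). The function name and the one-second polling interval are assumptions.

```shell
#!/bin/sh
# stop_with_grace PID TIMEOUT_SECONDS   (hypothetical helper)
# Sends SIGTERM, polls once per second, and only sends SIGKILL if the process
# is still alive after TIMEOUT_SECONDS. Returns as soon as the process is gone.
# Exit 0: process exited on its own (or was already gone). Exit 1: had to SIGKILL.
stop_with_grace() {
  pid="$1"
  timeout="$2"
  kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop
  elapsed=0
  while kill -0 "$pid" 2>/dev/null; do        # kill -0 only checks existence
    if [ "$elapsed" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

Like docker stop -t 120, `stop_with_grace "$pid" 120` does not wait the full 120 seconds when the process exits earlier.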
@ThorpeJosh Could you please elaborate on this? What do you use for such monitoring? |
Since gluetun is now able to auto-heal most minor issues without fully restarting itself, looks like this issue is still relevant only in those specific cases:
I'm collecting all the info I could find in one place. Please feel free to correct it, because I'm not very experienced in this. In the future I hope the gluetun wiki could include some of this information.
Basic info
Healthchecks:
Many solutions require a healthcheck, either set on all child containers, or on a single container whose main purpose is to oversee the network condition. The latter is especially useful if you have a distroless container that can't run a healthcheck, or if you have a lot of containers under gluetun. Solutions that do not require a healthcheck: bash script on cron (2), or an alternative solution based on 4. Commonly a curl of some website is performed for a healthcheck. But, as mentioned here, this causes unnecessary traffic for you and for the website, since it fetches every minute from every container. Checking localhost:9999 (gluetun health server) instead might be more effective in our case. Depending on the container image, some commands may not work. Collected everything from the thread:
Exiting containers with grace
Some containers, like qbittorrent and databases, might be sensitive to the default docker stop/restart/compose down/etc. Docker commands have a default timeout of 10 seconds, so if that's not enough for you, use
Some ways use kill 1 to shut down the container. With kill, SIGTERM is sent by default, so this might not be an issue, but I did not test it. Here I mentioned a way to somewhat reproduce docker stop signals while using kill:
Notifications
As a bonus, I found dolce: it can notify you about container events via email, discord, telegram, slack, mattermost and apprise. Useful to track whether the fix is working for you as intended.
Solutions list, from basic to advanced:
Transparent in-house solutions:
1. Natively
2. Xitee1's workaround
It avoids having custom health checks or having another container that restarts the other ones.
or you can use any other docker compose related labels from
The script:
3. Healthcheck || kill internally + restart always
Follow up your healthcheck with
This one also doesn't have extra dependencies or access to the docker socket. But I think in some cases restarting via docker from the outside might be cleaner if a container requires something specific. You can also see the modified kill command under the
The ones using some external docker image and docker socket:
4. Basic custom container to oversee others (from stackoverflow)
It has access to the docker socket, checks the list of all containers for unhealthy ones every 60 seconds, then runs docker restart. You can apply a stop timeout to it with -t if you need to. And customize it to do whatever you like.
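The stackoverflow snippet is not reproduced on this page. A loop of the kind described for option 4 might look like the sketch below; the function name and the default stop timeout are assumptions, and it needs access to the docker socket to work.

```shell
#!/bin/sh
# restart_unhealthy [STOP_TIMEOUT]   (hypothetical helper)
# Restarts every container Docker currently reports as unhealthy, giving each
# one STOP_TIMEOUT seconds (default 10) to exit before it is killed.
restart_unhealthy() {
  stop_timeout="${1:-10}"
  for id in $(docker ps -q --filter health=unhealthy); do
    echo "restarting $id"
    docker restart -t "$stop_timeout" "$id" >/dev/null
  done
}

# Polling loop for use as the overseer container's main process (left
# commented out so that sourcing this file does not block):
# while true; do restart_unhealthy 120; sleep 60; done
```

Only containers that define a healthcheck can ever be reported as unhealthy, so this does nothing for images without one.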
In theory, if you have a lot of containers, you can add this one to the gluetun network and check the connection to it (localhost:9999) instead of doing any healthchecks, then restart everything on the same network except gluetun.
5. willfarrell/autoheal
This container keeps tabs on the health states of all or only labeled containers. It uses the docker socket, checks at intervals, and has customizable timings (including the stop timeout I mentioned before). It also has a webhook (useful for Discord notifications). It might have issues with auto-updating containers (watchtower), so it is better to update and restart manually, or restart all within the same compose at once.
6. qdm12/deunhealth
For those wondering about the differences with willfarrell/autoheal, it's listed here. In short, it's safer because there's no OS (based on scratch) and no network. It streams events so there is no check period; it automagically detects unhealthy containers at the same time as the Docker daemon does. It also needs a label to be added to a child container to work.
Its roadmap also has nice features like "Trigger mechanism such that a container restart triggers other restarts" and "Inject pre-build binary doing a DNS lookup to containers labeled for it and that do not have a healthcheck built in (useful for scratch based images without healthcheck especially)", so you might want to follow its releases in case qdm12 someday resumes working on it.
7. cascandaliato/docker-restarter
Container with access to the docker socket that restarts containers based on events:
It has customizable timings, but I'm not sure whether it just checks periodically or also listens to docker events, since there are two different restart scenarios (dependency and unhealthy).
9. Self update to avoid Docker restarts
(WIP) It was a planned gluetun feature to solve the watchtower issue.
10. ioqy/docker-start-failed-gluetun-containers
A systemd service (installed on the host OS) which runs 30 seconds after the docker service has started and starts all containers with the cannot join network of a non running container error message. Mentioned here. Resolves the issue that happens when you restart the server and have to manually docker compose up the gluetun stack because otherwise the other services never launch with the error
11. Notifiarr/dockwatch
Dockwatch is a container with docker socket access. It has a nice web UI to manage container updates and container-related notifications (via Notifiarr). It can auto-update or just check and notify. It can restart unhealthy containers, and automatically recognize if containers depend on specific network containers, for example Gluetun:
Mentioned here. |
That is a really nice summary thanks for organizing! I just wanted to add another option that's been working for me: It's a helpful tool similar to Watchtower, but it has native logic for VPN updates/restarts and specifically mentions gluetun. Somehow it knows when it updates, which other containers also need to be updated. In my case I have 3, and it appears to work correctly. I use watchtower for all my auto updates except gluetun, and use dockwatch for gluetun updates. You can test it manually first with the UI before turning on auto updates. |
How do people expect to restart the container ... which they have their container's stack networking routed through ... and still maintain connectivity? How were they doing this on bare metal? Then suggesting patterns that upgrade and auto-restart containers, how is that a fix? If you want docker to support some sort of load balanced / ha network_mode: service ... that's your RFE ... for docker. |
The issue is that after the container is restarted, the child containers do not regain connectivity. No one here is talking about a HA network model. |
I have a container attached to gluetun that fetches gluetun's public IP constantly. If the IP is leaked, or the internet is unreachable, then it restarts gluetun. It's just a bash script that has access to the Docker daemon (via proxy). It also sends notifications/alerts via apprise, and for internet connection issues it has timeouts and retry-backoff strategies too. I've now moved to using an opnsense firewall to block and log when gluetun leaks traffic outside of its wireguard tunnel. |
@ThorpeJosh did you ever encounter any leaks with gluetun? Unless the firewall rules that gluetun is setting are wrong, a leak should not be possible and your checks are unnecessary, because gluetun already goes unhealthy when it loses internet connection. |
Yes, but nothing in the last 9-12 months. There were a couple of instances in the past shortly after gluetun start up where attached services could get internet access outside of the tunnel. I remember reading similar gh issues from others at the time. |
@enchained Thanks for the summary of solutions. You might want to mention that some of them (autoheal) mount the docker socket inside the container, which seems to be advised against, though I don't fully understand the implications myself. |
Is this urgent?: No (kinda it is, since this causes complete connection loss if this "bug" happens)
Host OS: Tested on both Fedora 34 and (up-to-date) Arch Linux ARM (32bit/RPi 4B)
CPU arch or device name: amd64 & armv7
What VPN provider are you using: NordVPN
What are you using to run your container?: Docker Compose
What is the version of the program
Steps to reproduce issue:
-exec it into the container and run curl/wget/ping/etc:
Expected behavior:
xyz should have internet connectivity through gluetun's network stack and be accessible through gluetun's published/exposed ports, even if gluetun is restarted. This is, unfortunately, not the case: xyz's network stack just dies, no data in, no data out.
Additional notes:
FIREWALL_OUTBOUND_SUBNETS - didn't make a difference.
network_mode: service:gluetun completely disappear. b) Restarting gluetun doesn't bring back original routing tables. c) NetworkMode seems to be okay.
Terminal example
Brief docker inspect output from affected container
f77[...] is gluetun's container ID.
Full gluetun logs:
docker-compose.yml:
Nonetheless I'd like to thank you for creating gluetun. I'd be more than happy to help you fix this issue if this is a gluetun bug. Hopefully it's a misconfiguration on my side.