
Bug: Issue restarting containers using network in other stack #11

Open
EzekialSA opened this issue Oct 20, 2021 · 26 comments

@EzekialSA

EzekialSA commented Oct 20, 2021

I'm trying to configure everything so that updates and availability are automated using Watchtower and deunhealth. I was testing what would happen if gluetun got an update (as you know, a restart breaks the containers connected to it). I get the following errors when stopping/restarting gluetun:

2021/10/20 12:17:18 INFO container qbittorrent (image ghcr.io/linuxserver/qbittorrent:latest) is unhealthy, restarting it...
2021/10/20 12:17:21 ERROR failed restarting container: Error response from daemon: Cannot restart container qbittorrent: No such container: 5bc959037ff8fceeca8dfae013347f64162fa759189421d224f07a31810f3aaf

I believe the hash in that error refers to the old gluetun container, so once gluetun is recreated the reference disappears and deunhealth doesn't know how to handle it.
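For what it's worth, here is roughly how that can be checked (a sketch using the Python Docker SDK, not anything deunhealth ships; it assumes the container names from the compose files below and that the daemon stored the container ID in HostConfig.NetworkMode, which the error above suggests):

```python
import docker  # pip install docker

client = docker.from_env()

# The "container:<id>" reference the daemon recorded when qbittorrent was created.
mode = client.containers.get("qbittorrent").attrs["HostConfig"]["NetworkMode"]
joined_id = mode.split(":", 1)[1]

# The ID of the gluetun container that exists right now.
current_id = client.containers.get("gluetun").id

# After gluetun is recreated these no longer match, which is why a plain
# restart of qbittorrent fails with "No such container: <old id>".
print("stale reference" if joined_id != current_id else "ok", joined_id, current_id)
```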

Not sure if it's worth noting, but I am using Portainer for stack management. Here are the config files for what I'm trying to do:

version: "2.1"
services:
  qbittorrent:
    image: ghcr.io/linuxserver/qbittorrent:latest
    container_name: qbittorrent
    labels:
      - com.centurylinklabs.watchtower.scope=WEEKDAYS
      - deunhealth.restart.on.unhealthy=true
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
      - WEBUI_PORT=8095
      - UMASK=002
    healthcheck:
      test: "curl -sf -o /dev/null example.com || exit 1"
      interval: 1m
      timeout: 10s
      retries: 2
    restart: unless-stopped
    network_mode: "container:gluetun"
---
version: "3"
services:
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    labels:
      - com.centurylinklabs.watchtower.scope=WEEKDAYS
      - deunhealth.restart.on.unhealthy=true
    cap_add:
      - NET_ADMIN
    ports:
      - 8888:8888/tcp # HTTP proxy
      - 8388:8388/tcp # Shadowsocks
      - 8388:8388/udp # Shadowsocks
      - 6881:6881/tcp
      - 6881:6881/udp
      - 8095:8095/tcp
    volumes:
      - /yes/config/gluetun:/gluetun
    environment:
      - VPNSP=nordvpn
      - REGION=United States
      - UPDATE_PERIOD=24h
    restart: unless-stopped
---
version: "3.7"
services:
  deunhealth:
    image: qmcgaw/deunhealth
    container_name: deunhealth
    labels:
      - com.centurylinklabs.watchtower.scope=WEEKDAYS
      - deunhealth.restart.on.unhealthy=true
    network_mode: "none"
    environment:
      - LOG_LEVEL=info
      - HEALTH_SERVER_ADDRESS=127.0.0.1:9999
      - TZ=America/New_York
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
---
version: "3"
services:
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    labels:
      - com.centurylinklabs.watchtower.scope=WEEKDAYS
      - deunhealth.restart.on.unhealthy=true
    environment:
      - WATCHTOWER_INCLUDE_RESTARTING=true
      - WATCHTOWER_CLEANUP=true
      - WATCHTOWER_REVIVE_STOPPED=true
      - WATCHTOWER_ROLLING_RESTART=true
      - TZ=America/New_York
    command: --schedule "0 0 5 * * 1-5" --scope WEEKDAYS
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/docker/daemon.json:/config.json
    restart: always
@kubax

kubax commented Oct 20, 2021

I second that... that's exactly my problem.

I have disabled updates for gluetun to stop my containers from dangling without network.

If that is fixable, I would be very glad!

@qdm12
Owner

qdm12 commented Oct 20, 2021

That's really strange. So the container can no longer be found with its container ID?! I'll do some more testing.

Meanwhile, I'm almost done with a cascaded restart feature, which should restart containers labeled for it whenever a given container (like gluetun) starts.

@qdm12
Owner

qdm12 commented Oct 20, 2021

Ah got it. It's because the container ID it was relying on (gluetun) disappeared. Ugh, that's also going to be problematic for my cascaded restart feature... I think the (connected) container config needs to be patched somehow, before being restarted 🤔

@qdm12
Owner

qdm12 commented Oct 20, 2021

Ok, so after some research... there is no way to know what the 'vpn' container was, since we only have its ID and it no longer exists (the name is not accessible). I guess deunhealth could stop the connected container, but it wouldn't be able to start it again, so that's a bit pointless, sadly.

As for my cascaded restart feature, the idea is that you would put a label on the 'connected' containers indicating the container name of the 'vpn' container. That way, this becomes feasible. Writing out how it should work (also for myself):

  1. Stream events and monitor every container starting
  2. For every start event (e.g. vpn starting), get all containers labeled with the name of the container starting
  3. For each container found:
    • If it is NOT a connected container, just restart it
    • If it is a connected container:
      1. Inspect it and get its entire configuration
      2. Extract the expired ID from this config
      3. Use the container ID from the container starting and replace the expired ID with it in the config
      4. Stop the container
      5. Start a new container using the patched config

I have bits and pieces of it ready; I just need to wire everything up and try it out, but it should work fine.
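A rough sketch of those steps, just to make the idea concrete (Python Docker SDK for brevity rather than the Go implementation, and the `deunhealth.restart.on.network.container` label name is only a placeholder for whatever ends up being used):

```python
import docker  # pip install docker

client = docker.from_env()

# Placeholder label: a connected container would point at the name of its 'vpn' container.
LABEL = "deunhealth.restart.on.network.container"

def on_started(vpn):
    """Handle a 'start' event for a container such as gluetun."""
    for ctr in client.containers.list(all=True, filters={"label": f"{LABEL}={vpn.name}"}):
        mode = ctr.attrs["HostConfig"]["NetworkMode"]
        if not mode.startswith("container:"):
            ctr.restart()  # not container-connected: a plain restart is enough
            continue
        # Container-connected: its HostConfig still points at the expired ID,
        # so it has to be recreated rather than restarted.
        cfg = ctr.attrs["Config"]
        name = ctr.name
        ctr.stop()
        ctr.remove()
        client.containers.run(
            cfg["Image"],
            name=name,
            detach=True,
            environment=cfg.get("Env"),
            labels=cfg.get("Labels"),
            # The patched part: point at the new VPN container ID.
            network_mode=f"container:{vpn.id}",
        )
        # A real implementation would copy the rest of the config too
        # (volumes, healthcheck, restart policy, ...), not just these fields.
```

The key point is step 5 above: a container-connected container cannot simply be restarted, it has to be recreated with the new ID patched into its network mode.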

@qdm12
Owner

qdm12 commented Oct 24, 2021

So... the previous suggestion, let's call it A, won't work if a VPN container was already shut down or restarted, leaving existing containers disconnected, before deunhealth even started. The only solution I can think of that handles this, call it B, is to use labels for both the VPN container and the connected containers and not rely on container names: for example, a unique label ID on the 'vpn' container, reused on all of its connected containers.

I also came up with another solution, let's call it C, which is more complex to implement and relies only on container names (no labels), although it has the same problem mentioned above. Here's how it would work (notes to myself as well):

  1. When deunhealth starts, gather all containers that are connected to another container, extract each 'vpn' container ID, and find the corresponding container name for each of these IDs (assuming the VPN container is not gone yet)
  2. Stream events and monitor every start event:
    • Check if the started container is container-connected. If it is, extract the 'vpn' container ID ➡️ get its name and record it in an id<->name mapping
    • Check if the started container's name is one of the VPN names from our id<->name mapping. If it is, find all the now-disconnected containers still using the old ID (via our mapping), patch their configurations with the new ID, and stop & start them. Update the id<->name mapping.
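Sketching the state C would need (again just an illustration with the Python Docker SDK, not actual deunhealth code):

```python
import docker
from docker import errors

client = docker.from_env()

# Step 1: build the id -> name mapping for every 'vpn' container currently referenced.
vpn_names = {}
for ctr in client.containers.list(all=True):
    mode = ctr.attrs["HostConfig"]["NetworkMode"]
    if mode.startswith("container:"):
        vpn_id = mode.split(":", 1)[1]
        try:
            vpn_names[vpn_id] = client.containers.get(vpn_id).name
        except errors.NotFound:
            pass  # the VPN container is already gone: its name cannot be recovered

# Step 2: watch start events; either record a new id<->name pair, or, if the
# started container's name is a known VPN name, patch and recreate the
# containers still pointing at the old ID (as in the earlier sketch).
for event in client.events(decode=True, filters={"type": "container", "event": "start"}):
    started_name = event["Actor"]["Attributes"]["name"]
    started_id = event["id"]
    # ... update vpn_names and recreate stale containers here ...
```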

Solutions comparison

| Solution | Works on previously disconnected containers at start | Works without label for VPN container | Works without labels for VPN-connected containers | Does not need state |
|----------|------------------------------------------------------|----------------------------------------|----------------------------------------------------|---------------------|
| A        |                                                      | ✔️                                     |                                                    | ✔️                  |
| B        | ✔️                                                   |                                        |                                                    | ✔️                  |
| C        |                                                      | ✔️                                     | ✔️                                                 |                     |

Now, which solution do you prefer? 😄

I'm leaning towards B to have something that works, although it requires more user fiddling.
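To give a feel for B: the label names below are made up (whatever gets implemented may well differ), e.g. something like `deunhealth.network.id=vpn1` on gluetun and `deunhealth.network.connected.id=vpn1` on qbittorrent, and the lookup then boils down to:

```python
import docker

client = docker.from_env()

# Placeholder label names for solution B.
VPN_LABEL = "deunhealth.network.id"                   # set on the VPN container, e.g. =vpn1
CONNECTED_LABEL = "deunhealth.network.connected.id"   # set on each connected container, e.g. =vpn1

def connected_containers(vpn_container):
    """Find the containers tied to a VPN container via the shared label value."""
    network_id = vpn_container.labels.get(VPN_LABEL)
    if network_id is None:
        return []
    return client.containers.list(
        all=True, filters={"label": f"{CONNECTED_LABEL}={network_id}"}
    )
```

Because the match is on a label value rather than a container ID or name, it still works even if deunhealth starts after the VPN container was already recreated.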

@EzekialSA
Author

Personally I lean towards B as well. It involves more up-front config with labels, but it makes the link between containers explicit, forcing the user to declare it.

Solution A, automatically monitoring and tracking container information, isn't a terrific solution to me.

Solution C, relying on container names and tracked state, seems like too much effort, and could cause issues if someone has multiple stacks with overlapping container names across a cluster... bad practice, but it could cause a headache for someone down the line.

@kubax

kubax commented Oct 24, 2021

I pick B. I was elected to lead, not to read! (SCNR)

Labels would be perfectly fine for me.

It also sounds like a little less work on your side, with the labels implementation.

@oester

oester commented Nov 11, 2021

Another vote for option B.

@lennvilardi

+1 for option B. Do you know when it will be released?

@nlynzaad

+1 for option B

@qdm12
Owner

qdm12 commented Nov 28, 2021

I'm working on it right now! Hopefully we will have something today 😉

EDIT (2021-12-06): still working on it; it's a bit more convoluted than I expected, code-spaghetti-wise, but it's getting there!

@qdm12
Owner

qdm12 commented Nov 29, 2021

Note that if the 'network container' (aka the VPN) goes down and doesn't restart, there is no way to properly restart the connected containers, since the label won't be available anywhere, unfortunately. I will make the program log a warning if this happens.

@kubax

kubax commented Nov 29, 2021

I'm not sure if I got this right.

You are not able to restart the "child" containers if the VPN container killed itself and did not restart, right?

But if the container is updated and restarts without errors, that case can still be fixed with the intended patch?

@lennvilardi

In my case I just need the containers attached to the network container to be recreated when it is recreated by Watchtower. The network container is always up and running, but the other containers are orphaned and cannot be restarted.

@lennvilardi

Any ETA?

@ahmaddxb

Has this been implemented yet?

@sunbeam60

A little late to the party here, but definitely also prefer option B and I'm very excited about this feature.

(yes, my gluetun container got updated by watchtower last night and now the whole stack is down 😄 )

@qdm12
Owner

qdm12 commented May 1, 2022

Hello all, good news: I'm working on this again. Sorry for the immense delay in getting back to it.
I have some 'new' uncommitted code (from about 6 months ago, lol) that looks promising; I'm hoping for a solution B implementation soon! 👍

@Manfred73

Should this already be working in a current version combined with deunhealth?
I'm still using an older image of gluetun (v3.28.2), so it doesn't get automatically updated by watchtower.
When it does get updated, connectivity to the apps using gluetun is lost (#34).
Or should I keep updating gluetun manually for now?

@MajorLOL

Any update? :)

@STRAYKR

STRAYKR commented Aug 4, 2023

I guess Quentin hasn't had time to implement the deunhealth.restart.on.unhealthy=true label yet, or else it's a more difficult task than initially thought? It doesn't work for me yet.

The deunhealth log states 0 containers are monitored, despite several containers being tagged with deunhealth.restart.on.unhealthy=true:

2023/08/04 10:44:19 INFO Monitoring 0 containers to restart when becoming unhealthy

I turn my mini-PC media server off every evening, so I've been able to use a shell script that does a docker compose down && docker compose up -d two minutes after the server first boots up (Quentin recommends running something similar as a workaround). This fixes my stack... at least for some hours. Sometimes something breaks, and if that happens I just power it off and on again! Looking forward to a more robust solution :-)
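In case it helps anyone hitting the same "Monitoring 0 containers" message, a quick way to double-check that the label actually made it onto the containers (a small Python Docker SDK snippet, nothing deunhealth-specific):

```python
import docker  # pip install docker

client = docker.from_env()

# Containers that actually carry the label deunhealth looks for.
labelled = client.containers.list(
    all=True, filters={"label": "deunhealth.restart.on.unhealthy=true"}
)
print([c.name for c in labelled])  # empty output would explain "Monitoring 0 containers"
```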

@NaturallyAsh

@STRAYKR Is your deun container in the same yml as gluetun? That was my issue. Logs showed "Monitoring 0 containers" when I added the label to gluetun but deun was in its own yml. When I moved deun to the same yml compose as gluetun and qbittorrent, deun registered the labels and started monitoring the containers. I'm thinking, for my case, that the issue might've been that deun couldn't reach gluetun because it wasn't on the same network.

@nolimitech

Hello guys.
It still doesn't work.

2023/12/30 19:07:39 INFO container qbittorrent (image lscr.io/linuxserver/qbittorrent:latest) is unhealthy, restarting it...
2023/12/30 19:07:43 ERROR failed restarting container: Error response from daemon: Cannot restart container qbittorrent: No such container: 66cfe13371d1b10781c4a0649f96c8a82044f3852a2bbd77524c6f92b1902e35

2023/12/30 19:18:51 INFO container transmission (image lscr.io/linuxserver/transmission:latest) is unhealthy, restarting it...
2023/12/30 19:18:55 ERROR failed restarting container: Error response from daemon: Cannot restart container transmission: No such container: 72a8f02b433e0b443812be3a44171ece10b9cc6191b7d9bcba8fc6cdb012d125

@STRAYKR

STRAYKR commented Jan 1, 2024

@STRAYKR Is your deun container in the same yml as gluetun? That was my issue. Logs showed "Monitoring 0 containers" when I added the label to gluetun but deun was in its own yml. When I moved deun to the same yml compose as gluetun and qbittorrent, deun registered the labels and started monitoring the containers. I'm thinking, for my case, that the issue might've been that deun couldn't reach gluetun because it wasn't on the same network.

Hi @NaturallyAsh, sorry for the delayed response. Yes, all the config for deun and gluetun is in the same docker compose yml file; I only have the one compose file.

@web3dopamine

Hi guys, any update on this?

@jaredbrogan


Just chiming in to keep this issue at least somewhat active. 😄
