
No connectivity (V1.3.1) #3250

Open
bits-bytes opened this issue Nov 15, 2024 · 4 comments
Labels
bug Something isn't working theme:containers Theme: Containerisation topics

Comments

@bits-bytes

Hi,
Honestly, I’m not sure if this issue is related to tedge or to something else, such as Docker. I thought I’d start here to get the specialists' opinions.

Describe the bug
I have tedge running in a Docker environment with Portainer and two additional containers. All containers (except Portainer) use the same network, with IP addresses in the range 172.18.0.2 to 172.18.0.4. Strangely, although the startup process should be the same every time, the IP address assigned to each container changes from one startup to the next.

There are three systems, all using the same EDGE device and running exactly the same software. Occasionally, one of the machines stops working, and it is not always the same one; any of the three can be affected. I suspect that tedge might be causing this issue.

I've just discovered that, in such cases, the IP address of the tedge container can't even be reached with a ping from other containers. The only solution seems to be deleting and recreating the container, which temporarily resolves the issue.
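Sending ICMP echo requests from Python requires raw sockets (and the privileges that go with them), so the minimal sketch below swaps ping for a plain TCP connection attempt that could be run from one of the other containers the next time this happens. It assumes the tedge container's MQTT broker listens on port 1883 and is reachable from its peers on the shared Docker network; both the port and the host you pass in are assumptions to adapt to your setup.

```python
import socket


def is_reachable(host: str, port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: the tedge container's MQTT broker listens on port 1883 and
    is reachable from peer containers on the shared Docker network.
    """
    try:
        # create_connection resolves the host name (an IP or a Compose
        # service name) and tries each resolved address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError
        # subclasses, so any of them is reported as "not reachable".
        return False
```

For example, calling `is_reachable("172.18.0.2")` on a schedule from a peer container and logging the result with a timestamp would record when the tedge container drops off the network. Because the container IPs change between startups, probing the Compose service name (which Docker's embedded DNS resolves on user-defined networks) is more robust than a fixed address; the actual service name depends on your compose file.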

To Reproduce
I don't have a specific procedure to reproduce the issue. The machines are powered on every morning; the problem doesn’t occur every day, but on some days one of the machines, seemingly at random, is affected.

Expected behavior
thin-edge.io connects to the tenant as usual.

Screenshots

Environment (please complete the following information):

  • OS [incl. version] - Alpine 3.18.9 (?)

  • Hardware [incl. revision] - Phönix VL3

  • System-Architecture [e.g. result of "uname -a"]
    Linux 3f4154b4cd33 6.8.0-48-generic 48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 Linux

  • thin-edge.io version [e.g. 1.3.1] - 1.3.1

Additional context

Thank you for your time and attention.
Regards, Manfred

@bits-bytes bits-bytes added the bug Something isn't working label Nov 15, 2024
@reubenmiller
Contributor

reubenmiller commented Nov 15, 2024

@bits-bytes yeah it definitely seems like a container networking issue, though we're more than happy to discuss it here as I'm sure you won't be the only person running into this issue. In light of that, can you provide more details about your setup:

  • Your docker-compose.yaml file (if you're using one), or the docker run command you're using to launch the container
  • Container inspect output for both the happy/unhappy container instances (e.g. docker inspect <container_id>)
  • Are there any firewall rules in place on the device which could be blocking specific ip addresses?
  • Are the different thin-edge.io instances connecting to the same container engine, or is it one container engine per instance/machine?
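A full text diff of two `docker inspect` outputs is noisy, since IDs, timestamps and state naturally differ between containers. The minimal sketch below compares only a handful of fields, assuming each output was saved with `docker inspect <container_id> > file.json`; the field list is an assumption and can be extended as needed.

```python
import json
from typing import Any

# Fields worth comparing between a working and a broken container. The
# dotted paths follow the JSON layout printed by `docker inspect`.
FIELDS = (
    "Config.Image",
    "HostConfig.Binds",
    "HostConfig.NetworkMode",
    "Mounts",
)


def get_path(obj: Any, path: str) -> Any:
    """Walk a dotted path through nested dicts; return None if a key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def diff_inspect(good: dict, bad: dict, fields: tuple = FIELDS) -> dict:
    """Return {field: (good_value, bad_value)} for every field whose values differ."""
    differences = {}
    for field in fields:
        good_value, bad_value = get_path(good, field), get_path(bad, field)
        if good_value != bad_value:
            differences[field] = (good_value, bad_value)
    return differences


def load_inspect(path: str) -> dict:
    """Load a file written by `docker inspect <id> > file.json`.

    docker inspect prints a JSON array (one entry per object), so the
    first entry is returned.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)[0]
```

For example, `diff_inspect(load_inspect("good.json"), load_inspect("bad.json"))` (file names hypothetical) returns only the fields that differ, such as `HostConfig.Binds`, instead of the whole document.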

@reubenmiller reubenmiller added the theme:containers Theme: Containerisation topics label Nov 15, 2024
@bits-bytes
Author

bits-bytes commented Nov 25, 2024

Hello,

I apologize for the late response.

Unfortunately, I no longer have the broken container running. What I do have are:

  1. The log file from the run
  2. A text comparison of the container inspect output for the faulty container (left side) and for the same container after it was recreated, which is now working.
    Please excuse the format.

What stands out, though, is that — even though the container was restarted — the log indicates that a bridge already exists.

Also, the “Binds” in container inspect differ, which I think shouldn’t be the case.

Can anyone see the connections here?

Thank you,
Regards, Manfred
Log-Faulty.txt

@bits-bytes
Author

bits-bytes commented Nov 25, 2024

To answer the remaining questions:

Your docker-compose.yaml file (if you're using one), or the docker run command you're using to launch the container

We’re using prebuilt .tar images. The attached files are used to build them.

Are there any firewall rules in place on the device which could be blocking specific IP addresses?

No, there are no firewall rules in place that would block specific IP addresses.

Are the different tedge instances connecting to the same container engine, or is it one container engine per instance/machine?

It’s one container per machine.
hmic-tedge-opcua-f5s.zip

@reubenmiller
Contributor

There are a few things to unpack here, though nothing super obvious just yet.

What stands out, though, is that — even though the container was restarted — the log indicates that a bridge already exists.

This isn't that surprising, as you're using Docker and not Kubernetes. Docker (and Podman) will generally persist the container's file system across container restarts, so if the bridge files under /etc/tedge/mosquitto-conf/ were created on the first boot, they will still exist after the container is restarted. I still don't see this as a problem, though it might be better to run tedge reconnect instead of tedge connect in the init script (e.g. cont-init.d/50_configure.sh). That would at least get rid of the warning, as it would recreate all of the bridge settings on every container start-up.
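The start-up logic described above can be sketched as follows. The real init script (cont-init.d/50_configure.sh) is a shell script; this Python version only illustrates the decision. The bridge file name `c8y-bridge.conf` and the cloud name `c8y` are assumptions and should be checked against the files actually present under /etc/tedge/mosquitto-conf/ in your image.

```python
import subprocess
from pathlib import Path

# Assumption: tedge writes the Cumulocity bridge config to this file; check
# the actual file name under /etc/tedge/mosquitto-conf/ in your image.
BRIDGE_CONF = Path("/etc/tedge/mosquitto-conf/c8y-bridge.conf")


def connect_command(cloud: str = "c8y", bridge_conf: Path = BRIDGE_CONF) -> list[str]:
    """Choose the tedge subcommand to run at container start-up.

    If a bridge config survived from a previous boot (Docker keeps the
    container file system across restarts), use `tedge reconnect` so the
    bridge settings are rewritten; otherwise a plain `tedge connect` is used.
    """
    action = "reconnect" if bridge_conf.exists() else "connect"
    return ["tedge", action, cloud]


def run_connect(cloud: str = "c8y") -> int:
    """Run the chosen command and return its exit code."""
    return subprocess.run(connect_command(cloud), check=False).returncode
```

The conditional keeps the very first boot on a plain `tedge connect`. If `tedge reconnect` also handles a fresh container in your thin-edge.io version, the check can be dropped and `reconnect` used unconditionally, which always rewrites the bridge settings.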

Also, the “Binds” in container inspect differ, which I think shouldn’t be the case.

Yes, having a different mount config under HostConfig definitely sounds a bit strange, though I'm not sure whether it is problematic; I'd have to research in which scenarios that can occur. Did you happen to do any self-updates? From the data you have provided, it seems that you are deploying the containers using Docker Compose (there are com.docker.compose.project labels on the container). Maybe the self-update mechanism is conflicting with some aspect of Docker Compose (for example, Portainer unexpectedly running docker compose up -d periodically?).
