Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate device registration messages when using tedge-log-plugin as a child device #2389

Closed
reubenmiller opened this issue Oct 30, 2023 · 7 comments
Assignees
Labels
bug Something isn't working theme:mqtt Theme: mqtt and mosquitto related topics theme:troubleshooting Theme: Troubleshooting and remote control
Milestone

Comments

@reubenmiller
Copy link
Contributor

reubenmiller commented Oct 30, 2023

Describe the bug

A child device is being registered twice under two different topic names which results in operation sent from Cumulocity not being mapped to the correct te/ topic (or at least not the topic which is being listened to by the "child-device").

The error was discovered during updates to the tedge-demo-container@feat-te-as-child-device project which uses the following container setup:

  • container 1: mosquitto
  • container 2: main device (with the mapper, tedge-agent, tedge-configuration-plugin, tedge-log-plugin etc.)
  • container 3: child device (with tedge-agent, tedge-configuration-plugin, tedge-log-plugin)

Once the child device container starts up, two te/ registration messages for the same @id are present but on different local te topics. An example of the two duplicated registration messages are shown below:

[te/device/child01//] {"@id":"test_20231030_03:device:child01","@type":"child-device","name":"child01"}
[te/device/test_20231030_03:device:child01//] {"@id":"test_20231030_03:device:child01","@type":"child-device","name":"test_20231030_03:device:child01"}

The error seems to be caused when the tedge-log-plugin component is registered as a child device when the tedge-mapper-c8y is not running. When the mapper starts, it creates the erroneous te/device/<child-id>// registration message.

To Reproduce

Given a main device with thin-edge.io connected to Cumulocity IoT, run the following steps:

  1. Open a console and stop the tedge-mapper-c8y service

    sudo systemctl stop tedge-mapper-c8y
    
  2. In a new console, start the tedge-log-plugin as a child device

    tedge-log-plugin --mqtt-device-topic-id device/child1//
    
  3. Go to the previous console, and start the mapper

    sudo systemctl start tedge-mapper-c8y
    
  4. Check the registration messages

    tedge mqtt sub 'te/+/+/+/+'
    

    Inspect the registration messages, and there should NOT be duplicated registration messages for the child1 device for different te/device/<child-id>// topics:

    [te/device/child1//] {"@id":"poppy:device:child1","@type":"child-device","name":"child1"}
    [te/device/poppy:device:child1//] {"@id":"poppy:device:child1","@type":"child-device","name":"poppy:device:child1"}
    

Expected behavior

When using thin-edge.io as a child device, it should only register the @id of the device under one topic on startup, and that topic should be the same as the configured mqtt.device_topic_id setting in the /etc/tedge/tedge.toml.

For example, if mqtt.device_topic_id is set to device/child01//, then the topic should be:

te/device/child01//

And there should not be a second registration message sent to a topic with this pattern:

te/device/test_20231030_03:device:child01//

Screenshots

Environment (please complete the following information):

  • OS [incl. version]: alpine linux 3.18
  • Hardware [incl. revision]: docker
  • System-Architecture [e.g. result of "uname -a"]: aarch64
  • thin-edge.io version [e.g. 0.1.0]: 0.12.1~513+g8edcba2

Additional context

@reubenmiller reubenmiller added the bug Something isn't working label Oct 30, 2023
reubenmiller added a commit to thin-edge/tedge-demo-container that referenced this issue Oct 30, 2023
Signed-off-by: Reuben Miller <reuben.d.miller@gmail.com>
@reubenmiller
Copy link
Contributor Author

reubenmiller commented Oct 31, 2023

Doing some further investigations and it seems to be a problem just in the tedge-log-plugin. I ran the test again but isolated single binaries (e.g. only tedge-agent, or only tedge-log-plugin) to check which component is causing the error:

Only using tedge-log-plugin - FAIL

[te/device/child01/service/tedge-log-plugin] {"@parent":"device/child01//","@type":"service","type":"service"}
[te/device/child01/service/tedge-log-plugin/status/health] {"pid":161,"status":"up","time":"2023-10-31T09:01:54.066112816Z"}
[te/device/child01///cmd/log_upload] {"types":["shell","software-management"]}
[te/device/child01//] {"@id":"test_20231031_16:device:child01","@type":"child-device","name":"child01"}
[te/device/test_20231031_16:device:child01//] {"@id":"test_20231031_16:device:child01","@type":"child-device","name":"test_20231031_16:device:child01"}

It is possible that the registration of the log_upload capability via te/device/child01///cmd/log_upload is done before the child device registration te/device/child01// is causing some issues?

Only using tedge-configuration-plugin - OK

[te/device/child01/service/tedge-configuration-plugin] {"@parent":"device/child01//","@type":"service","type":"service"}
[te/device/child01/service/tedge-configuration-plugin/status/health] {"pid":172,"status":"up","time":"2023-10-31T08:53:10.634069192Z"}
[te/device/child01//] {"@id":"test_20231031_13:device:child01","@type":"child-device","name":"child01"}
[te/device/child01///cmd/config_snapshot] {"types":["system.toml","tedge-configuration-plugin","tedge.toml"]}
[te/device/child01///cmd/config_update] {"types":["system.toml","tedge-configuration-plugin","tedge.toml"]}

Only using tedge-agent - OK

[te/device/child01/service/tedge-agent] {"@id":"test_20231031_14:device:child01:service:tedge-agent","@parent":"device/child01//","@type":"service","name":"tedge-agent","type":"service"}
[te/device/child01/service/tedge-agent/status/health] {"pid":163,"status":"up","time":"2023-10-31T08:57:56.671546286Z"}
[te/device/child01//] {"@id":"test_20231031_14:device:child01","@type":"child-device","name":"child01"}
[te/device/child01///cmd/restart] {}
[te/device/child01///cmd/software_list] {}
[te/device/child01///cmd/software_list/cQD96Q27EjnuojkrGf6CI] {"status":"executing"}
[te/device/child01///cmd/software_update] {}

@reubenmiller
Copy link
Contributor Author

I've found an easier way to reproduce the bug, I'll update the ticket description

@reubenmiller reubenmiller changed the title duplicate device registration messages resulting when using thin-edge.io as a child device duplicate device registration messages when using tedge-log-plugin as a child device Oct 31, 2023
@reubenmiller reubenmiller added theme:plugins Theme: Plugin interfaces or general plugin topics which don't fit in any other themes theme:troubleshooting Theme: Troubleshooting and remote control theme:mqtt Theme: mqtt and mosquitto related topics and removed theme:plugins Theme: Plugin interfaces or general plugin topics which don't fit in any other themes labels Oct 31, 2023
@albinsuresh
Copy link
Contributor

albinsuresh commented Oct 31, 2023

It is possible that the registration of the log_upload capability via te/device/child01///cmd/log_upload is done before the child device registration te/device/child01// is causing some issues?

That was my first assumption as well, but this isn't the case here. It is another message race during the startup. The child device list sync mechanism of the mapper, during its startup, where it looks at the differences between the child device list received from the cloud vs the ones known locally, and any new local ones are registered with the cloud is what's triggering that duplicate registration message. Here is the message race that's happening:

  1. The mapper on restart requests the child device list from the cloud
  2. The cloud sends an empty child list response, as no child devices are registered yet (the log plugin registration, which creates the child device, is not processed by the mapper yet).
  3. While the cloud response(empty child list) is in transit, the mapper processes the log plugin capability message, auto-registering the child device and the service in that process. This step creates the test_20231030_03:device:child01 child directory under /etc/tedge/operations/c8y.
  4. Then it receives the empty child list from the cloud, as that was sent from the cloud before the child device was registered in the previous step.
  5. When the mapper receives the empty child list, it tries register that child device named test_20231030_03:device:child01 found in the operations directory once more.

I'll update this ticket with the fix approach, once I work out something.

@reubenmiller
Copy link
Contributor Author

@albinsuresh Though are you sure that sending the child-device registration message first before any operation registration messages wouldn't fix the situation? Or is it just by chance that I didn't see this error on other components (e.g. tedge-configuration-plugin, tedge-agent etc.)?

@reubenmiller
Copy link
Contributor Author

After further testing it seems to affect all components when running as a child-device.

When starting the following components with different child ids and the tedge-mapper-c8y is stopped, it results in 3 erroneous device registration messages:

TEDGE_RUN_LOCK_FILES=false tedge-agent --mqtt-device-topic-id device/child1//
tedge-configuration-plugin --mqtt-device-topic-id device/child2//
tedge-log-plugin --mqtt-device-topic-id device/child3//

albinsuresh added a commit to albinsuresh/thin-edge.io that referenced this issue Oct 31, 2023
If the child device directory the child device operation directories
follow the default external id naming scheme, they are auto-registered
with the right topic id and name derived from that external id.
@reubenmiller reubenmiller added this to the 0.13.0 milestone Oct 31, 2023
reubenmiller added a commit that referenced this issue Oct 31, 2023
…tory-registration

Fix auto-registration of child device op dirs #2389
@reubenmiller
Copy link
Contributor Author

reubenmiller commented Oct 31, 2023

Merged PR to address the incorrect registration of the child devices.

Add systems tests here:

@reubenmiller reubenmiller self-assigned this Nov 1, 2023
@reubenmiller
Copy link
Contributor Author

QA has thoroughly checked the feature and here are the results:

  • Test for ticket exists in the test suite.
  • QA has tested the bug and could not reproduce it anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working theme:mqtt Theme: mqtt and mosquitto related topics theme:troubleshooting Theme: Troubleshooting and remote control
Projects
None yet
Development

No branches or pull requests

2 participants