Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tedge-agent: move to new health topic #2271

Closed
wants to merge 1 commit into from

Conversation

Bravo555
Copy link
Contributor

@Bravo555 Bravo555 commented Sep 19, 2023

TODO

  • add test checking tedge-agent works with new topics
  • figure out how to handle health command not having a command_id

Proposed changes

tedge-agent was updated to work with new health topics:

  • it listens to health check command on a new topic: te/[DEVICE_TOPIC_ID]/service/tedge-agent/cmd/health/check
  • it publishes its health status on a new topic: `te/[DEVICE_TOPIC_ID]/service/tedge-agent/status/health

Converter

However, to still be observable and work with other components and the tests, the converter was modified to properly bridge new inputs and output topics of tedge-agent:

  • tedge/health-check and tedge/health-check/SERVICE_NAME are converted into te/device/main///cmd/health/check and te/device/main/service/SERVICE_NAME/cmd/health/check respectively, so that tedge-agent receives a health check command on a topic it expects
  • te/device/main/service/tedge-agent/status/health and te/device/CHILD_ID/service/tedge-agent/status/health are converted into tedge/health/tedge-agent and tedge/health/CHILD_ID/tedge-agent respectively.
  • in order not to cause a loop, a previous tedge/health/SERVICE_NAME -> te/... conversion was modified to not happen if SERVICE_NAME is "tedge-agent"

tedge-api

A new type, ServiceHealthTopic was introduced to denote health topics of services. Currently it is a simple string wrapper, handling both new and old health topics, but after all components are moved, it will handle only new topics.

ServiceTopicId and DeviceTopicId were added as simple wrappers of EntityTopicId that are confirmed to be either a service or a device.

Because te/device/main/service/tedge-agent/cmd/health/check doesn't have a command_id, an OperationType struct was modified to make command_id optional.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#2018 (comment)

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

tedge-agent was updated to work with new health topics:

- it listens to health check command on a new topic:
  `te/[DEVICE_TOPIC_ID]/service/tedge-agent/cmd/health/check`
- it publishes its health status on a new topic:
  `te/[DEVICE_TOPIC_ID]/service/tedge-agent/status/health

*Converter*

However, to still be observable and work with other components and the
tests, the converter was modified to properly bridge new inputs and
output topics of tedge-agent:

- `tedge/health-check` and `tedge/health-check/SERVICE_NAME` are
  converted into `te/device/main///cmd/health/check` and
  `te/device/main/service/SERVICE_NAME/cmd/health/check` respectively,
  so that tedge-agent receives a health check command on a topic it
  expects
- `te/device/main/service/tedge-agent/status/health` and
  `te/device/CHILD_ID/service/tedge-agent/status/health` are converted
  into `tedge/health/tedge-agent` and
  `tedge/health/CHILD_ID/tedge-agent` respectively.
- in order not to cause a loop, a previous `tedge/health/SERVICE_NAME`
  -> `te/...` conversion was modified to not happen if SERVICE_NAME is
  "tedge-agent"

*tedge-api*

A new type, `ServiceHealthTopic` was introduced to denote health topics
of services. Currently it is a simple string wrapper, handling both new
and old health topics, but after all components are moved, it will
handle only new topics.

`ServiceTopicId` and `DeviceTopicId` were added as simple wrappers
of `EntityTopicId` that are confirmed to be either a service or a
device.

Because `te/device/main/service/tedge-agent/cmd/health/check` doesn't
have a command_id, an OperationType struct was modified to make
command_id optional.

Signed-off-by: Marcel Guzik <marcel.guzik@inetum.com>
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Merging #2271 (9e36561) into main (179fbb9) will decrease coverage by 0.4%.
Report is 8 commits behind head on main.
The diff coverage is 51.8%.

Additional details and impacted files
Files Changed Coverage Δ
crates/core/tedge_agent/src/agent.rs 0.0% <0.0%> (ø)
crates/core/tedge_api/src/mqtt_topics.rs 68.8% <18.7%> (-10.3%) ⬇️
crates/extensions/tedge_health_ext/src/lib.rs 44.5% <28.5%> (-45.7%) ⬇️
crates/core/tedge_api/src/health.rs 38.8% <57.6%> (-32.6%) ⬇️
...tedge_agent/src/tedge_to_te_converter/converter.rs 82.6% <76.8%> (-10.0%) ⬇️
crates/extensions/tedge_health_ext/src/actor.rs 58.3% <85.7%> (+5.9%) ⬆️
crates/extensions/c8y_mapper_ext/src/converter.rs 79.9% <100.0%> (ø)
crates/extensions/c8y_mapper_ext/src/log_upload.rs 75.8% <100.0%> (ø)
crates/extensions/c8y_mapper_ext/src/tests.rs 90.4% <100.0%> (ø)

... and 7 files with indirect coverage changes

@github-actions
Copy link
Contributor

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
272 1 5 273 99.63 52m9.919s

Failed Tests

Name Message ⏱️ Duration Suite
Test if all c8y services are down AssertionError 133.078 s Service Monitoring

///
/// It's most often in a format `device/DEVICE_NAME/service/SERVICE_NAME`.
#[derive(Debug, Clone, Hash, PartialEq, Eq)]
pub struct ServiceTopicId(EntityTopicId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a type for ServiceTopicId? I don't think so: an entity topic id can be used for services as well as for devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in some places (e.g. health checks) we do actually want to operate on services, and so to make sure a given topic is a service topic, we need to either look at it (if in default MQTT scheme) or call into entity store (if not). IMO it'd be nice not to write a guard every time to make sure that it is a service.
If I'm mistaken about there being more places where a function expects an entity topic id to be a service, then I can remove it for now and do it with plain ServiceTopicIds.

Comment on lines +48 to 57
// to be able to move different services to new health topic at different times, we
// selectively map services either from old to new topics, and vice versa.
// tedge-agent publishes on new health topic, exclude it from old mapping not to produce
// a publish loop
topic
if topic.name.starts_with("tedge/health")
&& !topic.name.contains("tedge-agent") =>
{
self.convert_health_status_message(message)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not be simpler to update all the services in a single PR?

Or even simpler:

  • to start with the service consuming health check messages (the mapper and tedge-watchdog, so all the health status are consumed from the te topics.
  • to update one after the other the services (so they publish health status on the te topics).

I realized for that to work, one needs to update the tedge_to_te_converter to also converts health check requests. So, we will be able to update the services (as the agent) one after the other so they react to te/+/+/+/+/cmd/health/check.

The very last step will be to update tedge-watchdog to send health check requests on the te topics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it would be much simpler and it's something that we have to do anyway, so better to do it now.

@Bravo555
Copy link
Contributor Author

Decided to first update tedge-mapper to new health topics, work on this PR is paused until #2276 is merged.

@Bravo555 Bravo555 mentioned this pull request Sep 25, 2023
12 tasks
@Bravo555
Copy link
Contributor Author

Much of what was done here was done in #2276 instead, making this PR obsolete.

@Bravo555 Bravo555 closed this Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants