Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Entity registration HTTP API #3230

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

albinsuresh
Copy link
Contributor

@albinsuresh albinsuresh commented Nov 6, 2024

Proposed changes

  • Refactor file-transfer logic into its own module
  • Add entity store endpoints to existing tedge HTTP server
  • Integrate the HTTP endpoints with the entity store
  • Refactor file transfer API logic to its own module
  • Publish entities registered via HTTP APIs to the MQTT broker
  • Accept registration messages via MQTT as well
  • Remove entity_store usage from the mappers, extracting only the external ID aspects from it
  • Handle auto-registration
  • Rename existing file_transfer_server module into tedge_server

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link
Contributor

github-actions bot commented Nov 13, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
549 1 4 550 99.82 1h27m40.323834s

Failed Tests

Name Message ⏱️ Duration Suite
Heartbeat is sent based on the custom health topic status Matching messages on topic 'c8y/inventory/managedObjects/update/TST_browse_vertical_hint' is less than minimum. wanted: 2 got: 1 messages: ['{}'] 106.939 s Heartbeat

@@ -1,4 +1,6 @@
//TODO Rename this module to tedge_server
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer http_server, stressing this is about HTTP while tedge is obvious in the context of tedge-agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be renamed as suggested, right before merging this PR.

pub mod actor;
pub mod error;
mod file_transfer_controller;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply file_transfer?

Copy link
Contributor Author

@albinsuresh albinsuresh Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it as you suggested.

It actually started with entity_store_controller, as I didn't want to name it entity_store to differentiate this from the original entity store and hence ended up adding the controller suffix. Did the the same for file_transfer as well.

But, now I'm wondering if naming them file_transfer_service and entity_store_service would have been clearer/better.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move the responsibilities of managing the entity store from the file_transfer_service to the entity_manager, forwarding HTTP entity registration to the latter.

crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
crates/core/tedge_agent/src/agent.rs Outdated Show resolved Hide resolved
match EntityRegistrationMessage::try_from(&input) {
Ok(register_message) => {
info!("Hippo: Received reg message: {:?}", register_message);
if let Err(e) = self.entity_store.lock().unwrap().update(register_message) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing the implementation of this actor - that is really minimal, I wonder if it wouldn't be better to swap the roles of the EntityManagerActor and the file_transfer_server::entity_store ?

I.e.:

  • keep the file_transfer_server module mostly focused on file transfer with only one extra responsibility that is to forward entity registration to the entity manager.
  • give to the entity manager actor the plain responsibility of entity registration and persistence.

So:

  • the responsibilities better match the actor names
  • the use of mutex makes more sense as on the HTTP side a response is expected, while a channel would be enough with the current design as no response is expected by the MQTT clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 20f71f4

/// Creates a entity metadata for the main device.
pub fn main_device(device_id: String) -> Self {
Self {
topic_id: EntityTopicId::default_main_device(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out-of-scope of this PR, but we must really find a way to support non default topic-id schema.

}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntityMetadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now two slightly different definitions of struct EntityMetadata; the former definition has been kept in entity_store.rs.

What's the plan?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in #3230 (comment), the entity store and entity cache needs to maintain their own versions of entity metadata as the latter would be much lighter with minimal metadata maintained at the mapper level. Ideally the EntityMetadata maintained by the mapper should have been as minimal as the XID. But, I was forced to store a bit more info like the entity type, display name and display type just because of the way we map service health status messages by publishing the service creation message itself(which requires all that metadata) for status updates as well. One way to eliminate caching this additional metadata at the mapper level is by making it query the entity store contents from the agent, whenever it requires all that metadata. But that extra HTTP call during the mapping complicates the conversion logic and I just chose caching extra metadata instead of that added complexity. But, open to revisit that decision in a follow-up PR dedicated for that change as the unit test impact would be quite big.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that extra HTTP call during the mapping complicates the conversion logic and I just chose caching extra metadata instead of that added complexity.

Agree, caching registration messages is simpler than late-requests to the entity store.

But could we share the type of cached data? Say raw registration messages? Even if some fields are not currently used in the mapper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But could we share the type of cached data? Say raw registration messages? Even if some fields are not currently used in the mapper.

Yeah, that is an option. I introduced a different struct to clearly represent the minimal set of things that a mapper would need to do the mapping. And also, there were minor things like the external id (@id) being an Option in the actual EntityRegistrationMessage, but the same being a mandatory field in the entity cache. But yeah, if we want to avoid having too many "metadata representations" across the crates, I can reuse some of the existing ones with implicit assumptions about fields like @id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really have to reduce the number of similar but different structs.

crates/core/tedge_api/src/entity_store.rs Outdated Show resolved Hide resolved
@@ -3,7 +3,7 @@ use std::collections::VecDeque;

/// A bounded buffer that replaces older values with newer ones when full
#[derive(Debug)]
pub(crate) struct RingBuffer<T> {
pub struct RingBuffer<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One has to avoid making public internal types. This pub is required only because PendingEntityStore ::telemetry_cache is pub. Both can be pub(crate).

Suggested change
pub struct RingBuffer<T> {
pub(crate) struct RingBuffer<T> {

@@ -149,6 +152,9 @@ pub enum CumulocityMapperError {
#[error(transparent)]
FromEntityStore(#[from] tedge_api::entity_store::Error),

#[error(transparent)]
FromEntityCacheError(#[from] crate::entity_cache::Error),

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to maintain two error types with the same error cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors emitted by the entity cache would be similar, but a much smaller set compared to the entity store, as it maintains far less metadata. That's why I didn't want them to share the same error type. Eventually I see the entity store structure moving from tedge_api to tedge_agent and the entity cache being mainained by the tedge_mapper. So, avoiding any direct dependency between them would be better for that move.

NonDefaultTopicScheme(EntityTopicId),
}

pub(crate) struct EntityCache {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment telling the purpose of this cache would be really helpful.

  • What the difference of this cache compared to the entity store?
  • How both cooperate?
  • What's the content of the pending entities?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll be adding a detailed comment for the EntityCache and the newly introduced EntityMetadata structs. Here is the quick summary though:

What the difference of this cache compared to the entity store?

While EntityStore, owned by the agent, maintains all the metadata associated with an entity like its registration message, device twin data, supported operations etc, the EntityCache was envisaged as a lightweight map maintained by the mappers to maintain the external id mapping and a minimal set of additional metadata required to do the appropriate mapping to the cloud twins.

How both cooperate?

The entity cache is updated every time a new entity is successfully registered with the entity store(when an entity registration message is emitted either directly by the client or by the agent in response to a registrartion attempt over HTTP). When the mapper receives an entity registration message either way, it assumes that it was successfully registered and adds it to its entity cache.

What's the content of the pending entities?

When the mapper receives an entity registration message with a parent, and if that parent is not registered yet at that point, it is added to the pending entity store and such pending entities are added to the entity cache only when their parent is registered. So, whenever a new entity is registered, the mapper checks its entity cache to find any of its children that arrived earlier, waiting to be registered and registers them along with the parent.

crates/extensions/c8y_mapper_ext/src/service_monitor.rs Outdated Show resolved Hide resolved
Comment on lines +401 to +400
let file_transfer_server_builder = FileTransferServerBuilder::try_bind(
self.config.http_config,
&mut entity_store_actor_builder,
&mut mqtt_actor_builder,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to see the file_transfer_server listening on both HTTP and MQTT.

I would expect the entity_store_actor to listen to MQTT plus a channel of requests forwarded by the file_transfer_server.

  • What has been your drivers for this design?
  • Is this because on the HTTP side a response is expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has been your drivers for this design?

It might be just a naming confusing during this transition period. Even though the entire HTTP code is currently kept under the module named file_transfer_server, the plan is to rename this module into http_server and associated structs into HttpServerBuilder,HttpServerActor, etc. Then this HttpActor would delegate HTTP requests to file_transfer_service and entity_store_service respectively. When that happens, the file_transfer_service will no have any MQTT aspects in it.

Copy link
Contributor

@didier-wenzek didier-wenzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice to see that you reach a point with a working implementation. Moving the entity store from the mapper to the mapper was really not an easy task.

I'm still a bit confused by:

  • actor naming and branching in the agent
  • the two definitions of struct EntityMetadata
  • the two similar error types c8y_mapper_ext::CumulocityMapperError and c8y_mapper_ext::ConversionError

Comment on lines -316 to +318
mqtt.skip(1).await; // Skip child device registration messages
let msg = mqtt.recv().await.unwrap();
dbg!(&msg);
// mqtt.skip(1).await; // Skip child device registration messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed before merge.

EntityStoreResponse::Delete(deleted_entities)
}
EntityStoreRequest::MqttMessage(mqtt_message) => {
self.process_mqtt_message(mqtt_message).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that there is a lot more to do when a message is received from MQTT vs HTTP.

Why that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is in the "pending entities" caching aspect. When the registration is done over MQTT, due to the lack of feedback to the client, we are forced to cache any early child reg messages, until the parents are also registered. But, with the HTTP, when a client tries to register before its parent, we can always immediately send the response back that the parent is not available so that the client can retry later. We don't need to do caching in this case. But, we can if we want to.

@albinsuresh
Copy link
Contributor Author

  • actor naming and branching in the agent

Yes, we can discuss the details. Renaming some of the modules are already planned as described here

  • the two definitions of struct EntityMetadata

The EntityMetadata struct used by the entity_cache was envisaged to be a lot simpler than what is it today. Now that both the structs are awfully similar barring some minor differences, I'll just re-use the existing one itself.

  • the two similar error types c8y_mapper_ext::CumulocityMapperError and c8y_mapper_ext::ConversionError

Yeah, this has been a legacy issue that can be fixed. But, I'd prefer to do that in a separate PR as this PR is already huge and resolving merge conflicts is a pain every time someone else touches the mapper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants