
Bluetooth: Mesh: Network management #20259

Closed
tsvehagen opened this issue Oct 31, 2019 · 6 comments
Labels: area: Bluetooth Mesh, area: Bluetooth, RFC (Request For Comments: want input from the community)

@tsvehagen
Collaborator

tsvehagen commented Oct 31, 2019

Introduction

The following is taken from the Bluetooth Mesh Profile specification, section 3.10.1, Mesh Network Creation procedure:

To create a mesh network, a Provisioner is required. A Provisioner shall generate a network key, provide an IV Index, and allocate a unicast address.
...
The Provisioner can then find unprovisioned devices by scanning for Unprovisioned Device beacons using active or passive scanning. The Provisioner can then provision these devices to become nodes within the mesh network. Once these nodes have been provisioned, the Configuration Client can then configure the nodes by providing them application keys and setting publish and subscribe addresses so that the nodes can communicate with each other.

As I can't find any term in the Mesh specification for the node that uses the Configuration Client, I will refer to it as the Configurator.

Problem description

  1. There is no clear separation between the local node's data and the data that relates to the network as a whole, i.e. network management.
  2. There is a public API for the Configuration Client, but no API to actually retrieve the keys etc. that should be used with the Configuration Client API.
  3. There is no API to get information about the nodes that are part of the network.

Proposed change

Add a network management component that can be used by the Provisioner and Configurator. This component will be responsible for adding and providing access to all information that is needed to manage the network.

This management component should have no need for the local device to be provisioned and it should have no special handling for the local node.

Detailed RFC

Network information is stored today in struct bt_mesh_net and related structures in net.c. The way I see it, the information stored there is used for the local node, i.e. when the local device has been provisioned into a network.

Since we would like to keep the local node data separate from the management data I suggest adding a mgmt.c and mgmt.h. The interface and data structures will probably be quite similar to the ones in net.h.

I also suggest adding a public API that can be used by an application to get all the information needed to manage the network.

Proposed change (Detailed)

This is just a very rough idea to get the ball rolling.

struct bt_mesh_mgmt_node {
	u16_t addr;
	u16_t net_idx;
	u8_t  dev_key[16];
	u8_t  num_elem;
};

struct bt_mesh_mgmt_subnet {
	u16_t net_idx;

	bool kr_flag;
	u8_t kr_phase;

	struct {
		u8_t net_key[16];
	} keys[2];
};

struct bt_mesh_mgmt_app_key {
	u16_t net_idx;
	u16_t app_idx;
	struct {
		u8_t id;
		u8_t key[16];
	} keys[2];
};

struct bt_mesh_mgmt_net {
	u32_t iv_index;

	struct bt_mesh_mgmt_node nodes[CONFIG_BT_MESH_MGMT_NODES_COUNT];
	struct bt_mesh_mgmt_subnet subnets[CONFIG_BT_MESH_MGMT_SUBNET_COUNT];
	struct bt_mesh_mgmt_app_key app_keys[CONFIG_BT_MESH_MGMT_APP_KEY_COUNT];
};

extern struct bt_mesh_mgmt_net bt_mesh_mgmt[CONFIG_BT_MESH_MGMT_NET_COUNT];
  • The management API would need to include functions like add/remove/find/list for nodes, subnets and app keys.

  • The struct bt_mesh_node array in struct bt_mesh_net should be removed and this interface should be used instead.

  • The struct bt_mesh_mgmt_node could be extended to actually contain "all" the information about the node composition, e.g. all its models, subscriptions etc.
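The add/find part of such a management API could be sketched roughly like this. This is only an illustration of the idea: the function names, pool size and unassigned-address sentinel are my assumptions, not existing Zephyr API, and standard stdint types stand in for the stack's u16_t/u8_t aliases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MGMT_NODES_COUNT 8      /* stand-in for CONFIG_BT_MESH_MGMT_NODES_COUNT */
#define ADDR_UNASSIGNED  0x0000 /* 0x0000 marks a free slot */

struct bt_mesh_mgmt_node {
	uint16_t addr;
	uint16_t net_idx;
	uint8_t  dev_key[16];
	uint8_t  num_elem;
};

static struct bt_mesh_mgmt_node nodes[MGMT_NODES_COUNT];

/* Allocate a free slot and store the node. Returns NULL when the pool is full. */
struct bt_mesh_mgmt_node *mgmt_node_add(uint16_t addr, uint16_t net_idx,
					const uint8_t dev_key[16],
					uint8_t num_elem)
{
	for (size_t i = 0; i < MGMT_NODES_COUNT; i++) {
		if (nodes[i].addr == ADDR_UNASSIGNED) {
			nodes[i].addr = addr;
			nodes[i].net_idx = net_idx;
			memcpy(nodes[i].dev_key, dev_key, 16);
			nodes[i].num_elem = num_elem;
			return &nodes[i];
		}
	}
	return NULL;
}

/* Find a node by its primary element address. */
struct bt_mesh_mgmt_node *mgmt_node_find(uint16_t addr)
{
	for (size_t i = 0; i < MGMT_NODES_COUNT; i++) {
		if (nodes[i].addr == addr) {
			return &nodes[i];
		}
	}
	return NULL;
}
```

Remove and list operations, and the subnet/app-key equivalents, would follow the same pattern over the other arrays in struct bt_mesh_mgmt_net.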

Dependencies

  • Being a Provisioner or a Configurator would depend on having the network management code.

  • The provisioning code (prov.c) and some access code (access.c) would probably need to be changed to use this new management API to store/get nodes and keys.

Concerns and Unresolved Questions

  1. Is it okay to implement this as a shared resource between the Provisioner and the Configurator?
  2. What are the needs that a public API would have to provide?

Alternatives

One alternative would be to have one module that the Provisioner uses and one that the Configurator uses. The reason I think it might be a good idea to share is that the Configurator seems to need almost everything that the Provisioner needs.

  • The Provisioner will need to know the address and number of elements of each node to be able to allocate addresses for new nodes. It also needs to know the network index, network key and IV index to use.
  • The Configurator will need to know at least the Primary Element of each node but it also needs to have access to the device keys and all other keys that are in use in the network.
@tsvehagen tsvehagen added the RFC Request For Comments: want input from the community label Oct 31, 2019
@trond-snekvik
Contributor

Overall, I think this is the right approach.

The way I imagine it, this management module would essentially be a database that you store addresses, keys and nodes in. Then, when you call the config client and provisioner APIs, you use the entries in this database as parameters, but you'd still be calling the config client and provisioner APIs directly. Is this what you want this module to cover, or do you think it should wrap the config client and provisioner?
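To illustrate the "database entries as parameters" idea, here is a minimal sketch. cfg_app_key_add() is a stub standing in for the real config client call (the actual bt_mesh_cfg_app_key_add() in Zephyr takes similar parameters plus a status out-parameter), and the mgmt structs are reduced versions of the draft above:

```c
#include <assert.h>
#include <stdint.h>

/* Reduced versions of the database entries from the RFC draft. */
struct mgmt_node    { uint16_t addr; uint16_t net_idx; };
struct mgmt_app_key { uint16_t net_idx; uint16_t app_idx; uint8_t key[16]; };

/* Stub standing in for the config client call; succeeds for any
 * node with a valid (non-zero) unicast address on subnet 0. */
static int cfg_app_key_add(uint16_t net_idx, uint16_t addr,
			   uint16_t key_net_idx, uint16_t key_app_idx,
			   const uint8_t app_key[16])
{
	(void)key_net_idx;
	(void)key_app_idx;
	(void)app_key;
	return (net_idx == 0 && addr != 0) ? 0 : -1;
}

/* Feed database entries straight into the config client as parameters. */
int mgmt_add_app_key(const struct mgmt_node *node,
		     const struct mgmt_app_key *ak)
{
	return cfg_app_key_add(node->net_idx, node->addr,
			       ak->net_idx, ak->app_idx, ak->key);
}
```

The application would still drive the config client directly; the database just supplies the arguments.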

I think we should discuss the scope of what we want to store in this module. Should it be everything we know about the nodes and the network, or just the stuff the other nodes won't be able to tell us if we ask? For instance, should we store all the network keys each node has? Should we store the composition data of each node, so that we can give each model the right application keys? If we don't, how will the user know which model ID it should call bt_mesh_cfg_mod_app_bind on?

Your draft for the structures looks pretty good to me, but there are some details I'd like to highlight:

  • The nodes can be part of several networks. Should they have a list of network indexes, or should we just list which net idx we'll use for communicating with the config server?
  • Should the nodes have a list of app indexes?
  • There should probably be a list of allocated groups and virtual addresses in the network
  • The nodes should probably hold some KR state, as the key refresh state is local to each node's instance of each key. Each net key and app key entry in the node could have an update_pending flag attached to it, that is set in all whitelisted nodes at the start of key refresh?
  • Should we have metadata for the nodes, like a human readable name, the header info of the composition data, and node-global state?
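The update_pending idea from the list above could be sketched like this; all names are assumed, and nodes excluded from the refresh are simply skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEYS_PER_NODE 2

/* Hypothetical per-node key entry carrying key refresh state. */
struct node_key {
	uint16_t idx;
	bool     update_pending;
};

struct db_node {
	struct node_key net_keys[KEYS_PER_NODE];
	bool excluded; /* not part of the ongoing key refresh */
};

/* At the start of key refresh, flag the matching key as pending on
 * every node that is still part of the refresh. */
void kr_start(struct db_node *nodes, size_t count, uint16_t net_idx)
{
	for (size_t i = 0; i < count; i++) {
		if (nodes[i].excluded) {
			continue;
		}
		for (size_t k = 0; k < KEYS_PER_NODE; k++) {
			if (nodes[i].net_keys[k].idx == net_idx) {
				nodes[i].net_keys[k].update_pending = true;
			}
		}
	}
}
```

The flag would then be cleared per node as each Config NetKey Update succeeds, giving the configurator a resumable view of the refresh.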

I assume you have some kind of endgame in mind for the configurator here, where it can be used to configure all aspects of the entire mesh. Are you envisioning this as a shell application, where the user is in the loop, or are you looking for a way to establish an automated configurator that can set up a network according to some user defined recipe?

We made an automated configurator for the Nordic Mesh SDK, but I'm not very happy with it, to be honest. The module I linked to works directly on top of the config client, provisioner and management module, and defines the configuration as a set of steps that must be done for each device "type". IMO, the first mistake we made was to make this module a part of the application, and not a generic module with a high level API. Everything it does is generic, and could be covered by an API that closer resembles what the user wants to do ("give all onoff models this application key", "make all nodes use TTL=6"). The second mistake was to make the configuration "process"-centric, instead of data-centric. It shouldn't be "first, add this key to this device, then make this model subscribe to that group", it should be "when the configuration is complete, the device should look like this". I think there's value in an automated configurator, but it should have a high level data-centric API, otherwise it'll make for horribly complex applications.

Regardless of whether the next step is to create a configurator shell application or an automated configurator, I think we need some module that wraps the provisioner, config client and mesh network database, as the API for these modules is very low level and not very close to what I think the users actually want to accomplish.

@tsvehagen
Collaborator Author

@trond-snekvik Thanks for the feedback!

The way I see it we are discussing two things here:

  1. A database for storing the keys and nodes to support the more "process"-centric approach
  2. A higher-level API to more easily support the data-centric approach

I think these should be implemented as two separate things, and since 2 has a dependency on 1, my first goal would be to have this sort of database in place. Solving 1 would also help make a distinction between the local node and the rest of the network as discussed in #17729. Having said that, it is always good to stay one step ahead, and I think a discussion about 2 can maybe shed some light on specific requirements for 1 :)

Anyway, I think it would make sense to have a small and simple module in the stack itself, i.e. subsys/bluetooth/mesh, that only handles what is necessary to actually support the different roles/models while still separating the local node data from the network as a whole. There could then be some helper library/module that saves all the other stuff that a more advanced provisioner/configurator would need. I don't think the stack really needs to know about this module though.

Maybe this module should be called something like bt_mesh_netdb instead of bt_mesh_mgmt 🤔


The way I imagine it, this management module would essentially be a database that you store addresses, keys and nodes in. Then, when you call the config client and provisioner APIs, you use the entries in this database as parameters, but you'd still be calling the config client and provisioner APIs directly. Is this what you want this module to cover, or do you think it should wrap the config client and provisioner?

I was thinking you would still call the config client and provisioner API yes. Using the entries as parameters I haven't really thought about...

I think we should discuss the scope of what we want to store in this module. Should it be everything we know about the nodes and the network, or just the stuff the other nodes won't be able to tell us if we ask? For instance, should we store all the network keys each node has? Should we store the composition data of each node, so that we can give each model the right application keys? If we don't, how will the user know which model ID it should call bt_mesh_cfg_mod_app_bind on?

To begin with I think we should just store the necessary info, i.e. the things that the node can't tell us. Not really sure what that involves exactly though :p

If we don't, how will the user know which model ID it should call bt_mesh_cfg_mod_app_bind on?

I'm thinking that the "user" here is some sort of network admin that knows about model IDs and such.

  • The nodes can be part of several networks. Should they have a list of network indexes, or should we just list which net idx we'll use for communicating with the config server?

Good question. I assume you aren't talking about subnets, and I don't really understand how it would actually work when a device becomes part of two networks? Can a new network be created with the config client, or must the device be provisioned again somehow?

There's nothing preventing two networks from using the same NetKey indices so if a node can be part of two networks a NetKey index would not be enough to select a key so some other network index would be necessary.

I'm thinking that a node on two different networks can be seen as two completely different nodes from a configurator perspective? Is it necessary to know that node X on network A is actually the same physical device as node Y on network B?

  • The nodes should probably hold some KR state, as the key refresh state is local to each node's instance of each key. Each net key and app key entry in the node could have an update_pending flag attached to it, that is set in all whitelisted nodes at the start of key refresh?

Yea that sounds right.

  • Should we have metadata for the nodes, like a human readable name, the header info of the composition data, and node-global state?

I don't think that would need to be part of the stack itself but could be a nice thing in some kind of helper library.

I assume you have some kind of endgame in mind for the configurator here, where it can be used to configure all aspects of the entire mesh. Are you envisioning this as a shell application, where the user is in the loop, or are you looking for a way to establish an automated configurator that can set up a network according to some user defined recipe?

Actually my "endgame" is quite simple. I want to be able to provision and configure a node but I'm totally okay with doing it the way that is done in the configurator that you linked to. For simple use cases like having one type of node with one or two models I think that approach is fine (albeit I haven't done it yet :p).

We made an automated configurator for the Nordic Mesh SDK, but I'm not very happy with it, to be honest. The module I linked to works directly on top of the config client, provisioner and management module, and defines the configuration as a set of steps that must be done for each device "type". IMO, the first mistake we made was to make this module a part of the application, and not a generic module with a high level API.

TBH that was sort of what I was thinking that I would do in my application as well :O. I don't think that there is anything wrong with that but I can see that it could be nice to have a higher level API to ease the work a bit.

Everything it does is generic, and could be covered by an API that closer resembles what the user wants to do ("give all onoff models this application key", "make all nodes use TTL=6").

Sounds interesting, have you thought about what an API like that might look like? I guess you would need to be able to specify some kind of 'filters' and 'actions'?

The second mistake was to make the configuration "process"-centric, instead of data-centric. It shouldn't be "first, add this key to this device, then make this model subscribe to that group", it should be "when the configuration is complete, the device should look like this".

That sounds like quite an advanced implementation. Do you mean something like:

  1. Specify a node template
  2. Read the configuration from a node
  3. Do some sort of diff between the template and the config
  4. Figure out which actions must be taken
  5. Figure out in what order the actions must be done
  6. Execute those actions in the correct order
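Step 3, the diff, could be sketched for a single aspect of the configuration, say app-key bindings on one model, like this (all names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 8

/* A set of app-key indexes bound to a model, as reported by the node
 * or as required by the template. */
struct binding_set {
	uint16_t app_idx[MAX_KEYS];
	size_t count;
};

static bool set_contains(const struct binding_set *s, uint16_t idx)
{
	for (size_t i = 0; i < s->count; i++) {
		if (s->app_idx[i] == idx) {
			return true;
		}
	}
	return false;
}

/* Diff the desired state against what the node reported: every binding
 * in the template that the node is missing becomes an "add" action.
 * Returns the number of missing bindings written to out. */
size_t diff_bindings(const struct binding_set *desired,
		     const struct binding_set *current,
		     uint16_t out[MAX_KEYS])
{
	size_t n = 0;

	for (size_t i = 0; i < desired->count; i++) {
		if (!set_contains(current, desired->app_idx[i])) {
			out[n++] = desired->app_idx[i];
		}
	}
	return n;
}
```

Steps 4-6 would then turn each missing binding into a config client call, ordered so that keys are distributed before models are bound to them.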

Wouldn't it be possible to create some sort of data-centric API on top of the API that you have developed?

@trond-snekvik
Contributor

Good question. I assume you don't talk about subnets and I don't really understand how it would actually work when a device becomes part of two networks? Can a new network be created with the config client or must the device be provisioned again somehow?

I'm sorry, I think I used outdated terminology, I actually meant subnets. I agree with you, we don't need to keep track of a device's presence in multiple networks. Also, if the scope is just storing the stuff we need to know about the nodes, we probably just need to know the network index we want to use with the device key for configuration, like in your draft.

Sounds interesting, have you thought about what an API like that might look like?

I made a prototype with this API over a year ago, but it ended up like more of a weekend project than anything.
I tried a couple of approaches, but a variation of this API showed the most promise. The general idea is to create a set of node templates tied to specific URIs. During provisioning, the provisioner will use the unprovisioned beacon's URI to determine which node this is, and assign it to a cfg_node object. After provisioning, we'll go through the element entries, and issue the required config client commands to make the mesh node match its template. We'll scan through the models to determine which subnets we need, and send them to the node. Then, we'll determine which appkeys we need, and send those, before we start configuring the models, heartbeat and other parameters.
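The URI-to-template lookup at the start of that flow might look something like this; the URIs, names and structure are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A node template tied to the URI advertised in the unprovisioned
 * beacon. A real template would also carry the element/model layout. */
struct node_template {
	const char *uri;
	const char *name;
};

static const struct node_template templates[] = {
	{ "https://example.com/light",  "light"  },
	{ "https://example.com/switch", "switch" },
};

/* Match an unprovisioned beacon's URI against the known templates. */
const struct node_template *template_find(const char *uri)
{
	for (size_t i = 0; i < sizeof(templates) / sizeof(templates[0]); i++) {
		if (strcmp(templates[i].uri, uri) == 0) {
			return &templates[i];
		}
	}
	return NULL;
}
```

After provisioning, the matched template would drive the config client commands described above.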

The application code for this API mirrors the composition data API pretty well, which is a big advantage 🙂 I didn't really see the need for an API between this one and the config client when I implemented it, but I didn't bring this all the way to a release, so I almost certainly didn't consider all aspects. It would add some flexibility, particularly when changing configurations.

I think I'm derailing a bit here though, this first step should focus on the database module.

The only thing I can think of that hasn't been mentioned yet is persistent storage. @jhedberg any comments on this RFC?

@tsvehagen
Collaborator Author

Hey, sorry for the late answer. I've started doing some initial implementation and it seems to be pretty easy to separate the stack from the db at least. I've run into some questions quite quickly though.

  1. Should we support handling of multiple networks? Provisioning to multiple networks might be possible, but if we want to configure multiple networks I guess the whole stack needs to support it, which feels like quite a big change. Do you think it is a common use case that one device acts as provisioner or configurator for multiple networks?

  2. This relates to the persistent storage that @trond-snekvik mentions. What could be a good storage strategy here? Should we continue to store the db under bt/mesh/? Do we need to use the pending strategy for the db?

  3. Just a general question about naming. I see some functions in the stack use _find(), e.g. bt_mesh_app_key_find(u16_t app_idx) and some use _get(), e.g. bt_mesh_subnet_get(u16_t net_idx). The same is true for _alloc() and _create(). Is there any convention that is used in the stack?
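On question 2, one option is to keep the database under the stack's existing bt/mesh/ settings namespace. A sketch of a key-formatting helper, assuming a hypothetical netdb sub-path that does not exist in Zephyr today:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a settings key for a stored node, following the stack's
 * "bt/mesh/..." naming scheme. The "netdb" sub-path is an assumption.
 * Returns the number of characters written (excluding the terminator),
 * or a negative value on error. */
int netdb_node_key(char *buf, size_t len, uint16_t addr)
{
	/* e.g. address 0x0005 -> "bt/mesh/netdb/node/0005" */
	return snprintf(buf, len, "bt/mesh/netdb/node/%04x", (unsigned)addr);
}
```

Whether the db needs the same pending/deferred-store strategy as the local node data probably depends on how often entries change during provisioning bursts.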

@jhedberg
Member

jhedberg commented Dec 9, 2019

Should we support handling of multiple networks?

@tsvehagen I'd exclude this until someone comes with a really strong use case. Zephyr is intended for constrained devices, whereas multiple networks sounds like something you'd have on a more powerful system (e.g. something capable of running Linux).

I see some functions in the stack use _find(), e.g. bt_mesh_app_key_find(u16_t app_idx) and some use _get(), e.g. bt_mesh_subnet_get(u16_t net_idx).

I might not have done a very good job here... to me the difference is mainly in terms of perceived probability of success, i.e. find() implies a higher probability of the lookup failing, whereas get() means that it will likely succeed. That's without looking at the code though - it's possible this doesn't match reality, and I apologise for the inconsistency in that case. For new code I'd suggest going with what feels most intuitive, and we can then discuss it in the PR if there's disagreement.

The same is true for _alloc() and _create(). Is there any convention that is used in the stack?

At least in my mind alloc() is mainly for allocating the needed memory for the object, whereas create() is assumed to do more extensive initialisation as well. But again, I might not have been completely consistent with this.

@tsvehagen
Collaborator Author

Please move the conversation to #21544
