Description
Describe the bug
The mechanism for subscribing to net_mgmt events is currently broken: code that registers a callback for certain events (net_mgmt_init_event_callback) also has that callback run on other, seemingly random events. The event waiting functions, net_mgmt_event_wait and net_mgmt_event_wait_on_iface, are affected in the same way.
The core of the problem is that these APIs are defined at the bitmask level, i.e. the internal code is checking bitmasks to determine whether an event matches the requested events:
zephyr/subsys/net/ip/net_mgmt.c
Lines 203 to 204 in 781011b
    !(NET_MGMT_GET_COMMAND(mgmt_event->event) &
      NET_MGMT_GET_COMMAND(cb->event_mask)))) {
The problem is that ALL of the events are defined as integer values, not bitmasks. The following example is from the Wi-Fi layer, but it is the same for all the others:
zephyr/include/zephyr/net/wifi_mgmt.h
Lines 320 to 325 in 781011b
    /** @brief Wi-Fi management events */
    enum net_event_wifi_cmd {
        /** Scan results available */
        NET_EVENT_WIFI_CMD_SCAN_RESULT = 1,
        /** Scan done */
        NET_EVENT_WIFI_CMD_SCAN_DONE,
zephyr/include/zephyr/net/wifi_mgmt.h
Lines 360 to 366 in 781011b
    /** Event emitted for Wi-Fi scan result */
    #define NET_EVENT_WIFI_SCAN_RESULT \
        (_NET_WIFI_EVENT | NET_EVENT_WIFI_CMD_SCAN_RESULT)
    /** Event emitted when Wi-Fi scan is done */
    #define NET_EVENT_WIFI_SCAN_DONE \
        (_NET_WIFI_EVENT | NET_EVENT_WIFI_CMD_SCAN_DONE)
As a result, if you register a callback with net_mgmt_init_event_callback(&cb, wifi_mgmt_event_handler, NET_EVENT_WIFI_CMD_AP_STA_CONNECTED);, where NET_EVENT_WIFI_CMD_AP_STA_CONNECTED == 15 (binary 1111), the callback runs when any of the following events occurs, since each of their values shares at least one bit with 15:
NET_EVENT_WIFI_CMD_SCAN_RESULT
NET_EVENT_WIFI_CMD_SCAN_DONE
NET_EVENT_WIFI_CMD_CONNECT_RESULT
NET_EVENT_WIFI_CMD_DISCONNECT_RESULT
NET_EVENT_WIFI_CMD_IFACE_STATUS
NET_EVENT_WIFI_CMD_TWT
NET_EVENT_WIFI_CMD_TWT_SLEEP_STATE
NET_EVENT_WIFI_CMD_RAW_SCAN_RESULT
NET_EVENT_WIFI_CMD_DISCONNECT_COMPLETE
NET_EVENT_WIFI_CMD_SIGNAL_CHANGE
NET_EVENT_WIFI_CMD_NEIGHBOR_REP_RECEIVED
NET_EVENT_WIFI_CMD_NEIGHBOR_REP_COMPLETE
NET_EVENT_WIFI_CMD_AP_ENABLE_RESULT
NET_EVENT_WIFI_CMD_AP_DISABLE_RESULT
NET_EVENT_WIFI_CMD_AP_STA_CONNECTED
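The overlap can be reproduced outside Zephyr with a small sketch. The helper names below (command_bits_overlap, count_overlapping) are hypothetical and model only the command-bit comparison quoted from net_mgmt.c, not the layer or interface checks around it. COMMAND_MASK uses the same value as NET_MGMT_COMMAND_MASK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as NET_MGMT_COMMAND_MASK in net_mgmt.h. */
#define COMMAND_MASK 0x0000FFFFu

/* Hypothetical stand-in for the command comparison in net_mgmt.c:
 * an event is treated as a match when its command bits and the
 * registered mask's command bits share at least one set bit.
 */
static bool command_bits_overlap(uint32_t raised_event, uint32_t registered_mask)
{
    return ((raised_event & COMMAND_MASK) &
            (registered_mask & COMMAND_MASK)) != 0;
}

/* Count how many integer command values in [first, last] are treated
 * as a match for the registered mask.
 */
static int count_overlapping(uint32_t registered_mask, uint32_t first,
                             uint32_t last)
{
    int matches = 0;

    for (uint32_t cmd = first; cmd <= last; cmd++) {
        if (command_bits_overlap(cmd, registered_mask)) {
            matches++;
        }
    }
    return matches;
}
```

With a registered value of 15, count_overlapping(15, 1, 15) counts every command value from 1 to 15 as a match, which corresponds to the 15 events listed above. A command value of 16 would not match, but 17 would, because it shares bit 0 with 15.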
Potential Solutions
- Convert events to bitmasks
Converting events from integers to bitmasks (as implied by the API) would resolve the issue.
Unfortunately, the command mask only has space for 16 unique events per layer:
zephyr/include/zephyr/net/net_mgmt.h
Line 44 in 781011b
    #define NET_MGMT_COMMAND_MASK 0x0000FFFF
The Wi-Fi management layer already defines 17 unique events, and the IPv6 layer defines over 20.
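A minimal sketch of what one bit per event would look like, and why the 16-bit command mask caps it at 16 events. All EXAMPLE_* names and both helper functions are hypothetical and are not existing Zephyr definitions. Only the mask value is taken from net_mgmt.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as NET_MGMT_COMMAND_MASK in net_mgmt.h. */
#define EXAMPLE_COMMAND_MASK 0x0000FFFFu

/* One bit per event (hypothetical macro, valid for n in 0..31). */
#define EXAMPLE_EVENT_BIT(n) (UINT32_C(1) << (n))

/* Hypothetical bitmask-style event definitions. */
#define EXAMPLE_CMD_SCAN_RESULT    EXAMPLE_EVENT_BIT(0)
#define EXAMPLE_CMD_SCAN_DONE      EXAMPLE_EVENT_BIT(1)
#define EXAMPLE_CMD_CONNECT_RESULT EXAMPLE_EVENT_BIT(2)

/* With one bit per event, the bitwise AND test only matches events
 * that were actually included in the subscription mask.
 */
static bool example_event_matches(uint32_t raised_event, uint32_t subscribed)
{
    return ((raised_event & EXAMPLE_COMMAND_MASK) &
            (subscribed & EXAMPLE_COMMAND_MASK)) != 0;
}

/* True when event bit n still fits inside the 16-bit command mask. */
static bool example_bit_fits(unsigned int n)
{
    return n < 32 && (EXAMPLE_EVENT_BIT(n) & ~EXAMPLE_COMMAND_MASK) == 0;
}
```

Subscribing to EXAMPLE_CMD_SCAN_RESULT | EXAMPLE_CMD_CONNECT_RESULT no longer matches EXAMPLE_CMD_SCAN_DONE. However, example_bit_fits(16) is false, so a 17th event in a layer has no bit available.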
- Convert management event mask to 64 bits, convert events to bitmasks
Converting the event mask to 64 bits would give each layer up to 32 additional event bits.
This would be a rather intrusive change, requiring updates to every callback definition to change the API signature to accept the 64-bit parameter.
Concerns have also been raised that we will eventually run out of events again: #88495 (comment)
- Convert net_mgmt_init_event_callback to register for layers, not events
Update the semantics of net_mgmt_init_event_callback so that instead of registering for individual events, users register for the base layer and perform their own event filtering in the callback. This would allow the event IDs to remain integers, but it is a significant API change.
It would additionally make the waiting functions (net_mgmt_event_wait, net_mgmt_event_wait_on_iface) impossible to implement for more than a single event (not that they currently work properly).
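A rough sketch of how layer-level registration with filtering in the callback could look. Everything here is hypothetical: the EXAMPLE_* constants, the simplified bit layout (all bits above the command mask are treated as the layer identifier, which is not the real net_mgmt layout), and the dispatch and handler functions are illustrations, not a proposed API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical layout: low 16 bits carry the integer
 * command, everything above identifies the layer.
 */
#define EXAMPLE_COMMAND_MASK 0x0000FFFFu
#define EXAMPLE_LAYER_BITS   0xFFFF0000u

/* Hypothetical layer identifiers. */
#define EXAMPLE_WIFI_LAYER  0x00010000u
#define EXAMPLE_OTHER_LAYER 0x00020000u

/* Integer (non-bitmask) command values, as events are defined today. */
enum example_wifi_cmd {
    EXAMPLE_CMD_SCAN_DONE = 2,
    EXAMPLE_CMD_AP_STA_CONNECTED = 15,
};

/* Number of times the handler saw the event it cares about. */
static int example_sta_connected_count;

/* The core only compares the layer; it no longer looks at commands. */
static bool example_layer_matches(uint32_t raised_event, uint32_t registered_layer)
{
    return (raised_event & EXAMPLE_LAYER_BITS) ==
           (registered_layer & EXAMPLE_LAYER_BITS);
}

/* The user callback filters by exact command value. */
static void example_handler(uint32_t raised_event)
{
    switch (raised_event & EXAMPLE_COMMAND_MASK) {
    case EXAMPLE_CMD_AP_STA_CONNECTED:
        example_sta_connected_count++;
        break;
    default:
        /* Other events of this layer are ignored. */
        break;
    }
}

/* Hypothetical dispatch step: run the handler for any event of the
 * registered layer.
 */
static void example_dispatch(uint32_t raised_event, uint32_t registered_layer)
{
    if (example_layer_matches(raised_event, registered_layer)) {
        example_handler(raised_event);
    }
}
```

Because the handler compares the command for equality, a scan-done event (2) no longer triggers the station-connected path (15), even though the two values share a bit. The cost is that every handler is invoked for every event of its layer.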
Additional context
Initial attempt to fix: #88495