-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Network and cloud diagnostics #1424
Conversation
CLOUD_DISCONNECT_REASON_ERROR = 1, // Disconnected due to an error | ||
CLOUD_DISCONNECT_REASON_USER = 2, // Disconnected at the user's request | ||
CLOUD_DISCONNECT_REASON_NETWORK_DISCONNECT = 3, // Disconnected due to the network disconnection | ||
CLOUD_DISCONNECT_REASON_LISTENING = 4 // Disconnected due to the listening mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the network reports disconnect due to listening mode, do we need to duplicate it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Electron doesn't seem to disconnect the network when entering the listening mode: https://github.com/spark/firmware/blob/feature/net_cloud_diag/system/src/system_network_cellular.h#L36 (it disconnects the cloud though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see..hmm that's quite strange. Ok we can leave it as is for now. Going forward, when we have AP mode implemented that allows AP and STA modes to operate concurrently, then the cloud will not be disconnected either when entering listening mode.
system/src/system_network_internal.h
Outdated
class NetworkDiagnostics { | ||
public: | ||
enum Status { | ||
TURNED_OFF = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can make an encoding here. Use bit 0 to indicate if the state is reached or in transition: 0 means the state is reached, 1 means in transition. And then we can use the upper bits to encode the states ON, CONNECTED, DISCONNECTED. OFF as values 2,4,6, and 8..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but what's the use case of such an encoded status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a little more consistent, and makes the code to determine if we are transitioning vs stable easier to write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think such an encoding would only complicate things, as the DISCONNECTED
state, for example, can be reached from two states: TURNING_ON
and DISCONNECTING
, i.e. we actually need 2 additional bits to encode a transitional state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the same thing - I'm not asking for us to encode the state we have come from, just to distinguish between an active transition and a final state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. just use odd numbers to encode transitional states without any relation to source/destination states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep! exactly that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds good 👍
@@ -416,13 +483,19 @@ class ManagedNetworkInterface : public NetworkInterface | |||
WLAN_CONNECTING = 1; | |||
INFO("ARM_WLAN_WD 1"); | |||
ARM_WLAN_WD(CONNECT_TO_ADDRESS_MAX); // reset the network if it doesn't connect within the timeout | |||
const auto diag = NetworkDiagnostics::instance(); | |||
diag->status(NetworkDiagnostics::CONNECTING); | |||
system_notify_event(network_status, network_status_connecting); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure if we should be explicity setting the network diagnostic status, or instead register a listener to the system event, and update the diagnostic status from that. While it might be cleaner to use the event, I feel we may have a limitation on how many listeners we can attach to the system event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep the diagnostics stuff separated from the system event logic for now, since the current eventing system is not designed for anything else than notifying the application code of system events (all handlers are processed in the application thread, user handlers may introduce an arbitrary delay, etc). Definitely it would be nice to track all the states defined by the system in a centralized manner, it just feels more like a task that should be done as part of the system loop refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@sergeuz - could you fix up the conflicts please! |
62e9a61
to
de88de9
Compare
Done! |
…r of unexpected disconnections, last connection error
de88de9
to
20b7912
Compare
system/inc/system_cloud.h
Outdated
@@ -85,17 +85,17 @@ String spark_deviceID(void); | |||
extern "C" { | |||
#endif | |||
|
|||
enum cloud_disconnect_reason { | |||
CLOUD_DISCONNECT_REASON_OTHER = 0, | |||
typedef enum cloud_disconnect_reason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a cloud disconnect reason CLOUD_DISCONNECT_NETWORK_DOWN
or similar?
Problem
This PR implements the following diagnostics:
Network
Cloud
Steps to Test
The testing can be tricky since it can only be done manually in most of the cases. A typical testing procedure for an enum diagnostic data source would be to recreate a number of situations specific to the diagnostic and ensure that all enum values defined by the spec get reported correctly depending on a situation. For the error code diagnostics it's probably easier to fake result codes returned by WLAN/Cellular HAL and the comms library directly in the code.
I'm attaching a simple application that dumps the diagnostic data every 5 seconds to start with.
Example App
References
Completeness