Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network and cloud diagnostics #1424

Merged
merged 9 commits into from
Jan 22, 2018
Merged

Network and cloud diagnostics #1424

merged 9 commits into from
Jan 22, 2018

Conversation

sergeuz
Copy link
Member

@sergeuz sergeuz commented Nov 2, 2017

Problem

This PR implements the following diagnostics:

Network

  • Connection status (ID 8)
  • Last connection error (ID 9)
  • Number of unanticipated disconnects (ID 12)
  • Number of connection attempts (ID 27)
  • Disconnection reason (ID 28)

Cloud

  • Connection status (ID 10)
  • Last connection error (ID 13)
  • Number of unanticipated disconnects (ID 14)
  • Number of connection attempts (ID 29)
  • Disconnection reason (ID 30)

Steps to Test

The testing can be tricky since it can only be done manually in most of the cases. A typical testing procedure for an enum diagnostic data source would be to recreate a number of situations specific to the diagnostic and ensure that all enum values defined by the spec get reported correctly depending on a situation. For the error code diagnostics it's probably easier to fake result codes returned by WLAN/Cellular HAL and the comms library directly in the code.

I'm attaching a simple application that dumps the diagnostic data every 5 seconds to start with.

Example App

#include "application.h"

SYSTEM_THREAD(ENABLED)

namespace {

const Serial1LogHandler logHandler(115200, LOG_LEVEL_WARN, {
    { "app", LOG_LEVEL_ALL }
});

unsigned logMillis = 0;
unsigned logCount = 0;

bool logDiagnosticData(void* appender, const uint8_t* data, size_t size) {
    Log.write(LOG_LEVEL_INFO, (const char*)data, size);
    return true;
}

} // namespace

void setup() {
    logMillis = millis();
}

void loop() {
    if (millis() - logMillis >= 5000) {
        ++logCount;
        Log.printf(LOG_LEVEL_INFO, "%u: ", logCount);
        system_format_diag_data(nullptr, 0, 0, logDiagnosticData, nullptr, nullptr);
        Log.print(LOG_LEVEL_INFO, "\r\n");
        logMillis = millis();
    }
}

References

  • [CH8761]

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation
  • Added to CHANGELOG.md after merging (add links to docs and issues)

@sergeuz sergeuz closed this Nov 2, 2017
@sergeuz sergeuz added this to the 0.8.0 milestone Nov 2, 2017
@sergeuz sergeuz reopened this Nov 2, 2017
CLOUD_DISCONNECT_REASON_ERROR = 1, // Disconnected due to an error
CLOUD_DISCONNECT_REASON_USER = 2, // Disconnected at the user's request
CLOUD_DISCONNECT_REASON_NETWORK_DISCONNECT = 3, // Disconnected due to the network disconnection
CLOUD_DISCONNECT_REASON_LISTENING = 4 // Disconnected due to the listening mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the network reports disconnect due to listening mode, do we need to duplicate it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Electron doesn't seem to disconnect the network when entering the listening mode: https://github.com/spark/firmware/blob/feature/net_cloud_diag/system/src/system_network_cellular.h#L36 (it disconnects the cloud though)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see..hmm that's quite strange. Ok we can leave it as is for now. Going forward, when we have AP mode implemented that allows AP and STA modes to operate concurrently, then the cloud will not be disconnected either when entering listening mode.

class NetworkDiagnostics {
public:
enum Status {
TURNED_OFF = 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can make an encoding here. Use bit 0 to indicate if the state is reached or in transition: 0 means the state is reached, 1 means in transition. And then we can use the upper bits to encode the states ON, CONNECTED, DISCONNECTED. OFF as values 2,4,6, and 8..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but what's the use case of such an encoded status?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a little more consistent, and makes the code to determine if we are transitioning vs stable easier to write.

Copy link
Member Author

@sergeuz sergeuz Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think such an encoding would only complicate things, as the DISCONNECTED state, for example, can be reached from two states: TURNING_ON and DISCONNECTING, i.e. we actually need 2 additional bits to encode a transitional state

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the same thing - I'm not asking for us to encode the state we have come from, just to distinguish between an active transition and a final state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. just use odd numbers to encode transitional states without any relation to source/destination states?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! exactly that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds good 👍

@@ -416,13 +483,19 @@ class ManagedNetworkInterface : public NetworkInterface
WLAN_CONNECTING = 1;
INFO("ARM_WLAN_WD 1");
ARM_WLAN_WD(CONNECT_TO_ADDRESS_MAX); // reset the network if it doesn't connect within the timeout
const auto diag = NetworkDiagnostics::instance();
diag->status(NetworkDiagnostics::CONNECTING);
system_notify_event(network_status, network_status_connecting);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure if we should be explicity setting the network diagnostic status, or instead register a listener to the system event, and update the diagnostic status from that. While it might be cleaner to use the event, I feel we may have a limitation on how many listeners we can attach to the system event.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep the diagnostics stuff separated from the system event logic for now, since the current eventing system is not designed for anything else than notifying the application code of system events (all handlers are processed in the application thread, user handlers may introduce an arbitrary delay, etc). Definitely it would be nice to track all the states defined by the system in a centralized manner, it just feels more like a task that should be done as part of the system loop refactoring

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@m-mcgowan m-mcgowan changed the base branch from develop to release/v0.7.0-rc.4 November 6, 2017 21:32
@m-mcgowan m-mcgowan changed the base branch from release/v0.7.0-rc.4 to develop November 6, 2017 21:32
avtolstoy added a commit that referenced this pull request Nov 9, 2017
@avtolstoy avtolstoy mentioned this pull request Nov 9, 2017
6 tasks
m-mcgowan pushed a commit that referenced this pull request Nov 16, 2017
@m-mcgowan
Copy link
Contributor

@sergeuz - could you fix up the conflicts please!

@sergeuz
Copy link
Member Author

sergeuz commented Nov 21, 2017

Done!

@@ -85,17 +85,17 @@ String spark_deviceID(void);
extern "C" {
#endif

enum cloud_disconnect_reason {
CLOUD_DISCONNECT_REASON_OTHER = 0,
typedef enum cloud_disconnect_reason {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a cloud disconnect reason CLOUD_DISCONNECT_NETWORK_DOWN or similar?

@m-mcgowan m-mcgowan modified the milestones: 0.8.0, 0.8.0-rc.2 Jan 18, 2018
@m-mcgowan m-mcgowan merged commit 587830c into develop Jan 22, 2018
@m-mcgowan m-mcgowan deleted the feature/net_cloud_diag branch January 22, 2018 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants