monitor-iot-hub.md
title: Monitoring Azure IoT Hub
description: Start here to learn how to monitor Azure IoT Hub
author: robinsh
ms.author: robinsh
ms.topic: conceptual
ms.service: iot-hub
ms.custom: subject-monitoring
ms.date: 11/06/2020

Monitoring Azure IoT Hub

When you have critical applications and business processes relying on Azure resources, you want to monitor those resources for their availability, performance, and operation. This article describes the monitoring data generated by Azure IoT Hub and how you can use the features of Azure Monitor to analyze and alert on this data.

Monitor overview

The Overview page in the Azure portal for each IoT hub includes charts that provide some usage metrics, such as the number of messages used and the number of devices connected to the IoT hub.

:::image type="content" source="media/monitor-iot-hub/overview-portal.png" alt-text="Default metric charts on IoT hub Overview page.":::

The message count value can be delayed by 1 minute. For reasons related to the IoT Hub service infrastructure, it can also fluctuate between higher and lower values when you refresh the page. Only values accrued over the last minute should be affected.

The information presented on the Overview pane is useful, but represents only a small amount of the monitoring data that is available for an IoT hub. Some monitoring data is collected automatically and is available for analysis as soon as you create your IoT hub. You can enable additional types of data collection with some configuration.

What is Azure Monitor?

Azure IoT Hub creates monitoring data using Azure Monitor, a full-stack monitoring service in Azure that provides a complete set of features to monitor your Azure resources as well as resources in other clouds and on-premises.

Start with the article Monitoring Azure resources with Azure Monitor, which describes the following concepts:

  • What is Azure Monitor?
  • Costs associated with monitoring
  • Monitoring data collected in Azure
  • Configuring data collection
  • Standard tools in Azure for analyzing and alerting on monitoring data

The following sections build on this article by describing the specific data gathered for Azure IoT Hub and providing examples for configuring data collection and analyzing this data with Azure tools.

Monitoring data

Azure IoT Hub collects the same kinds of monitoring data as other Azure resources that are described in Monitoring data from Azure resources.

See Monitoring Azure IoT Hub data reference for detailed information on the metrics and logs created by Azure IoT Hub.

Important

The events emitted by the IoT Hub service using Azure Monitor resource logs are not guaranteed to be reliable or ordered. Some events might be lost or delivered out of order. Resource logs also aren't meant to be real-time, and it may take several minutes for events to be logged to your choice of destination.
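Because events can arrive out of order, code that consumes these logs is safer when it relies on each event's own timestamp rather than on arrival order. The following is a minimal sketch of that idea, not an official pattern. The flat record shape (with `deviceId` at the top level) and the `deviceDisconnect` operation name are assumptions made for illustration.

```python
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_state_by_device(events):
    """Return {device_id: operation_name} based on each event's timestamp,
    not on the order in which the events were delivered."""
    latest = {}
    for event in events:
        device_id = event["deviceId"]
        when = parse_time(event["time"])
        if device_id not in latest or when > latest[device_id][0]:
            latest[device_id] = (when, event["operationName"])
    return {device: op for device, (_, op) in latest.items()}


# Hypothetical events in arrival order: the older disconnect is delivered late.
events = [
    {"time": "2020-11-06T10:00:05Z", "operationName": "deviceConnect", "deviceId": "dev-1"},
    {"time": "2020-11-06T10:00:01Z", "operationName": "deviceDisconnect", "deviceId": "dev-1"},
    {"time": "2020-11-06T10:00:03Z", "operationName": "deviceConnect", "deviceId": "dev-2"},
]

print(latest_state_by_device(events))
```

Lost events cannot be recovered this way, so a result like this is best treated as an approximation of the current state.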

Collection and routing

Platform metrics and the Activity log are collected and stored automatically, but can be routed to other locations by using a diagnostic setting.

Resource logs are not collected and stored until you create a diagnostic setting and route them to one or more locations.

Metrics and logs can be routed to several locations including:

  • The Azure Monitor Logs store via an associated Log Analytics workspace. There they can be analyzed using Log Analytics.
  • Azure Storage for archiving and offline analysis
  • An Event Hubs endpoint where they can be read by external applications, for example, third-party SIEM tools.

In the Azure portal, select Diagnostic settings under Monitoring in the left pane of your IoT hub, and then select Add diagnostic setting to create diagnostic settings scoped to the logs and platform metrics emitted by your IoT hub.

The following screenshot shows a diagnostic setting for routing the resource log type Connection Operations and all platform metrics to a Log Analytics workspace.

:::image type="content" source="media/monitor-iot-hub/diagnostic-setting-portal.png" alt-text="Diagnostic Settings pane for an IoT hub.":::

See Create diagnostic setting to collect platform logs and metrics in Azure for the detailed process for creating a diagnostic setting using the Azure portal, CLI, or PowerShell. When you create a diagnostic setting, you specify which categories of logs to collect. The categories for Azure IoT Hub are listed under Resource logs in the Monitoring Azure IoT Hub data reference.
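As a rough sketch of what such a setting contains, the following builds a request body that routes selected log categories and all platform metrics to a Log Analytics workspace. The field names (`workspaceId`, `logs`, `metrics`, `AllMetrics`) are assumed to follow the Azure Monitor diagnostic settings resource format, the helper `build_diagnostic_setting` is hypothetical, and the workspace ID is a placeholder. Check the linked article for the authoritative schema before using it.

```python
import json


def build_diagnostic_setting(workspace_id, log_categories, include_metrics=True):
    """Build a diagnostic-setting body (assumed shape) for the given log categories."""
    properties = {
        "workspaceId": workspace_id,
        "logs": [{"category": name, "enabled": True} for name in log_categories],
    }
    if include_metrics:
        # "AllMetrics" is assumed to select every exportable platform metric.
        properties["metrics"] = [{"category": "AllMetrics", "enabled": True}]
    return {"properties": properties}


# Placeholder resource ID; replace with the ID of your own workspace.
setting = build_diagnostic_setting(
    "<log-analytics-workspace-resource-id>",
    ["Connections"],
)
print(json.dumps(setting, indent=2))
```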

When routing IoT Hub platform metrics to other locations, be aware that:

  • The following platform metrics are not exportable via diagnostic settings: Connected devices (preview) and Total devices (preview).

  • Multi-dimensional metrics, for example some routing metrics, are currently exported as flattened single dimensional metrics aggregated across dimension values. For more detail, see Exporting platform metrics to other locations.

Analyzing metrics

You can analyze metrics for Azure IoT Hub with metrics from other Azure services using metrics explorer by opening Metrics from the Azure Monitor menu. See Getting started with Azure Metrics Explorer for details on using this tool.

In the Azure portal, select Metrics under Monitoring in the left pane of your IoT hub to open metrics explorer scoped, by default, to the platform metrics emitted by your IoT hub:

:::image type="content" source="media/monitor-iot-hub/metrics-portal.png" alt-text="Metrics explorer page for an IoT hub.":::

For a list of the platform metrics collected for Azure IoT Hub, see Metrics in the Monitoring Azure IoT Hub data reference. For a list of the platform metrics collected for all Azure services, see Supported metrics with Azure Monitor.

For IoT Hub platform metrics that are collected in units of count, some aggregations may not be available or usable. To learn more, see Supported aggregations in the Monitoring Azure IoT Hub data reference.

Some IoT Hub metrics, like routing metrics, are multi-dimensional. For these metrics, you can apply filters and splitting to your charts based on a dimension.
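To make filtering and splitting concrete, here is a small sketch that applies both to a handful of samples from one multi-dimensional metric. The dimension names (`EndpointName`, `Result`) and all values are made up for illustration and are not taken from the IoT Hub metric definitions.

```python
from collections import defaultdict

# Hypothetical samples of a single multi-dimensional metric.
samples = [
    {"EndpointName": "storage-1", "Result": "success", "value": 40},
    {"EndpointName": "storage-1", "Result": "failure", "value": 5},
    {"EndpointName": "eventhub-1", "Result": "success", "value": 30},
]


def split_by(samples, dimension):
    """Sum the metric separately for each value of one dimension (splitting)."""
    totals = defaultdict(int)
    for sample in samples:
        totals[sample[dimension]] += sample["value"]
    return dict(totals)


def filter_by(samples, dimension, value):
    """Keep only the samples whose dimension has the given value (filtering)."""
    return [sample for sample in samples if sample[dimension] == value]


print(split_by(samples, "EndpointName"))
print(split_by(filter_by(samples, "Result", "success"), "EndpointName"))
# Summing over every sample ignores the dimensions entirely.
print(sum(sample["value"] for sample in samples))
```

The last line corresponds to a single value aggregated across all dimension values, which is what remains when the dimensions are not available.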

Analyzing logs

Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties. The data in these tables is associated with a Log Analytics workspace and can be queried in Log Analytics. To learn more about Azure Monitor Logs, see Azure Monitor Logs overview in the Azure Monitor documentation.

To route data to Azure Monitor Logs, you must create a diagnostic setting to send resource logs or platform metrics to a Log Analytics workspace. To learn more, see Collection and routing.

In the Azure portal, select Logs under Monitoring in the left pane of your IoT hub to perform Log Analytics queries scoped, by default, to the logs and metrics collected in Azure Monitor Logs for your IoT hub.

:::image type="content" source="media/monitor-iot-hub/logs-portal.png" alt-text="Logs page for an IoT hub.":::

For a list of the tables used by Azure Monitor Logs and queryable by Log Analytics, see Azure Monitor Logs tables in the Monitoring Azure IoT Hub data reference.

All resource logs in Azure Monitor have the same fields followed by service-specific fields. The common schema is outlined in Azure Monitor resource log schema. You can find the schema and categories of resource logs collected for Azure IoT Hub in Resource logs in the Monitoring Azure IoT Hub data reference.

The Activity log is a platform log in Azure that provides insight into subscription-level events. You can view it independently or route it to Azure Monitor Logs, where you can run more complex queries using Log Analytics.

When routing IoT Hub platform metrics to Azure Monitor Logs, be aware that:

  • The following platform metrics are not exportable via diagnostic settings: Connected devices (preview) and Total devices (preview).

  • Multi-dimensional metrics, for example some routing metrics, are currently exported as flattened single dimensional metrics aggregated across dimension values. For more detail, see Exporting platform metrics to other locations.

For some common queries with IoT Hub, see Sample Kusto queries. For detailed information on using Log Analytics queries, see Overview of log queries in Azure Monitor.

SDK version in IoT Hub logs

Some operations in IoT Hub resource logs return an sdkVersion property in their properties object. For these operations, when a device or backend app is using one of the Azure IoT SDKs, this property contains information about the SDK being used, the SDK version, and the platform on which the SDK is running. The following example shows the sdkVersion property emitted for a deviceConnect operation when using the Node.js device SDK: "azure-iot-device/1.17.1 (node v10.16.0; Windows_NT 10.0.18363; x64)". Here's an example of the value emitted for the .NET (C#) SDK: ".NET/1.21.2 (.NET Framework 4.8.4200.0; Microsoft Windows 10.0.17763 WindowsProduct:0x00000004; X86)".

The following table shows the SDK name used for different Azure IoT SDKs:

| SDK name in sdkVersion property | Language |
|---|---|
| .NET | .NET (C#) |
| microsoft.azure.devices | .NET (C#) service SDK |
| microsoft.azure.devices.client | .NET (C#) device SDK |
| iothubclient | C or Python v1 (deprecated) device SDK |
| iothubserviceclient | C or Python v1 (deprecated) service SDK |
| azure-iot-device-iothub-py | Python device SDK |
| azure-iot-device | Node.js device SDK |
| azure-iothub | Node.js service SDK |
| com.microsoft.azure.iothub-java-client | Java device SDK |
| com.microsoft.azure.iothub.service.sdk | Java service SDK |
| com.microsoft.azure.sdk.iot.iot-device-client | Java device SDK |
| com.microsoft.azure.sdk.iot.iot-service-client | Java service SDK |
| C | Embedded C |
| C + (OSSimplified = Azure RTOS) | Azure RTOS |

You can extract the SDK version property when you perform queries against IoT Hub resource logs. For example, the following query extracts the SDK version property (and device ID) from the properties returned by Connections operations. These two properties are written to the results along with the time of the operation and the resource ID of the IoT hub that the device is connecting to.

// SDK version of devices
// List of devices and their SDK versions that connect to IoT Hub
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
| where Category == "Connections"
| extend parsed_json = parse_json(properties_s) 
| extend SDKVersion = tostring(parsed_json.sdkVersion), DeviceId = tostring(parsed_json.deviceId)
| distinct DeviceId, SDKVersion, TimeGenerated, _ResourceId
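Once query results like these are exported, the SDK version string can be split into its parts for reporting. The sketch below assumes the `name/version (platform; details)` layout seen in the two example values earlier in this section. Other SDKs may format the string differently, in which case the hypothetical helper `parse_sdk_version` returns `None`.

```python
import re

# Layout inferred from the two sample values in this article only.
SDK_VERSION_PATTERN = re.compile(
    r"^(?P<name>[^/]+)/(?P<version>\S+)\s*\((?P<platform>.*)\)\s*$"
)


def parse_sdk_version(value):
    """Split an sdkVersion string into SDK name, SDK version, and platform details."""
    match = SDK_VERSION_PATTERN.match(value)
    if match is None:
        return None  # Unrecognized layout; leave the raw string to the caller.
    platform = [part.strip() for part in match.group("platform").split(";")]
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "platform": platform,
    }


node_example = "azure-iot-device/1.17.1 (node v10.16.0; Windows_NT 10.0.18363; x64)"
print(parse_sdk_version(node_example))
```

The `name` value can then be looked up in the table above to find the SDK language.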

Sample Kusto queries

Important

When you select Logs from the IoT hub menu, Log Analytics is opened with the query scope set to the current IoT hub. This means that log queries will only include data from that resource. If you want to run a query that includes data from other IoT hubs or data from other Azure services, select Logs from the Azure Monitor menu. See Log query scope and time range in Azure Monitor Log Analytics for details.

Following are queries that you can use to help you monitor your IoT hub.

  • Connectivity Errors: Identify device connection errors.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" and Level == "Error"
  • Throttling Errors: Identify devices that made the most requests resulting in throttling errors.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where ResultType == "429001"
    | extend DeviceId = tostring(parse_json(properties_s).deviceId)
    | summarize count() by DeviceId, Category, _ResourceId
    | order by count_ desc
  • Dead Endpoints: Identify dead or unhealthy endpoints by the number of times the issue was reported, as well as the reason why.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Routes" and OperationName in ("endpointDead", "endpointUnhealthy")
    | extend parsed_json = parse_json(properties_s)
    | extend Endpoint = tostring(parsed_json.endpointName), Reason = tostring(parsed_json.details) 
    | summarize count() by Endpoint, OperationName, Reason, _ResourceId
    | order by count_ desc
  • Error summary: Count of errors across all operations by type.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Level == "Error"
    | summarize count() by ResultType, ResultDescription, Category, _ResourceId
  • Recently connected devices: List of devices that IoT Hub saw connect in the specified time period.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" and OperationName == "deviceConnect"
    | extend DeviceId = tostring(parse_json(properties_s).deviceId)
    | summarize max(TimeGenerated) by DeviceId, _ResourceId
  • SDK version of devices: List of devices and their SDK versions for device connections or device to cloud twin operations.

    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.DEVICES" and ResourceType == "IOTHUBS"
    | where Category == "Connections" or Category == "D2CTwinOperations"
    | extend parsed_json = parse_json(properties_s)
    | extend SDKVersion = tostring(parsed_json.sdkVersion), DeviceId = tostring(parsed_json.deviceId)
    | distinct DeviceId, SDKVersion, TimeGenerated, _ResourceId

Read logs from Azure Event Hubs

After you set up event logging through diagnostic settings, you can create applications that read the logs and take action based on the information in them. This sample code retrieves logs from an event hub:

// The using directives are not part of the original sample. They assume the
// Microsoft.ServiceBus.Messaging Event Hubs client and System.Web.Extensions.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Script.Serialization;
using Microsoft.ServiceBus.Messaging;

class Program
{
    static string connectionString = "{your AMS eventhub endpoint connection string}";
    static string monitoringEndpointName = "{your AMS event hub endpoint name}";
    static EventHubClient eventHubClient;

    // This is the Diagnostic Settings schema.
    // The properties are public so that the serializer can populate them.
    class AzureMonitorDiagnosticLog
    {
        public string time { get; set; }
        public string resourceId { get; set; }
        public string operationName { get; set; }
        public string category { get; set; }
        public string level { get; set; }
        public string resultType { get; set; }
        public string resultDescription { get; set; }
        public string durationMs { get; set; }
        public string callerIpAddress { get; set; }
        public string correlationId { get; set; }
        public string identity { get; set; }
        public string location { get; set; }
        public Dictionary<string, string> properties { get; set; }
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Monitoring. Press Enter key to exit.\n");
        eventHubClient = EventHubClient.CreateFromConnectionString(connectionString, monitoringEndpointName);
        // Use the synchronous call here; the async version returns a Task that must be awaited.
        var d2cPartitions = eventHubClient.GetRuntimeInformation().PartitionIds;
        CancellationTokenSource cts = new CancellationTokenSource();
        var tasks = new List<Task>();
        // Start one receiver per partition.
        foreach (string partition in d2cPartitions)
        {
            tasks.Add(ReceiveMessagesFromDeviceAsync(partition, cts.Token));
        }
        Console.ReadLine();
        Console.WriteLine("Exiting...");
        cts.Cancel();
        Task.WaitAll(tasks.ToArray());
    }

    private static async Task ReceiveMessagesFromDeviceAsync(string partition, CancellationToken ct)
    {
        // Read only the events that arrive after the receiver is created.
        var eventHubReceiver = eventHubClient.GetDefaultConsumerGroup().CreateReceiver(partition, DateTime.UtcNow);
        var deserializer = new JavaScriptSerializer();
        while (true)
        {
            if (ct.IsCancellationRequested)
            {
                await eventHubReceiver.CloseAsync();
                break;
            }
            // Returns null if no event arrives within 10 seconds.
            EventData eventData = await eventHubReceiver.ReceiveAsync(new TimeSpan(0,0,10));
            if (eventData != null)
            {
                string data = Encoding.UTF8.GetString(eventData.GetBytes());
                Console.WriteLine("Message received. Partition: {0} Data: '{1}'", partition, data);
                // Deserialize the JSON data to an Azure Monitor object.
                AzureMonitorDiagnosticLog message = deserializer.Deserialize<AzureMonitorDiagnosticLog>(data);
            }
        }
    }
}
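The C# sample above deserializes each event body as a single log record. As a hedged companion sketch, the following shows how an event body could be unpacked if it wraps several records in a top-level `records` array. That envelope is an assumption about the payload, so inspect a real message from your event hub before relying on it. The sample body and the helpers `extract_records` and `errors_only` are hypothetical.

```python
import json


def extract_records(body):
    """Return the log records in an event body, with or without a 'records' envelope."""
    payload = json.loads(body)
    if isinstance(payload, dict) and "records" in payload:
        return list(payload["records"])
    return [payload]  # Assume the body is a single record.


def errors_only(records):
    """Keep the records whose level is Error."""
    return [record for record in records if record.get("level") == "Error"]


# Hypothetical event body using field names from the schema class in the C# sample.
body = json.dumps({
    "records": [
        {"time": "2020-11-06T10:00:00Z", "category": "Connections",
         "operationName": "deviceConnect", "level": "Information"},
        {"time": "2020-11-06T10:00:02Z", "category": "Connections",
         "operationName": "deviceConnect", "level": "Error"},
    ]
})

for record in errors_only(extract_records(body)):
    print(record["time"], record["category"], record["operationName"])
```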

Alerts

Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your customers notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have benefits and drawbacks.

When creating an alert rule based on platform metrics, be aware that for IoT Hub platform metrics that are collected in units of count, some aggregations may not be available or usable. To learn more, see Supported aggregations in the Monitoring Azure IoT Hub data reference.

Monitor per-device disconnects with Event Grid

Azure Monitor provides a metric, Connected devices, that you can use to monitor the number of devices connected to your IoT hub and trigger an alert when the number of connected devices drops below a threshold value. While this may be sufficient for some scenarios, Azure Event Grid provides a low-latency, per-device monitoring solution that you can use to track device connections for critical devices and infrastructure.

With Event Grid, you can subscribe to the IoT Hub DeviceConnected and DeviceDisconnected events to trigger alerts and monitor device connection state. Event Grid provides much lower event latency than Azure Monitor, and you can monitor on a per-device basis, rather than for the total number of connected devices. These factors make Event Grid the preferred method for monitoring connections for critical devices and infrastructure. We highly recommend using Event Grid to monitor device connections in production environments.
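The per-device tracking described above might look like the following sketch of an event handler's core logic. It makes three assumptions. The full event type names carry a `Microsoft.Devices.` prefix. Each event's `data` includes `deviceId` and `deviceConnectionStateEventInfo.sequenceNumber`. Sequence numbers are fixed-width strings that can be compared as strings to order events. The events themselves are hypothetical and shortened, so verify the schema against the Event Grid documentation for IoT Hub.

```python
CONNECTED = "Microsoft.Devices.DeviceConnected"        # assumed full event type name
DISCONNECTED = "Microsoft.Devices.DeviceDisconnected"  # assumed full event type name


def apply_event(state, event):
    """Update the per-device connection state, ignoring stale or unrelated events."""
    if event["eventType"] not in (CONNECTED, DISCONNECTED):
        return
    data = event["data"]
    device_id = data["deviceId"]
    sequence = data["deviceConnectionStateEventInfo"]["sequenceNumber"]
    previous = state.get(device_id)
    if previous is not None and sequence <= previous["sequenceNumber"]:
        return  # An equal or newer event was already applied.
    state[device_id] = {
        "connected": event["eventType"] == CONNECTED,
        "sequenceNumber": sequence,
    }


def make_event(event_type, device_id, sequence):
    """Build a shortened, hypothetical Event Grid event for the example."""
    return {
        "eventType": event_type,
        "data": {
            "deviceId": device_id,
            "deviceConnectionStateEventInfo": {"sequenceNumber": sequence},
        },
    }


state = {}
# The disconnect (sequence 0002) is delivered before the older connect (0001).
apply_event(state, make_event(DISCONNECTED, "dev-1", "0002"))
apply_event(state, make_event(CONNECTED, "dev-1", "0001"))
print(state["dev-1"]["connected"])
```

Comparing sequence numbers rather than arrival order keeps a late connect event from overwriting a newer disconnect.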

For more detailed information about monitoring device connections with Event Grid and Azure Monitor, see Monitor, diagnose, and troubleshoot disconnects with Azure IoT Hub.

Next steps