From 5f6f73a8c32a089f0c9dc195062f4c57d58f0378 Mon Sep 17 00:00:00 2001
From: Apoorv Kudesia
Date: Tue, 26 Aug 2025 17:28:52 +0530
Subject: [PATCH 1/2] SUMO-267358 | Apoorv | Update. Azure Machine Learning Doc for new app

---
 .../microsoft-azure/azure-machine-learning.md | 155 ++++++++++++++++--
 1 file changed, 139 insertions(+), 16 deletions(-)

diff --git a/docs/integrations/microsoft-azure/azure-machine-learning.md b/docs/integrations/microsoft-azure/azure-machine-learning.md
index d0d8f71dd3..74ba66c47a 100644
--- a/docs/integrations/microsoft-azure/azure-machine-learning.md
+++ b/docs/integrations/microsoft-azure/azure-machine-learning.md
@@ -23,33 +23,156 @@ For more information on supported metrics and dimensions, refer to the [Azure do
 Azure service sends monitoring data to Azure Monitor, which can then [stream data to Eventhub](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/stream-monitoring-data-event-hubs). Sumo Logic supports:
 * Logs collection from [Azure Monitor](https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-get-started) using our [Azure Event Hubs source](/docs/send-data/collect-from-other-data-sources/azure-monitoring/ms-azure-event-hubs-source/).
-* Metrics collection using our [HTTP Logs and Metrics source](/docs/send-data/collect-from-other-data-sources/azure-monitoring/collect-metrics-azure-monitor/) via Azure Functions deployed using the ARM template.
+* Metrics collection using our [Azure Metrics Source](/docs/send-data/hosted-collectors/microsoft-source/azure-metrics-source).

-You must explicitly enable diagnostic settings for each machine-learning workspace you want to monitor. You can forward logs to the same event hub provided they satisfy the limitations and permissions as described [here](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal#destination-limitations).
+You must explicitly enable diagnostic settings for each Machine Learning Workspace you want to monitor. You can forward logs to the same event hub provided they satisfy the limitations and permissions as described [here](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal#destination-limitations).

-When you configure the event hubs source or HTTP source, plan your source category to ease the querying process. A hierarchical approach allows you to make use of wildcards. For example: `Azure/MachineLearning/Logs`, `Azure/MachineLearning/Metrics`.
+When you configure the Event Hubs source or Azure Metrics source, plan your source category to ease the querying process. A hierarchical approach allows you to make use of wildcards. For example: `Azure/MachineLearning/Logs`, `Azure/MachineLearning/Metrics`.
+
+### Configure collector
+
+Create a hosted collector if not already configured and tag the `tenant_name` field. You can get the tenant name using the instructions [here](https://learn.microsoft.com/en-us/azure/active-directory-b2c/tenant-management-read-tenant-name#get-your-tenant-name). Make sure you create the required sources in this collector.
Azure Tag Tenant Name

 ### Configure metrics collection

-In this section, you will configure a pipeline for shipping metrics from Azure Monitor to an Event Hub, onto an Azure Function, and finally to an HTTP Source on a hosted collector in Sumo Logic.
+import MetricsSourceBeta from '../../reuse/metrics-source-beta.md';

-1. Create a hosted collector and tag the `tenant_name` field. You can get the tenant name using the instructions [here](https://learn.microsoft.com/en-us/azure/active-directory-b2c/tenant-management-read-tenant-name#get-your-tenant-name).
Azure Tag Tenant Name
-1. [Configure an HTTP Source](/docs/send-data/collect-from-other-data-sources/azure-monitoring/collect-metrics-azure-monitor/#step-1-configure-an-http-source).
-1. [Configure and deploy the ARM Template](/docs/send-data/collect-from-other-data-sources/azure-monitoring/collect-metrics-azure-monitor/#step-2-configure-azure-resources-using-arm-template).
-1. [Export metrics to Event Hub](/docs/send-data/collect-from-other-data-sources/azure-monitoring/collect-metrics-azure-monitor/#step-3-export-metrics-for-a-particular-resource-to-event-hub). Perform the steps below for each machine learning workspace that you want to monitor.
-   * Choose `Stream to an event hub` as the destination.
-   * Select `AllMetrics`.
-   * Use the Event hub namespace created by the ARM template in Step 2 above. You can create a new Event hub or use the one created by the ARM template. You can use the default policy `RootManageSharedAccessKey` as the policy name.
+

 ### Configure logs collection

 In this section, you will configure a pipeline for shipping diagnostic logs from Azure Monitor to an Event Hub.
-
+#### Diagnostic logs
 1. To set up the Azure Event Hubs source in Sumo Logic, refer to [Azure Event Hubs Source for Logs](/docs/send-data/collect-from-other-data-sources/azure-monitoring/ms-azure-event-hubs-source/).
-2. To create the Diagnostic settings in the Azure portal, refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings). Perform the steps below for each machine learning workspace that you want to monitor.
-   * Choose `Stream to an event hub` as the destination.
-   * Select `allLogs` or individual log categories.
-   * Use the Event hub namespace and Event hub name configured in the previous step in the destination details section. You can use the default policy `RootManageSharedAccessKey` as the policy name.
+1. To create the Diagnostic settings in Azure portal, refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal#create-diagnostic-settings). Perform the steps below for each Azure Machine Learning namespace that you want to monitor.
+   1. Choose `Stream to an event hub` as the destination.
+   1. Select `allLogs`.
+   1. Use the Event Hub namespace and Event Hub name configured in the previous step in the destination details section. You can use the default policy `RootManageSharedAccessKey` as the policy name.
Azure Event Grid logs
+1. Tag the location field in the source with the right location value.
Azure Machine Learning Tag Location
+
+#### Activity Logs
+
+To collect activity logs, refer to the [Collecting Logs for the Azure Audit App from Event Hub](/docs/integrations/microsoft-azure/audit) section in the Azure Audit documentation. Do not perform this step in case you are already collecting activity logs for a subscription.
+:::note
+Since this source contains logs from multiple regions, make sure that you do not tag this source with the location tag.
+:::
+
+## Installing the Azure Machine Learning app
+
+import AppInstallIndexV2 from '../../reuse/apps/app-install-index-option.md';
+
+
+
+As part of the app installation process, the following fields will be created by default:
+
+- `tenant_name`. This field is tagged at the collector level. You can get the tenant name using the instructions [here](https://learn.microsoft.com/en-us/azure/active-directory-b2c/tenant-management-read-tenant-name#get-your-tenant-name).
+- `location`. The region to which the resource belongs.
+- `subscription_id`. ID associated with a subscription where the resource is present.
+- `resource_group`. The resource group name where the Azure resource is present.
+- `provider_name`. Azure resource provider name (for example, Microsoft.Network).
+- `resource_type`. Azure resource type (for example, storage accounts).
+- `resource_name`. The name of the resource (for example, storage account name).
+- `service_type`. Type of the service that can be accessed with an Azure resource.
+- `service_name`. Services that can be accessed with an Azure resource (for example, in Azure Container Instances the service is Subscriptions).
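The patch does not say how the fields listed above are populated. As a rough illustration only, several of them (`subscription_id`, `resource_group`, `provider_name`, `resource_type`, `resource_name`) line up with segments of a standard Azure Resource Manager resource ID. The sketch below assumes the common `/subscriptions/.../resourceGroups/.../providers/<namespace>/<type>/<name>` layout; `parse_resource_id` is a hypothetical helper (not part of Sumo Logic or the Azure SDK), and the sample ID is made up.

```python
import re

# Assumed layout of an Azure Resource Manager resource ID. Matching is
# case-insensitive because resource IDs in Azure logs are often upper-cased.
_RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/(?P<provider_name>[^/]+)"
    r"/(?P<resource_type>[^/]+)"
    r"/(?P<resource_name>[^/]+)",
    re.IGNORECASE,
)


def parse_resource_id(resource_id: str) -> dict:
    """Split a resource ID into the field names used in the list above (hypothetical helper)."""
    match = _RESOURCE_ID.match(resource_id)
    if match is None:
        raise ValueError(f"unrecognized resource ID: {resource_id!r}")
    return match.groupdict()


# Made-up workspace ID, for illustration only.
sample = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/ml-rg/providers/Microsoft.MachineLearningServices"
    "/workspaces/demo-ws"
)
fields = parse_resource_id(sample)
print(fields)
```

Fields such as `tenant_name` and `location` do not appear in the resource ID, which is consistent with the doc asking you to tag them on the collector and the source.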
+
+## Viewing the Azure Machine Learning dashboards
+
+import ViewDashboardsIndex from '../../reuse/apps/view-dashboards-index.md';
+
+
+
+### Overview
+
+The **Azure Machine Learning - Overview** dashboard provides comprehensive details on events, operations and details such as overall number of runs, models deployed and quota utilization, operation types, ingress and egress of network data
+
+Azure Machine Learning - Overview dashboard
+
+### Model
+
+This **Azure Machine Learning - Model** dashboard provides details on Model details related to your Azure Machine Learning.
+
+Azure Machine Learning - Model dashboard
+
+### Compute
+
+The **Azure Machine Learning - Compute** dashboard provides details on Compute operations, events and usage such as CPU, Disk or memory to your Azure Machine Learning.
+
+Azure Machine Learning - Compute dashboard
+
+### Data Events
+
+The **Azure Machine Learning - Data Events** dashboard provides details on data events and their results for your Azure Machine Learning resources.
+
+Azure Machine Learning - Data Events dashboard
+
+### Administrative Operations
+
+The **Azure Machine Learning - Administrative Operations** dashboard provides details on the operational activities and status of your Azure Machine Learning resources.
+
+Use this dashboard to:
+* Monitor the distribution of operation types and their success rates to ensure proper functioning of your Machine Learning resources.
+* Identify potential issues by analyzing the top operations causing errors and correlating them with specific users or applications.
+* Track recent write and delete operations to maintain an audit trail of changes made to your Machine Learning resources.
+
+Azure Machine Learning - Administrative Operations dashboard
+
+### Policy and Recommendations
+
+The **Azure Machine Learning - Policy and Recommendations** dashboard provides details on policy events and recommendations for your Azure Machine Learning resources.
+
+Use this dashboard to:
+* Monitor the success and failure rates of policy events to ensure proper configuration and compliance.
+* Track and analyze recent recommendations to improve the performance and security of your Azure Machine Learning setup.
+* Identify trends in policy events and recommendations over time to proactively address potential issues.
+
+Azure Machine Learning - Policy and Recommendations dashboard
+
+### Jobs and Pipelines
+
+The **Azure Machine Learning - Jobs and Pipelines** dashboard provides details on Operations, events and failures in jobs and pipelines of your Azure Machine Learning.
+
+Azure Machine Learning - Jobs and Pipelines dashboard
+
+### Quota
+
+The **Azure Machine Learning - Quota** dashboard provides details on Quota related to your Azure Machine Learning such as Quota Utilization, Active Node, Active Cores, Idle Cores, etc.
+
+Azure Machine Learning - Quota dashboard
+
+### Run
+
+The **Azure Machine Learning - Run** dashboard provides details on Running experiments such as failed runs, errors in runs, completed or in-progress or started run
+
+Azure Machine Learning - Run dashboard
+
+## Create monitors for Azure Machine Learning app
+
+import CreateMonitors from '../../reuse/apps/create-monitors.md';
+
+
+
+### Azure Machine Learning alerts
+These alerts are metric-based and will work for all Machine Learning workspaces.
+
+| Alert Name | Description | Alert Condition | Recover Condition |
+|:---------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------|:----------------|:------------------|
+| `Azure Machine Learning - CPU Utilization` | This alert is triggered when CPU usage spikes above 80% are detected for any Machine Learning CPU. | Count >= 80 | Count < 80 |
+| `Azure Machine Learning - Failed Runs` | This alert is triggered when Failed Runs are detected in Machine Learning workspace. | Count > 1 | Count <= 1 |
+| `Azure Machine Learning - GPU Memory Utilization` | This alert is triggered when GPU memory utilization spikes above 80% are detected for any Machine Learning GPU. | Count >= 80 | Count < 80 |
+| `Azure Machine Learning - GPU Utilization` | This alert is triggered when GPU utilization spikes above 80% are detected for any Machine Learning GPU. | Count >= 80 | Count < 80 |
+| `Azure Machine Learning - Quota Utilization` | This alert is triggered when consumed Quota goes above 80% for any Machine Learning workspace. | Count >= 80 | Count < 80 |
+
+## Upgrade/Downgrade the Azure Machine Learning app (optional)
+
+import AppUpdate from '../../reuse/apps/app-update.md';
+
+
+
+## Uninstalling the Azure Machine Learning app (optional)
+
+import AppUninstall from '../../reuse/apps/app-uninstall.md';
+
+

 ## Troubleshooting

From 639c64b16659f875b7f4788f6fd8f158f5d91467 Mon Sep 17 00:00:00 2001
From: John Pipkin
Date: Tue, 26 Aug 2025 15:21:17 -0500
Subject: [PATCH 2/2] Updates from review

---
 .../microsoft-azure/azure-machine-learning.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/integrations/microsoft-azure/azure-machine-learning.md b/docs/integrations/microsoft-azure/azure-machine-learning.md
index 74ba66c47a..e91ee6c44e 100644
--- a/docs/integrations/microsoft-azure/azure-machine-learning.md
+++ b/docs/integrations/microsoft-azure/azure-machine-learning.md
@@ -44,7 +44,7 @@ import MetricsSourceBeta from '../../reuse/metrics-source-beta.md';
 In this section, you will configure a pipeline for shipping diagnostic logs from Azure Monitor to an Event Hub.
 #### Diagnostic logs
 1. To set up the Azure Event Hubs source in Sumo Logic, refer to [Azure Event Hubs Source for Logs](/docs/send-data/collect-from-other-data-sources/azure-monitoring/ms-azure-event-hubs-source/).
-1. To create the Diagnostic settings in Azure portal, refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal#create-diagnostic-settings). Perform the steps below for each Azure Machine Learning namespace that you want to monitor.
+1. To create the diagnostic settings in the Azure portal, refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/diagnostic-settings?tabs=portal#create-diagnostic-settings). Perform the steps below for each Azure Machine Learning workspace that you want to monitor.
    1. Choose `Stream to an event hub` as the destination.
    1. Select `allLogs`.
    1. Use the Event Hub namespace and Event Hub name configured in the previous step in the destination details section. You can use the default policy `RootManageSharedAccessKey` as the policy name.
Azure Event Grid logs
@@ -52,7 +52,7 @@ In this section, you will configure a pipeline for shipping diagnostic logs from

 #### Activity Logs

-To collect activity logs, refer to the [Collecting Logs for the Azure Audit App from Event Hub](/docs/integrations/microsoft-azure/audit) section in the Azure Audit documentation. Do not perform this step in case you are already collecting activity logs for a subscription.
+To collect activity logs, refer to the [Collecting Logs for the Azure Audit App from Event Hub](/docs/integrations/microsoft-azure/audit#collecting-logs-for-the-azure-audit-app-from-event-hub) section in the Azure Audit documentation. Do not perform this step if you are already collecting activity logs for a subscription.
 :::note
 Since this source contains logs from multiple regions, make sure that you do not tag this source with the location tag.
 :::
@@ -83,19 +83,19 @@ import ViewDashboardsIndex from '../../reuse/apps/view-dashboards-index.md';

 ### Overview

-The **Azure Machine Learning - Overview** dashboard provides comprehensive details on events, operations and details such as overall number of runs, models deployed and quota utilization, operation types, ingress and egress of network data
+The **Azure Machine Learning - Overview** dashboard provides comprehensive information on events and operations, such as the overall number of runs, models deployed, quota utilization, operation types, and ingress and egress of network data.

 Azure Machine Learning - Overview dashboard

 ### Model

-This **Azure Machine Learning - Model** dashboard provides details on Model details related to your Azure Machine Learning.
+The **Azure Machine Learning - Model** dashboard provides details on the models related to your Azure Machine Learning resources.

 Azure Machine Learning - Model dashboard

 ### Compute

-The **Azure Machine Learning - Compute** dashboard provides details on Compute operations, events and usage such as CPU, Disk or memory to your Azure Machine Learning.
+The **Azure Machine Learning - Compute** dashboard provides details on compute operations, events, and usage, such as CPU, disk, or memory, for your Azure Machine Learning resources.

 Azure Machine Learning - Compute dashboard

@@ -129,19 +129,19 @@ Use this dashboard to:

 ### Jobs and Pipelines

-The **Azure Machine Learning - Jobs and Pipelines** dashboard provides details on Operations, events and failures in jobs and pipelines of your Azure Machine Learning.
+The **Azure Machine Learning - Jobs and Pipelines** dashboard provides details on operations, events, and failures in the jobs and pipelines of your Azure Machine Learning resources.

 Azure Machine Learning - Jobs and Pipelines dashboard

 ### Quota

-The **Azure Machine Learning - Quota** dashboard provides details on Quota related to your Azure Machine Learning such as Quota Utilization, Active Node, Active Cores, Idle Cores, etc.
+The **Azure Machine Learning - Quota** dashboard provides details on quota for your Azure Machine Learning resources, such as Quota Utilization, Active Node, Active Cores, and Idle Cores.

 Azure Machine Learning - Quota dashboard

 ### Run

-The **Azure Machine Learning - Run** dashboard provides details on Running experiments such as failed runs, errors in runs, completed or in-progress or started run
+The **Azure Machine Learning - Run** dashboard provides details on experiment runs, such as failed runs, errors in runs, and completed, in-progress, or started runs.

 Azure Machine Learning - Run dashboard
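
The alert table added in the first patch pairs each alert condition with a recover condition, for example trigger at `>= 80` and recover at `< 80`. A minimal sketch of how such a pair partitions metric values follows. It is an illustration under stated assumptions, not Sumo Logic's monitor engine: the `ThresholdAlert` class and its methods are hypothetical names, and the recover condition is modeled simply as the negation of the alert condition.

```python
from dataclasses import dataclass


@dataclass
class ThresholdAlert:
    """Hypothetical model of one row of the alert table."""

    name: str
    threshold: float
    inclusive: bool  # True: alert when value >= threshold; False: when value > threshold

    def is_triggered(self, value: float) -> bool:
        # Alert condition, for example "Count >= 80" or "Count > 1".
        if self.inclusive:
            return value >= self.threshold
        return value > self.threshold

    def is_recovered(self, value: float) -> bool:
        # Recover condition, assumed to be the exact complement of the alert condition.
        return not self.is_triggered(value)


cpu = ThresholdAlert("Azure Machine Learning - CPU Utilization", 80, inclusive=True)
failed_runs = ThresholdAlert("Azure Machine Learning - Failed Runs", 1, inclusive=False)

for value in (79.9, 80, 95):
    print(cpu.name, value, "triggered" if cpu.is_triggered(value) else "recovered")
```

Modeling the two conditions as complements means every metric value falls into exactly one state, which matches the way the table writes each pair (`>= 80` with `< 80`, `> 1` with `<= 1`).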