diff --git a/e2e_samples/parking_sensors/README.md b/e2e_samples/parking_sensors/README.md index 68a61935b..3a4f940f0 100644 --- a/e2e_samples/parking_sensors/README.md +++ b/e2e_samples/parking_sensors/README.md @@ -133,8 +133,9 @@ The following summarizes key learnings and best practices demonstrated by this s ### 7. Monitor infrastructure, pipelines and data -- A proper monitoring solution should be in-place to ensure failures are identified, diagnosed and addressed in a timely manner. Aside from the base infrastructure and pipeline runs, data should also be monitored. A common area that should have data monitoring is the malformed record store. - +- A proper monitoring solution should be in place to ensure failures are identified, diagnosed and addressed in a timely manner. Aside from the base infrastructure and pipeline runs, data quality should also be monitored. A common area that should have data monitoring is the malformed record store. +- As an example, this repository showcases how to use the open-source framework [Great Expectations](https://docs.greatexpectations.io/docs/) to define, measure and report data quality metrics at different stages of the data pipeline. Captured data quality metrics are reported to Azure Monitor for further visualization and alerting. Take a look at the sample [Data Quality report](docs/images/data_quality_report.png) generated with an Azure Monitor workbook. Great Expectations can also be configured to generate HTML reports and host them directly as a static site on Azure Blob Storage. Read more in [How to host and share Data Docs on Azure Blob Storage](https://legacy.docs.greatexpectations.io/en/latest/guides/how_to_guides/configuring_data_docs/how_to_host_and_share_data_docs_on_azure_blob_storage.html). + ## Key Concepts ### Build and Release Pipeline @@ -194,6 +195,8 @@ More resources: ### Observability / Monitoring + **Observability-as-Code** - A few key components of Observability and Monitoring are deployed and configured through Observability-as-Code at the time of Azure resource deployment. This includes a Log Analytics workspace to collect monitoring data from key resources, a central Azure dashboard to monitor key metrics, and alerts to monitor the data pipelines. To learn more about monitoring a specific service, read below. + #### Databricks - [Monitoring Azure Databricks with Azure Monitor](https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/) @@ -260,9 +263,8 @@ More resources: - **DEPLOYMENT_ID** - string appended to all resource names. This is to ensure uniqueness of azure resource names. *Default*: random five character string. - **AZDO_PIPELINES_BRANCH_NAME** - git branch where Azure DevOps pipelines definitions are retrieved from. *Default*: main. - **AZURESQL_SERVER_PASSWORD** - Password of the SQL Server instance. *Default*: random string. - - To further customize the solution, set parameters in `arm.parameters` files located in the `infrastructure` folder. - + 4. To further customize the solution, set parameters in `arm.parameters` files located in the `infrastructure` folder. + - To enable Observability and Monitoring components through code (Observability-as-Code), set the `enable_monitoring` parameter to `true` in the `arm.parameters` files located in the `infrastructure` folder. This will deploy a Log Analytics workspace to collect monitoring data from key resources, set up an Azure dashboard to monitor key metrics, and configure alerts for ADF pipelines. 2. **Deploy Azure resources** 1.
Clone locally the imported Github Repo, then `cd` into the `e2e_samples/parking_sensors` folder of the repo 2. Run `./deploy.sh`. @@ -332,7 +334,7 @@ After a successful deployment, you should have the following resources: - SparkSQL tables created - ADLS Gen2 mounted at `dbfs:/mnt/datalake` using the Storage Service Principal. - Databricks KeyVault secrets scope created - - **Log Analytics Workspace** - including a kusto query on Query explorer -> Saved queries, to verify results that will be looged on Synapse notebooks (notebooks are not deployed yet). + - **Log Analytics Workspace** - including a kusto query on Query explorer -> Saved queries, to verify results that will be logged on Synapse notebooks (notebooks are not deployed yet). - **Azure Synapse SQL Dedicated Pool (formerly SQLDW)** - currently, empty. The Release Pipeline will deploy the SQL Database objects. - **Azure Synapse Spark Pool** - currently, empty. Configured to point the deployed Log Analytics workspace, under "Apache Spark Configuration". - **Azure Synapse Workspace** - currently, empty. diff --git a/e2e_samples/parking_sensors/databricks/notebooks/02_standardize.py b/e2e_samples/parking_sensors/databricks/notebooks/02_standardize.py index 8daacc623..bc8da3e13 100644 --- a/e2e_samples/parking_sensors/databricks/notebooks/02_standardize.py +++ b/e2e_samples/parking_sensors/databricks/notebooks/02_standardize.py @@ -1,4 +1,9 @@ # Databricks notebook source +# MAGIC %pip install great-expectations==0.14.12 +# MAGIC %pip install opencensus-ext-azure==1.1.3 + +# COMMAND ---------- + dbutils.widgets.text("infilefolder", "", "In - Folder Path") infilefolder = dbutils.widgets.get("infilefolder") @@ -7,16 +12,13 @@ # COMMAND ---------- -from applicationinsights import TelemetryClient -tc = TelemetryClient(dbutils.secrets.get(scope = "storage_scope", key = "applicationInsightsKey")) - -# COMMAND ---------- - import os import datetime # For testing -# infilefolder = 'datalake/data/lnd/2019_03_11_01_38_00/' +# infilefolder = '2022_03_23_10_28_02/' +# loadid = 1 + load_id = loadid loaded_on = datetime.datetime.now() base_path = os.path.join('dbfs:/mnt/datalake/data/lnd/', infilefolder) @@ -61,26 +63,146 @@ # COMMAND ---------- # MAGIC %md -# MAGIC ### Metrics +# MAGIC ### Data Quality +# MAGIC The following uses the [Great Expectations](https://greatexpectations.io/) library. See the [Great Expectations docs](https://docs.greatexpectations.io/docs/) for more info. +# MAGIC +# MAGIC **Note**: for simplification purposes, the [Expectation Suite](https://docs.greatexpectations.io/docs/terms/expectation_suite) is created inline. Generally, this should be created prior to data pipeline execution, then loaded at runtime and executed against a data [Batch](https://docs.greatexpectations.io/docs/terms/batch/) via a [Checkpoint](https://docs.greatexpectations.io/docs/terms/checkpoint/). + +# COMMAND ---------- + +import datetime +import pandas as pd +from ruamel import yaml +from great_expectations.core.batch import RuntimeBatchRequest +from great_expectations.data_context import BaseDataContext +from great_expectations.data_context.types.base import ( + DataContextConfig, + DatasourceConfig, + FilesystemStoreBackendDefaults, +) +from pyspark.sql import SparkSession, Row + + +root_directory = "/dbfs/great_expectations/" + +# 1.
Configure DataContext +# https://docs.greatexpectations.io/docs/terms/data_context +data_context_config = DataContextConfig( + datasources={ + "parkingbay_data_source": DatasourceConfig( + class_name="Datasource", + execution_engine={"class_name": "SparkDFExecutionEngine"}, + data_connectors={ + "parkingbay_data_connector": { + "module_name": "great_expectations.datasource.data_connector", + "class_name": "RuntimeDataConnector", + "batch_identifiers": [ + "environment", + "pipeline_run_id", + ], + } + } + ) + }, + store_backend_defaults=FilesystemStoreBackendDefaults(root_directory=root_directory) +) +context = BaseDataContext(project_config=data_context_config) + + +# 2. Create a BatchRequest based on the parkingbay_sdf dataframe. +# https://docs.greatexpectations.io/docs/terms/batch +batch_request = RuntimeBatchRequest( + datasource_name="parkingbay_data_source", + data_connector_name="parkingbay_data_connector", + data_asset_name="paringbaydataaset", # This can be anything that identifies this data_asset for you + batch_identifiers={ + "environment": "stage", + "pipeline_run_id": "pipeline_run_id", + }, + runtime_parameters={"batch_data": parkingbay_sdf}, # Your dataframe goes here +) + + +# 3. Define Expectation Suite and corresponding Data Expectations +# https://docs.greatexpectations.io/docs/terms/expectation_suite +expectation_suite_name = "parkingbay_data_exception_suite_basic" +context.create_expectation_suite(expectation_suite_name=expectation_suite_name, overwrite_existing=True) +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name=expectation_suite_name, +) +# Add Validations to suite +# Check available expectations: validator.list_available_expectation_types() +# https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/expectations/index.html +# https://legacy.docs.greatexpectations.io/en/latest/reference/core_concepts/expectations/standard_arguments.html#meta +validator.expect_column_values_to_not_be_null(column="meter_id") +validator.expect_column_values_to_not_be_null(column="marker_id") +validator.expect_column_values_to_be_of_type(column="rd_seg_dsc", type_="StringType") +validator.expect_column_values_to_be_of_type(column="rd_seg_id", type_="IntegerType") +# validator.validate() # To run validations without a checkpoint +validator.save_expectation_suite(discard_failed_expectations=False) + + + +# 4. Configure a checkpoint and run Expectation suite using checkpoint +# https://docs.greatexpectations.io/docs/terms/checkpoint +my_checkpoint_name = "Parkingbay Data DQ" +checkpoint_config = { + "name": my_checkpoint_name, + "config_version": 1.0, + "class_name": "SimpleCheckpoint", + "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template", +} +my_checkpoint = context.test_yaml_config(yaml.dump(checkpoint_config)) +context.add_checkpoint(**checkpoint_config) +# Run Checkpoint passing in expectation suite. +checkpoint_result = context.run_checkpoint( + checkpoint_name=my_checkpoint_name, + validations=[ + { + "batch_request": batch_request, + "expectation_suite_name": expectation_suite_name, + } + ], +) + + +# COMMAND ---------- + +# MAGIC %md +# MAGIC ### Data Quality Metric Reporting +# MAGIC +# MAGIC This parses the results of the checkpoint and sends them to AppInsights / Azure Monitor for reporting.
# COMMAND ---------- -parkingbay_count = t_parkingbay_sdf.count() -sensordata_count = t_sensordata_sdf.count() -parkingbay_malformed_count = t_parkingbay_malformed_sdf.count() -sensordata_malformed_count = t_sensordata_malformed_sdf.count() - -tc.track_event('Standardize : Completed load', - properties={'parkingbay_filepath': parkingbay_filepath, - 'sensors_filepath': sensors_filepath, - 'load_id': load_id - }, - measurements={'parkingbay_count': parkingbay_count, - 'sensordata_count': sensordata_count, - 'parkingbay_malformed_count': parkingbay_malformed_count, - 'sensordata_malformed_count': sensordata_malformed_count - }) -tc.flush() +import logging +import time +from opencensus.ext.azure.log_exporter import AzureLogHandler + +logger = logging.getLogger(__name__) +logger.addHandler(AzureLogHandler(connection_string=dbutils.secrets.get(scope = "storage_scope", key = "applicationInsightsConnectionString"))) + +result_dic = checkpoint_result.to_json_dict() +key_name=[key for key in result_dic['run_results'].keys()][0] +results = result_dic['run_results'][key_name]['validation_result']['results'] + +checks = {'check_name':checkpoint_result['checkpoint_config']['name'],'pipelinerunid':loadid} +for i in range(len(results)): + validation_name= results[i]['expectation_config']['expectation_type'] + "_on_" + results[i]['expectation_config']['kwargs']['column'] + checks[validation_name]=results[i]['success'] + +properties = {'custom_dimensions': checks} + +if checkpoint_result.success is True: + logger.setLevel(logging.INFO) + logger.info('verifychecks', extra=properties) +else: + logger.setLevel(logging.ERROR) + logger.error('verifychecks', extra=properties) + +time.sleep(16) + # COMMAND ---------- diff --git a/e2e_samples/parking_sensors/databricks/notebooks/03_transform.py b/e2e_samples/parking_sensors/databricks/notebooks/03_transform.py index 63279ea0f..842158eb5 100644 --- a/e2e_samples/parking_sensors/databricks/notebooks/03_transform.py +++ b/e2e_samples/parking_sensors/databricks/notebooks/03_transform.py @@ -1,11 +1,11 @@ # Databricks notebook source -dbutils.widgets.text("loadid", "", "Load Id") -loadid = dbutils.widgets.get("loadid") +# MAGIC %pip install great-expectations==0.14.12 +# MAGIC %pip install opencensus-ext-azure==1.1.3 # COMMAND ---------- -from applicationinsights import TelemetryClient -tc = TelemetryClient(dbutils.secrets.get(scope = "storage_scope", key = "applicationInsightsKey")) +dbutils.widgets.text("loadid", "", "Load Id") +loadid = dbutils.widgets.get("loadid") # COMMAND ---------- @@ -66,24 +66,145 @@ # COMMAND ---------- # MAGIC %md -# MAGIC ### Metrics +# MAGIC ### Data Quality +# MAGIC The following uses the [Great Expectations](https://greatexpectations.io/) library. See the [Great Expectations docs](https://docs.greatexpectations.io/docs/) for more info. +# MAGIC +# MAGIC **Note**: for simplification purposes, the [Expectation Suite](https://docs.greatexpectations.io/docs/terms/expectation_suite) is created inline. Generally, this should be created prior to data pipeline execution, then loaded at runtime and executed against a data [Batch](https://docs.greatexpectations.io/docs/terms/batch/) via a [Checkpoint](https://docs.greatexpectations.io/docs/terms/checkpoint/).
# COMMAND ---------- -new_dim_parkingbay_count = spark.read.table("dw.dim_parking_bay").count() -new_dim_location_count = spark.read.table("dw.dim_location").count() -new_dim_st_marker_count = spark.read.table("dw.dim_st_marker").count() -nr_fact_parking_count = nr_fact_parking.count() +import pandas as pd +from ruamel import yaml +from great_expectations.core.batch import RuntimeBatchRequest +from great_expectations.data_context import BaseDataContext +from great_expectations.data_context.types.base import ( + DataContextConfig, + DatasourceConfig, + FilesystemStoreBackendDefaults, +) +from pyspark.sql import SparkSession, Row + +root_directory = "/dbfs/great_expectations/" + +# 1. Configure DataContext +# https://docs.greatexpectations.io/docs/terms/data_context +data_context_config = DataContextConfig( + datasources={ + "transformed_data_source": DatasourceConfig( + class_name="Datasource", + execution_engine={"class_name": "SparkDFExecutionEngine"}, + data_connectors={ + "transformed_data_connector": { + "module_name": "great_expectations.datasource.data_connector", + "class_name": "RuntimeDataConnector", + "batch_identifiers": [ + "environment", + "pipeline_run_id", + ], + } + } + ) + }, + store_backend_defaults=FilesystemStoreBackendDefaults(root_directory=root_directory) +) +context = BaseDataContext(project_config=data_context_config) + + +# 2. Create a BatchRequest based on the nr_fact_parking dataframe. +# https://docs.greatexpectations.io/docs/terms/batch +batch_request = RuntimeBatchRequest( + datasource_name="transformed_data_source", + data_connector_name="transformed_data_connector", + data_asset_name="paringbaydataaset", # This can be anything that identifies this data_asset for you + batch_identifiers={ + "environment": "stage", + "pipeline_run_id": "pipeline_run_id", + }, + runtime_parameters={"batch_data": nr_fact_parking}, # Your dataframe goes here +) + + +# 3. Define Expectation Suite and corresponding Data Expectations +# https://docs.greatexpectations.io/docs/terms/expectation_suite +expectation_suite_name = "Transfomed_data_exception_suite_basic" +context.create_expectation_suite(expectation_suite_name=expectation_suite_name, overwrite_existing=True) +validator = context.get_validator( + batch_request=batch_request, + expectation_suite_name=expectation_suite_name, +) +# Add Validations to suite +# Check available expectations: validator.list_available_expectation_types() +# https://legacy.docs.greatexpectations.io/en/latest/autoapi/great_expectations/expectations/index.html +# https://legacy.docs.greatexpectations.io/en/latest/reference/core_concepts/expectations/standard_arguments.html#meta +validator.expect_column_values_to_not_be_null(column="status") +validator.expect_column_values_to_be_of_type(column="status", type_="StringType") +validator.expect_column_values_to_not_be_null(column="dim_time_id") +validator.expect_column_values_to_be_of_type(column="dim_time_id", type_="IntegerType") +validator.expect_column_values_to_not_be_null(column="dim_parking_bay_id") +validator.expect_column_values_to_be_of_type(column="dim_parking_bay_id", type_="StringType") +#validator.validate() # To run validations without a checkpoint +validator.save_expectation_suite(discard_failed_expectations=False) + + +# 4.
Configure a checkpoint and run Expectation suite using checkpoint +# https://docs.greatexpectations.io/docs/terms/checkpoint +my_checkpoint_name = "Transformed Data" +checkpoint_config = { + "name": my_checkpoint_name, + "config_version": 1.0, + "class_name": "SimpleCheckpoint", + "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template", +} +my_checkpoint = context.test_yaml_config(yaml.dump(checkpoint_config,default_flow_style=False)) +context.add_checkpoint(**checkpoint_config) +# Run Checkpoint passing in expectation suite +checkpoint_result = context.run_checkpoint( + checkpoint_name=my_checkpoint_name, + validations=[ + { + "batch_request": batch_request, + "expectation_suite_name": expectation_suite_name, + } + ], +) + +# COMMAND ---------- + +# MAGIC %md +# MAGIC ### Data Quality Metric Reporting +# MAGIC +# MAGIC This parses the results of the checkpoint and sends them to AppInsights / Azure Monitor for reporting. + +# COMMAND ---------- + +## Report Data Quality metrics to Azure Monitor using the Python OpenCensus Azure Monitor exporter +import logging +import time +from opencensus.ext.azure.log_exporter import AzureLogHandler + +logger = logging.getLogger(__name__) +logger.addHandler(AzureLogHandler(connection_string=dbutils.secrets.get(scope = "storage_scope", key = "applicationInsightsConnectionString"))) + +result_dic = checkpoint_result.to_json_dict() +key_name=[key for key in result_dic['run_results'].keys()][0] +results = result_dic['run_results'][key_name]['validation_result']['results'] + +checks = {'check_name':checkpoint_result['checkpoint_config']['name'],'pipelinerunid':loadid} +for i in range(len(results)): + validation_name= results[i]['expectation_config']['expectation_type'] + "_on_" + results[i]['expectation_config']['kwargs']['column'] + checks[validation_name]=results[i]['success'] + +properties = {'custom_dimensions': checks} + +if checkpoint_result.success is True: + logger.setLevel(logging.INFO) + logger.info('verifychecks', extra=properties) +else: + logger.setLevel(logging.ERROR) + logger.error('verifychecks', extra=properties) +time.sleep(16) -tc.track_event('Transform : Completed load', - properties={'load_id': load_id}, - measurements={'new_dim_parkingbay_count': new_dim_parkingbay_count, - 'new_dim_location_count': new_dim_location_count, - 'new_dim_st_marker_count': new_dim_st_marker_count, - 'newrecords_fact_parking_count': nr_fact_parking_count - }) -tc.flush() # COMMAND ---------- diff --git a/e2e_samples/parking_sensors/docs/images/data_quality_report.png b/e2e_samples/parking_sensors/docs/images/data_quality_report.png new file mode 100644 index 000000000..da79160f7 Binary files /dev/null and b/e2e_samples/parking_sensors/docs/images/data_quality_report.png differ diff --git a/e2e_samples/parking_sensors/infrastructure/main.bicep b/e2e_samples/parking_sensors/infrastructure/main.bicep index 84e31d677..67d290ccd 100644 --- a/e2e_samples/parking_sensors/infrastructure/main.bicep +++ b/e2e_samples/parking_sensors/infrastructure/main.bicep @@ -1,10 +1,12 @@ param project string = 'mdwdo' param env string = 'dev' +param email_id string = 'support@domain.com' param location string = resourceGroup().location param deployment_id string param keyvault_owner_object_id string @secure() param sql_server_password string +param enable_monitoring bool module datafactory './modules/datafactory.bicep' = { @@ -66,6 +68,7 @@ module keyvault './modules/keyvault.bicep' = { ] } + module appinsights './modules/appinsights.bicep' = { name:
'appinsights_deploy_${deployment_id}' params: { @@ -76,6 +79,86 @@ module appinsights './modules/appinsights.bicep' = { } } +module loganalytics './modules/log_analytics.bicep' = if (enable_monitoring) { + name: 'log_analytics_deploy_${deployment_id}' + params: { + project: project + env: env + location: location + deployment_id: deployment_id + } +} + + +module diagnostic './modules/diagnostic_settings.bicep' = if (enable_monitoring) { + name: 'diagnostic_settings_deploy_${deployment_id}' + params: { + project: project + env: env + deployment_id: deployment_id + loganalytics_workspace_name: loganalytics.outputs.loganalyticswsname + datafactory_name: datafactory.outputs.datafactory_name + } + dependsOn: [ + loganalytics + datafactory + ] +} + + +module dashboard './modules/dashboard.bicep' = if (enable_monitoring) { + name: 'dashboard_${deployment_id}' + params: { + project: project + env: env + location: location + deployment_id: deployment_id + datafactory_name: datafactory.outputs.datafactory_name + sql_server_name: synapse_sql_pool.outputs.synapse_sql_pool_output.name + sql_database_name: synapse_sql_pool.outputs.synapse_sql_pool_output.synapse_pool_name + } +} + +module actiongroup './modules/actiongroup.bicep' = if (enable_monitoring) { + name: 'actiongroup_${deployment_id}' + params: { + project: project + env: env + location: location + deployment_id: deployment_id + email_id: email_id + } +} + +module alerts './modules/alerts.bicep' = if (enable_monitoring) { + name: 'alerts_${deployment_id}' + params: { + project: project + env: env + location: location + deployment_id: deployment_id + datafactory_name: datafactory.outputs.datafactory_name + action_group_id: actiongroup.outputs.actiongroup_id + } + dependsOn: [ + loganalytics + datafactory + actiongroup + ] +} + +module data_quality_workbook './modules/data_quality_workbook.bicep' = if (enable_monitoring) { + name: 'wb_${deployment_id}' + params: { + appinsights_name: appinsights.outputs.appinsights_name + } + dependsOn: [ + loganalytics + appinsights + ] +} + + output storage_account_name string = storage.outputs.storage_account_name output synapse_sql_pool_output object = synapse_sql_pool.outputs.synapse_sql_pool_output @@ -85,3 +168,4 @@ output appinsights_name string = appinsights.outputs.appinsights_name output keyvault_name string = keyvault.outputs.keyvault_name output keyvault_resource_id string = keyvault.outputs.keyvault_resource_id output datafactory_name string = datafactory.outputs.datafactory_name +output loganalytics_name string = loganalytics.outputs.loganalyticswsname diff --git a/e2e_samples/parking_sensors/infrastructure/main.parameters.dev.json b/e2e_samples/parking_sensors/infrastructure/main.parameters.dev.json index 982550f3f..5a399a126 100644 --- a/e2e_samples/parking_sensors/infrastructure/main.parameters.dev.json +++ b/e2e_samples/parking_sensors/infrastructure/main.parameters.dev.json @@ -4,6 +4,12 @@ "parameters": { "env": { "value": "dev" - } + }, + "email_id": { + "value": "support@domain.com" + }, + "enable_monitoring": { + "value": true + } } } \ No newline at end of file diff --git a/e2e_samples/parking_sensors/infrastructure/main.parameters.prod.json b/e2e_samples/parking_sensors/infrastructure/main.parameters.prod.json index 4db482e5b..c1f454c5c 100644 --- a/e2e_samples/parking_sensors/infrastructure/main.parameters.prod.json +++ b/e2e_samples/parking_sensors/infrastructure/main.parameters.prod.json @@ -4,6 +4,12 @@ "parameters": { "env": { "value": "prod" + }, + "email_id": { + "value": 
"support@domain.com" + }, + "enable_monitoring": { + "value": true } } } \ No newline at end of file diff --git a/e2e_samples/parking_sensors/infrastructure/main.parameters.stg.json b/e2e_samples/parking_sensors/infrastructure/main.parameters.stg.json index bdb52d66b..e59f4df14 100644 --- a/e2e_samples/parking_sensors/infrastructure/main.parameters.stg.json +++ b/e2e_samples/parking_sensors/infrastructure/main.parameters.stg.json @@ -4,6 +4,12 @@ "parameters": { "env": { "value": "stg" + }, + "email_id": { + "value": "support@domain.com" + }, + "enable_monitoring": { + "value": true } } } \ No newline at end of file diff --git a/e2e_samples/parking_sensors/infrastructure/modules/actiongroup.bicep b/e2e_samples/parking_sensors/infrastructure/modules/actiongroup.bicep new file mode 100644 index 000000000..70081cb04 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/actiongroup.bicep @@ -0,0 +1,32 @@ +param project string +@allowed([ + 'dev' + 'stg' + 'prod' +]) +param env string +param location string = resourceGroup().location +param deployment_id string +param email_id string + +resource actiongroup 'Microsoft.Insights/actionGroups@2021-09-01' = { + name: '${project}-emailactiongroup-${env}-${deployment_id}' + location: 'global' + tags: { + DisplayName: 'Action Group' + Environment: env + } + properties: { + groupShortName: 'emailgroup' + emailReceivers: [ + { + emailAddress: email_id + name: 'emailaction' + useCommonAlertSchema: true + } + ] + enabled: true + } +} + +output actiongroup_id string = actiongroup.id diff --git a/e2e_samples/parking_sensors/infrastructure/modules/alerts.bicep b/e2e_samples/parking_sensors/infrastructure/modules/alerts.bicep new file mode 100644 index 000000000..acee99171 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/alerts.bicep @@ -0,0 +1,52 @@ +param project string +@allowed([ + 'dev' + 'stg' + 'prod' +]) +param env string +param location string = resourceGroup().location +param deployment_id string +param datafactory_name string +param action_group_id string + +resource adpipelinefailed 'Microsoft.Insights/metricAlerts@2018-03-01' = { + name: '${project}-adffailedalert-${env}-${deployment_id}' + location: 'global' + tags: { + DisplayName: 'ADF Pipeline Failed' + Environment: env + } + properties: { + actions: [ + { + actionGroupId: action_group_id + } + ] + autoMitigate: false + criteria: { + 'odata.type': 'Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria' + allOf: [ + { + threshold : 1 + name : 'Metric1' + metricNamespace: 'Microsoft.DataFactory/factories' + metricName: 'PipelineFailedRuns' + operator: 'GreaterThan' + timeAggregation: 'Total' + criterionType: 'StaticThresholdCriterion' + } + ] + } + description: 'ADF pipeline failed' + enabled: true + evaluationFrequency: 'PT1M' + scopes: [ + '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/Microsoft.DataFactory/factories/${datafactory_name}' + ] + severity: 1 + targetResourceRegion: location + targetResourceType: 'Microsoft.DataFactory/factories' + windowSize: 'PT5M' + } +} diff --git a/e2e_samples/parking_sensors/infrastructure/modules/dashboard.bicep b/e2e_samples/parking_sensors/infrastructure/modules/dashboard.bicep new file mode 100644 index 000000000..026e5f1b9 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/dashboard.bicep @@ -0,0 +1,235 @@ +param project string +@allowed([ + 'dev' + 'stg' + 'prod' +]) +param env string +param location string = resourceGroup().location +param deployment_id string 
+param datafactory_name string +param sql_server_name string +param sql_database_name string + +resource dashboard 'Microsoft.Portal/dashboards@2020-09-01-preview' = { + name: '${project}-dashboard-${env}-${deployment_id}' + location: location + tags: { + DisplayName: 'Azure Dashboard' + Environment: env + } + properties : { + lenses : [ + { + order : 0 + parts : [ + { + position : { + x : 0 + y : 0 + rowSpan : 4 + colSpan : 6 + } + metadata : { + inputs : [ + { + name: 'options' + isOptional: true + } + { + name: 'sharedTimeRange' + isOptional: true + } + ] + type : 'Extension/HubsExtension/PartType/MonitorChartPart' + settings : { + content : { + options: { + chart: { + metrics: [ + { + resourceMetadata: { + id : '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/Microsoft.DataFactory/factories/${datafactory_name}' + } + name : 'PipelineFailedRuns' + aggregationType : 1 + namespace : 'microsoft.datafactory/factories' + metricVisualization : { + displayName : 'Failed pipeline runs metrics' + resourceDisplayName : datafactory_name + } + } + ] + title : 'Count Failed activity runs metrics for ${datafactory_name}' + titleKind : 1 + visualization : { + chartType : 2 + legendVisualization : { + isVisible : true + position : 2 + hideSubtitle : false + } + axisVisualization : { + x : { + isVisible : true + axisType : 2 + } + y : { + isVisible : true + axisType : 1 + } + } + disablePinning : true + } + } + } + } + } + } + } + { + position : { + x : 6 + y : 0 + rowSpan : 4 + colSpan : 6 + } + metadata : { + inputs : [ + { + name: 'options' + isOptional: true + } + { + name: 'sharedTimeRange' + isOptional: true + } + ] + type : 'Extension/HubsExtension/PartType/MonitorChartPart' + settings : { + content : { + options: { + chart: { + metrics: [ + { + resourceMetadata: { + id : '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/Microsoft.DataFactory/factories/${datafactory_name}' + } + name : 'PipelineSucceededRuns' + aggregationType : 1 + namespace : 'microsoft.datafactory/factories' + metricVisualization : { + displayName : 'Succeeded pipeline runs metrics' + resourceDisplayName : datafactory_name + } + } + ] + title : 'Sum Succeeded pipeline runs metrics for ${datafactory_name}' + titleKind : 1 + visualization : { + chartType : 2 + legendVisualization : { + isVisible : true + position : 2 + hideSubtitle : false + } + axisVisualization : { + x : { + isVisible : true + axisType : 2 + } + y : { + isVisible : true + axisType : 1 + } + } + disablePinning : true + } + } + } + } + } + } + } + { + position : { + x : 0 + y : 4 + rowSpan : 4 + colSpan : 6 + } + metadata : { + inputs : [ + { + name: 'options' + isOptional: true + } + { + name: 'sharedTimeRange' + isOptional: true + } + ] + type : 'Extension/HubsExtension/PartType/MonitorChartPart' + settings : { + content : { + options: { + chart: { + metrics: [ + { + resourceMetadata: { + id : '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Sql/servers/${sql_server_name}/databases/${sql_database_name}' + } + name : 'cpu_percent' + aggregationType : 4 + namespace : 'microsoft.sql/servers/databases' + metricVisualization : { + displayName : 'CPU percentage' + } + } + { + resourceMetadata: { + id : '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/Microsoft.Sql/servers/${sql_server_name}/databases/${sql_database_name}' + } + name : 'connection_failed' + aggregationType : 1 + namespace : 'microsoft.sql/servers/databases' + metricVisualization : { + displayName : 
'Failed Connections' + } + } + ] + title : 'Avg CPU percentage and Sum Failed Connections for ${sql_database_name}' + titleKind : 1 + visualization : { + chartType : 2 + legendVisualization : { + isVisible : true + position : 2 + hideSubtitle : false + } + axisVisualization : { + x : { + isVisible : true + axisType : 2 + } + y : { + isVisible : true + axisType : 1 + } + } + disablePinning : true + } + } + } + } + } + } + } + ] + } + ] + metadata : { + model : {} + } +} +} diff --git a/e2e_samples/parking_sensors/infrastructure/modules/data_quality_workbook.bicep b/e2e_samples/parking_sensors/infrastructure/modules/data_quality_workbook.bicep new file mode 100644 index 000000000..8f349ca80 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/data_quality_workbook.bicep @@ -0,0 +1,31 @@ +param workbookDisplayName string = 'DQ Report' +param workbookType string = 'workbook' +param appinsights_name string +var workbookSourceId = '${subscription().id}/resourceGroups/${resourceGroup().name}/providers/microsoft.insights/components/${appinsights_name}' +var workbookId = guid(workbookSourceId) +var serializedData = '{"version":"Notebook/1.0","items":[{"type":3,"content":{"version":"KqlItem/1.0","query":"traces\\r\\n| where message==\\"verifychecks\\"\\r\\n| where customDimensions.check_name==\\"Parkingbay Data DQ\\" or customDimensions.check_name==\\"Transformed Data\\" \\r\\n| where severityLevel==\\"1\\" or severityLevel==\\"3\\"\\r\\n| where notempty(customDimensions.pipelinerunid)\\r\\n| project Status = iif(severityLevel==\\"1\\", \\"success\\", \\"failed\\"),CheckName=customDimensions.check_name,RunID = customDimensions.pipelinerunid, Details=customDimensions,Timestamp=timestamp","size":0,"aggregation":3,"timeContext":{"durationMs":604800000},"queryType":0,"resourceType":"microsoft.insights/components","visualization":"table","gridSettings":{"formatters":[{"columnMatch":"Status","formatter":11},{"columnMatch":"status","formatter":11}]}},"name":"query - 0"},{"type":3,"content":{"version":"KqlItem/1.0","query":"traces\\r\\n| where message==\\"verifychecks\\"\\r\\n| where customDimensions.check_name==\\"DQ checks\\"\\r\\n| where severityLevel==\\"1\\" or severityLevel==\\"3\\"\\r\\n| where notempty(customDimensions.pipelinerunid)\\r\\n| project Status = iif(severityLevel==\\"1\\", \\"Success\\", \\"Failed\\"),CheckName=customDimensions.check_name,RunID = customDimensions.pipelinerunid, Details=customDimensions,Timestamp=timestamp\\r\\n| summarize count() by Status \\r\\n| render 
piechart","size":0,"timeContext":{"durationMs":604800000},"queryType":0,"resourceType":"microsoft.insights/components","visualization":"piechart","tileSettings":{"showBorder":false,"titleContent":{"columnMatch":"Status","formatter":1},"leftContent":{"columnMatch":"count_","formatter":12,"formatOptions":{"palette":"auto"},"numberFormat":{"unit":17,"options":{"maximumSignificantDigits":3,"maximumFractionDigits":2}}}},"graphSettings":{"type":0,"topContent":{"columnMatch":"Status","formatter":1},"centerContent":{"columnMatch":"count_","formatter":1,"numberFormat":{"unit":17,"options":{"maximumSignificantDigits":3,"maximumFractionDigits":2}}}},"chartSettings":{"seriesLabelSettings":[{"seriesName":"success","label":"","color":"greenDark"},{"seriesName":"failed","color":"red"}]},"mapSettings":{"locInfo":"LatLong","sizeSettings":"count_","sizeAggregation":"Sum","legendMetric":"count_","legendAggregation":"Sum","itemColorSettings":{"type":"heatmap","colorAggregation":"Sum","nodeColorField":"count_","heatmapPalette":"greenRed"}}},"name":"query - 1"}],"fallbackResourceIds":["/subscriptions/XXX-XXX-XXX-XX-XXX/resourceGroups/XXXX/providers/microsoft.insights/components/XXXX"],"$schema":"https://github.com/Microsoft/Application-Insights-Workbooks/blob/master/schema/workbook.json"}' +var parsedData = json(serializedData) +var updatedWorkbookData = { + version: parsedData.version + items: parsedData.items + fallbackResourceIds: [ + workbookSourceId + ] +} +var reserializedData = string(updatedWorkbookData) + +resource data_quality_workbook_resource 'microsoft.insights/workbooks@2018-06-17-preview' = { + name: workbookId + location: resourceGroup().location + kind: 'shared' + properties: { + displayName: workbookDisplayName + serializedData: reserializedData + version: '1.0' + sourceId: workbookSourceId + category: workbookType + } + dependsOn: [] +} + +output workbookId string = data_quality_workbook_resource.id diff --git a/e2e_samples/parking_sensors/infrastructure/modules/diagnostic_settings.bicep b/e2e_samples/parking_sensors/infrastructure/modules/diagnostic_settings.bicep new file mode 100644 index 000000000..4fbff5802 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/diagnostic_settings.bicep @@ -0,0 +1,43 @@ +param project string +param env string +param deployment_id string +param loganalytics_workspace_name string +param datafactory_name string + +var commonPrefix = '${project}-diag-${env}-${deployment_id}' + +resource datafactoryworkspace 'Microsoft.DataFactory/factories@2018-06-01' existing = { + name: datafactory_name +} + +resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2020-08-01' existing = { + name: loganalytics_workspace_name +} + +resource diagnosticSetting1 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = { + scope: datafactoryworkspace + name: '${commonPrefix}-${datafactoryworkspace.name}' + properties: { + workspaceId: logAnalyticsWorkspace.id + logs: [ + { + category: 'PipelineRuns' + enabled: true + } + { + category: 'TriggerRuns' + enabled: true + } + { + category: 'ActivityRuns' + enabled: true + } + ] + metrics: [ + { + category: 'AllMetrics' + enabled: true + } + ] + } +} diff --git a/e2e_samples/parking_sensors/infrastructure/modules/log_analytics.bicep b/e2e_samples/parking_sensors/infrastructure/modules/log_analytics.bicep new file mode 100644 index 000000000..9d0bd6ed7 --- /dev/null +++ b/e2e_samples/parking_sensors/infrastructure/modules/log_analytics.bicep @@ -0,0 +1,32 @@ +param project string +@allowed([ + 'dev' 
+ 'stg' + 'prod' +]) +param env string +param location string = resourceGroup().location +param deployment_id string +param retentionInDays int = 31 + + +resource loganalyticsworkspace 'Microsoft.OperationalInsights/workspaces@2020-08-01' = { + name: '${project}-log-${env}-${deployment_id}' + location: location + tags: { + DisplayName: 'Log Analytics' + Environment: env + } + properties: { + sku: { + name: 'PerGB2018' + } + retentionInDays: retentionInDays + features: { + searchVersion: 1 + legacy: 0 + } + } +} + +output loganalyticswsname string = loganalyticsworkspace.name diff --git a/e2e_samples/parking_sensors/infrastructure/modules/synapse_sql_pool.bicep b/e2e_samples/parking_sensors/infrastructure/modules/synapse_sql_pool.bicep index ae6a98634..f552b0d08 100644 --- a/e2e_samples/parking_sensors/infrastructure/modules/synapse_sql_pool.bicep +++ b/e2e_samples/parking_sensors/infrastructure/modules/synapse_sql_pool.bicep @@ -48,6 +48,5 @@ resource sql_server 'Microsoft.Sql/servers@2021-02-01-preview' = { output synapse_sql_pool_output object = { name: sql_server.name username: sql_server_username - password: sql_server_password synapse_pool_name: sql_server::synapse_dedicated_sql_pool.name } diff --git a/e2e_samples/parking_sensors/scripts/deploy_infrastructure.sh b/e2e_samples/parking_sensors/scripts/deploy_infrastructure.sh index b9a8372c6..7f5133111 100755 --- a/e2e_samples/parking_sensors/scripts/deploy_infrastructure.sh +++ b/e2e_samples/parking_sensors/scripts/deploy_infrastructure.sh @@ -137,7 +137,6 @@ echo "Retrieving SQL Server information from the deployment." # Retrieve SQL creds sql_server_name=$(echo "$arm_output" | jq -r '.properties.outputs.synapse_sql_pool_output.value.name') sql_server_username=$(echo "$arm_output" | jq -r '.properties.outputs.synapse_sql_pool_output.value.username') -sql_server_password=$(echo "$arm_output" | jq -r '.properties.outputs.synapse_sql_pool_output.value.password') sql_dw_database_name=$(echo "$arm_output" | jq -r '.properties.outputs.synapse_sql_pool_output.value.synapse_pool_name') # SQL Connection String @@ -145,12 +144,12 @@ sql_dw_connstr_nocred=$(az sql db show-connection-string --client ado.net \ --name "$sql_dw_database_name" --server "$sql_server_name" --output json | jq -r .) 
sql_dw_connstr_uname=${sql_dw_connstr_nocred//$sql_server_username} -sql_dw_connstr_uname_pass=${sql_dw_connstr_uname//$sql_server_password} +sql_dw_connstr_uname_pass=${sql_dw_connstr_uname//$AZURESQL_SERVER_PASSWORD} # Store in Keyvault az keyvault secret set --vault-name "$kv_name" --name "sqlsrvrName" --value "$sql_server_name" az keyvault secret set --vault-name "$kv_name" --name "sqlsrvUsername" --value "$sql_server_username" -az keyvault secret set --vault-name "$kv_name" --name "sqlsrvrPassword" --value "$sql_server_password" +az keyvault secret set --vault-name "$kv_name" --name "sqlsrvrPassword" --value "$AZURESQL_SERVER_PASSWORD" az keyvault secret set --vault-name "$kv_name" --name "sqldwDatabaseName" --value "$sql_dw_database_name" az keyvault secret set --vault-name "$kv_name" --name "sqldwConnectionString" --value "$sql_dw_connstr_uname_pass" @@ -165,9 +164,15 @@ appinsights_key=$(az monitor app-insights component show \ --resource-group "$resource_group_name" \ --output json | jq -r '.instrumentationKey') +appinsights_connstr=$(az monitor app-insights component show \ + --app "$appinsights_name" \ + --resource-group "$resource_group_name" \ + --output json | + jq -r '.connectionString') # Store in Keyvault az keyvault secret set --vault-name "$kv_name" --name "applicationInsightsKey" --value "$appinsights_key" +az keyvault secret set --vault-name "$kv_name" --name "applicationInsightsConnectionString" --value "$appinsights_connstr" # ########################### # # RETRIEVE DATABRICKS INFORMATION AND CONFIGURE WORKSPACE @@ -232,6 +237,7 @@ tmpfile=.tmpfile adfLsDir=$adfTempDir/linkedService jq --arg kvurl "$kv_dns_name" '.properties.typeProperties.baseUrl = $kvurl' $adfLsDir/Ls_KeyVault_01.json > "$tmpfile" && mv "$tmpfile" $adfLsDir/Ls_KeyVault_01.json jq --arg databricksWorkspaceUrl "$databricks_host" '.properties.typeProperties.domain = $databricksWorkspaceUrl' $adfLsDir/Ls_AzureDatabricks_01.json > "$tmpfile" && mv "$tmpfile" $adfLsDir/Ls_AzureDatabricks_01.json +jq --arg databricksWorkspaceResourceId "$databricks_workspace_resource_id" '.properties.typeProperties.workspaceResourceId = $databricksWorkspaceResourceId' $adfLsDir/Ls_AzureDatabricks_01.json > "$tmpfile" && mv "$tmpfile" $adfLsDir/Ls_AzureDatabricks_01.json jq --arg datalakeUrl "https://$azure_storage_account.dfs.core.windows.net" '.properties.typeProperties.url = $datalakeUrl' $adfLsDir/Ls_AdlsGen2_01.json > "$tmpfile" && mv "$tmpfile" $adfLsDir/Ls_AdlsGen2_01.json datafactory_name=$(echo "$arm_output" | jq -r '.properties.outputs.datafactory_name.value') @@ -283,7 +289,7 @@ DATABRICKS_HOST=$databricks_host \ DATABRICKS_WORKSPACE_RESOURCE_ID=$databricks_workspace_resource_id \ SQL_SERVER_NAME=$sql_server_name \ SQL_SERVER_USERNAME=$sql_server_username \ -SQL_SERVER_PASSWORD=$sql_server_password \ +SQL_SERVER_PASSWORD=$AZURESQL_SERVER_PASSWORD \ SQL_DW_DATABASE_NAME=$sql_dw_database_name \ AZURE_STORAGE_KEY=$azure_storage_key \ AZURE_STORAGE_ACCOUNT=$azure_storage_account \ @@ -306,7 +312,7 @@ RESOURCE_GROUP_NAME=${resource_group_name} AZURE_LOCATION=${AZURE_LOCATION} SQL_SERVER_NAME=${sql_server_name} SQL_SERVER_USERNAME=${sql_server_username} -SQL_SERVER_PASSWORD=${sql_server_password} +SQL_SERVER_PASSWORD=${AZURESQL_SERVER_PASSWORD} SQL_DW_DATABASE_NAME=${sql_dw_database_name} AZURE_STORAGE_ACCOUNT=${azure_storage_account} AZURE_STORAGE_KEY=${azure_storage_key} diff --git a/e2e_samples/parking_sensors/sql/ddo_azuresqldw_dw/ddo_azuresqldw_dw/ddo_azuresqldw_dw.sqlproj 
b/e2e_samples/parking_sensors/sql/ddo_azuresqldw_dw/ddo_azuresqldw_dw/ddo_azuresqldw_dw.sqlproj index c229cea02..733a67288 100644 --- a/e2e_samples/parking_sensors/sql/ddo_azuresqldw_dw/ddo_azuresqldw_dw/ddo_azuresqldw_dw.sqlproj +++ b/e2e_samples/parking_sensors/sql/ddo_azuresqldw_dw/ddo_azuresqldw_dw/ddo_azuresqldw_dw.sqlproj @@ -1,124 +1,125 @@ - - - - Debug - AnyCPU - ddo_azuresqldw_dw - 2.0 - 4.1 - {aa416cf5-f184-4573-b591-7ed42a294421} - Microsoft.Data.Tools.Schema.Sql.SqlDwDatabaseSchemaProvider - Database - - - ddo_azuresqldw_dw - ddo_azuresqldw_dw - 1033, CI - BySchemaAndSchemaType - True - v4.5 - CS - Properties - False - True - True - - - bin\Release\ - $(MSBuildProjectName).sql - False - pdbonly - true - false - true - prompt - 4 - - - bin\Debug\ - $(MSBuildProjectName).sql - false - true - full - false - true - true - prompt - 4 - - - 11.0 - - True - 11.0 - - - - - - - - - - - - - - - - - - Off - - - - - - - Off - - - Off - - - - Off - - - - - - - - - - - Off - - - Off - - - - - - - $(SqlCmdVar__2) - - - - - $(SqlCmdVar__1) - - - - - - - - $(DacPacRootPath)\Extensions\Microsoft\SQLDB\Extensions\SqlServer\AzureDw\SqlSchemas\master.dacpac - False - master - - + + + + Debug + AnyCPU + ddo_azuresqldw_dw + 2.0 + 4.1 + {aa416cf5-f184-4573-b591-7ed42a294421} + Microsoft.Data.Tools.Schema.Sql.SqlDwDatabaseSchemaProvider + Database + + + ddo_azuresqldw_dw + ddo_azuresqldw_dw + 1033, CI + BySchemaAndSchemaType + True + v4.7.2 + CS + Properties + False + True + True + + + + bin\Release\ + $(MSBuildProjectName).sql + False + pdbonly + true + false + true + prompt + 4 + + + bin\Debug\ + $(MSBuildProjectName).sql + false + true + full + false + true + true + prompt + 4 + + + 11.0 + + True + 11.0 + + + + + + + + + + + + + + + + + + Off + + + + + + + Off + + + Off + + + + Off + + + + + + + + + + + Off + + + Off + + + + + + + $(SqlCmdVar__2) + + + + + $(SqlCmdVar__1) + + + + + + + + $(DacPacRootPath)\Extensions\Microsoft\SQLDB\Extensions\SqlServer\AzureDw\SqlSchemas\master.dacpac + False + master + + \ No newline at end of file diff --git a/single_tech_samples/streamanalytics/README.md b/single_tech_samples/streamanalytics/README.md index fbd07e997..e5612fda2 100644 --- a/single_tech_samples/streamanalytics/README.md +++ b/single_tech_samples/streamanalytics/README.md @@ -10,7 +10,7 @@ 1. __Azure Cli__ Will be necessary for various tasks. Please follow the instructions found [here](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli). -1. __Bicep__ This project uses `Bicep` templates to setup `Azure` infrastructure. Please follow the steps under [Install and manage via Azure CLI (easiest)](https://github.com/Azure/bicep/blob/main/docs/installing.md#install-and-manage-via-azure-cli-easiest) to install the `Azure Cli` extension. +1. __Bicep__ This project uses `Bicep` templates to setup `Azure` infrastructure. Please follow the steps under [Install Bicep Tools](https://docs.microsoft.com/en-us/azure/azure-resource-manager/bicep/install) to install the `Azure Cli` extension. For an introduction to `Bicep`, you can find more information in the `Bicep` repo under [Get started with Bicep](https://github.com/Azure/bicep/#get-started-with-bicep).
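For readability, the data quality report query that `data_quality_workbook.bicep` embeds (escaped) in its `serializedData` corresponds roughly to the following KQL against the Application Insights `traces` table; this is an unescaped restatement of the query above, not an additional file in this change:

```kql
traces
| where message == "verifychecks"
| where customDimensions.check_name == "Parkingbay Data DQ" or customDimensions.check_name == "Transformed Data"
| where severityLevel == "1" or severityLevel == "3"
| where notempty(customDimensions.pipelinerunid)
| project Status = iif(severityLevel == "1", "success", "failed"),
          CheckName = customDimensions.check_name,
          RunID = customDimensions.pipelinerunid,
          Details = customDimensions,
          Timestamp = timestamp
```

Severity levels 1 and 3 correspond to the INFO and ERROR log levels the notebooks use when emitting the `verifychecks` trace, which is why the `iif` maps them back to a success/failed status per pipeline run.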