Azure Automation Data Lake Analytics Scheduled Job

Uses Azure Automation, Azure Scheduler, and a Data Lake Analytics job to execute a U-SQL query against a Data Lake Store.

Scenario

An iterative task needs to be executed against data stored in a Data Lake Store. The task in this example is an append to files stored in the Data Lake Store. A Data Lake Analytics job is submitted using Azure Automation. Azure Scheduler triggers the job, because the minimum interval of the scheduler built into Azure Automation is 1 hour; Azure Scheduler allows finer granularity, here 10-minute intervals. In this scenario I assume that you have already set up a Data Lake Store.
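
To make the difference in granularity concrete, here is a small illustrative sketch (not part of the repository) that lists the trigger times falling inside a one-hour window for a 1-hour and a 10-minute interval. The `run_times` helper and the start time are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta


def run_times(start, interval, window):
    """Return every trigger time in the half-open range [start, start + window)."""
    times = []
    current = start
    while current < start + window:
        times.append(current)
        current += interval
    return times


start = datetime(2016, 1, 1, 0, 0)  # hypothetical scheduler start time
one_hour = timedelta(hours=1)

hourly = run_times(start, timedelta(hours=1), one_hour)
ten_minute = run_times(start, timedelta(minutes=10), one_hour)

print(len(hourly), len(ten_minute))  # prints "1 6"
```

With a 1-hour minimum the job runs once per hour; with a 10-minute interval it runs six times in the same window.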


Deployment Setup Flow

The deployment flow is implemented in the automateDataLakeJob.ps1 PowerShell script and consists of four parts:

  1. Create a storage account and upload all the necessary assets to it
  2. Deploy an ARM template automationAccountDeployment.json to create an Azure Automation account with:
    • Runbook with script
    • DataLakeAnalytics PowerShell module
    • Variables to be used by the script in the runbook
    • Credentials asset for the Azure AD user that executes the automation scripts
  3. Create a webhook for the Automation runbook (at the moment this can't be done via an ARM template)
  4. Create an Azure Scheduler job collection and an HTTP job with the Automation runbook webhook as the POST URI
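
The last step boils down to an HTTP POST against the runbook webhook. As a rough sketch of what the scheduler job does on each tick (this is not the repository's code), the request can be built with Python's standard library. The URL below is a placeholder and `build_webhook_request` is a hypothetical helper; use the webhook URL generated in step 3.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, payload=None):
    """Build (but do not send) a POST request for a runbook webhook."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder only; the real URL comes from the webhook created in step 3.
request = build_webhook_request("https://example.invalid/webhooks?token=PLACEHOLDER")
print(request.get_method(), request.full_url)

# Sending the request would start the runbook:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is only constructed here, so the snippet can be run without touching a real Automation account.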

Authenticating with Azure Automation

Azure Automation requires an Azure Active Directory organizational user to authenticate. There are some limitations: the user cannot have multi-factor authentication enabled, and as of now service principal authentication (for Azure Resource Manager) is not supported. I recommend creating a user specifically for the Automation jobs. This user's information (user name and password) will be stored in a Credentials asset in the Automation account. For more information, read this tutorial.

Executing the Script

Fork this repository and edit automateDataLakeJob.ps1 with your information:

  1. If you have more than one subscription in your account, set the right subscription ID
  2. Set the automation AAD user name
  3. Set the automation AAD password
  4. Set the Data Lake account name
  5. Set the Data Lake resource group
  6. Set the automation account webhook expiry date
  7. Set the scheduler job start time

Note that in this scenario the scheduler job interval is 10 minutes.
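
Before running the script, it can help to sanity-check the values above. The following is a minimal sketch under stated assumptions: the `DeploymentSettings` fields and the `validate` function are hypothetical names that mirror the list, not variables from automateDataLakeJob.ps1. It checks that no text value is empty and that the webhook does not expire before the scheduler job starts, since an expired webhook can no longer be called.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DeploymentSettings:
    # Hypothetical field names mirroring the settings listed above.
    subscription_id: str
    aad_user_name: str
    aad_password: str
    data_lake_account_name: str
    data_lake_resource_group: str
    webhook_expiry: datetime
    scheduler_start: datetime


def validate(settings):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in vars(settings).items():
        if isinstance(value, str) and not value.strip():
            problems.append(f"{name} is empty")
    if settings.webhook_expiry <= settings.scheduler_start:
        problems.append("webhook_expiry must be later than scheduler_start")
    return problems


settings = DeploymentSettings(
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
    aad_user_name="automation@example.invalid",              # placeholder
    aad_password="placeholder-password",
    data_lake_account_name="mydatalake",
    data_lake_resource_group="my-resource-group",
    webhook_expiry=datetime(2017, 1, 1),
    scheduler_start=datetime(2016, 6, 1),
)
print(validate(settings))  # prints "[]"
```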

References

Scheduling Azure Automation with Azure Scheduler

Azure Automation Authentication

Azure Automation ARM PowerShell Modules

Azure Data Lake Analytics PowerShell