Ready to boost your machine learning projects? Azure Machine Learning (Azure ML) offers a centralised repository called Feature Store to simplify feature engineering. It lets you store, manage and efficiently reuse features across projects, saving time and effort.
This guide will show you how to set up an Azure ML Feature Store with online and offline materialisation using the Azure CLI (Az CLI). The online store is backed by Azure Cache for Redis, an in-memory data store, while the offline store uses Azure Data Lake Storage Gen2 (ADLS Gen2).
Here's what you'll need:
- An Azure subscription.
- Az CLI installed (we'll cover its ML extension installation in the first step).
The guide covers the following steps:
- Step 1: Installing Az CLI's ML extension
- Step 2: Creating ADLS Gen2 storage account
- Step 3: Creating container on ADLS Gen2 storage
- Step 4: Creating Redis Cache instance
- Step 5: Creating user-assigned Managed Identity
- Step 6: Creating Azure ML feature store
- HOUSEKEEPING: Deleting feature store
To interact with Azure ML through Az CLI, you need to install the ML extension:
az extension add --name ml
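If you re-run this setup on a machine that may already have the extension, a small guard avoids an error on the second run. The sketch below is only an illustration: `ensure_ml_extension` is a hypothetical helper name, and it assumes `az extension show` returns a non-zero exit code when the extension is absent.

```shell
# Install the ml extension if it is missing, otherwise update it.
# ensure_ml_extension is a hypothetical helper name, not part of Az CLI.
ensure_ml_extension() {
    if az extension show --name ml >/dev/null 2>&1; then
        # Extension already present: bring it to the latest version.
        az extension update --name ml
    else
        # Extension missing: install it.
        az extension add --name ml
    fi
}

# Usage:
#   ensure_ml_extension
```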
Now create an ADLS Gen2 account that will serve as the default storage account for your feature store:
az storage account create --name <STORAGE_ACCOUNT_NAME> --enable-hierarchical-namespace true --resource-group <RESOURCE_GROUP_NAME> --location <AZ_REGION> --subscription <AZ_SUBSCRIPTION_ID>
Note: Replace <STORAGE_ACCOUNT_NAME>, <RESOURCE_GROUP_NAME>, <AZ_REGION> and <AZ_SUBSCRIPTION_ID> with the required Storage account values.
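Storage account names are restricted to 3-24 characters, lowercase letters and digits only, so it can be worth checking a name before calling the CLI. The following is a minimal sketch; `is_valid_storage_account_name` is a hypothetical helper, and it does not check whether the name is still available globally.

```shell
# Returns 0 when the name is 3-24 characters long and contains
# only lowercase letters and digits; returns 1 otherwise.
# Hypothetical helper, not part of Az CLI.
is_valid_storage_account_name() {
    name="$1"
    len=${#name}
    [ "$len" -ge 3 ] && [ "$len" -le 24 ] || return 1
    case "$name" in
        # Reject any character outside the allowed set.
        *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}

# Usage:
#   is_valid_storage_account_name "<STORAGE_ACCOUNT_NAME>" || echo "invalid name"
```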
Once you've created a Storage account, you can set up a container that the feature store will use for offline materialisation:
az storage fs create --name <STORAGE_CONTAINER_NAME> --account-name <STORAGE_ACCOUNT_NAME> --subscription <AZ_SUBSCRIPTION_ID> --connection-string <CONNECTION_STRING>
Note: Replace <STORAGE_CONTAINER_NAME>, <STORAGE_ACCOUNT_NAME>, <CONNECTION_STRING> and <AZ_SUBSCRIPTION_ID> with the required values for the Storage account's container.
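The container command above expects a connection string. One way to obtain it, sketched here under the assumption that your signed-in account is allowed to read the storage account keys, is `az storage account show-connection-string`. The wrapper name `get_storage_connection_string` is hypothetical.

```shell
# Print the connection string of an existing storage account.
# Arguments: storage account name, resource group name.
# Hypothetical wrapper around an existing Az CLI command.
get_storage_connection_string() {
    az storage account show-connection-string \
        --name "$1" \
        --resource-group "$2" \
        --query connectionString \
        --output tsv
}

# Usage (placeholders as in the commands above):
#   CONNECTION_STRING=$(get_storage_connection_string "<STORAGE_ACCOUNT_NAME>" "<RESOURCE_GROUP_NAME>")
```

Treat the returned value as a secret: keep it in a shell variable rather than writing it to a file or a script.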
If you want your ML models to access features with low latency, create a Redis Cache instance for online materialisation:
az redis create --name <REDIS_CACHE_NAME> --resource-group <RESOURCE_GROUP_NAME> --location <AZ_REGION> --sku <REDIS_CACHE_SKU_TIER> --vm-size <REDIS_CACHE_SKU_FAMILY>
Note: Replace <REDIS_CACHE_NAME>, <RESOURCE_GROUP_NAME>, <AZ_REGION>, <REDIS_CACHE_SKU_TIER> and <REDIS_CACHE_SKU_FAMILY> with the required values for the Redis Cache resource.
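The `--sku` and `--vm-size` values have to match: the Basic and Standard tiers use the C family sizes (c0 to c6), while Premium uses the P family (p1 to p5). The sketch below checks a pair before you call `az redis create`; `is_valid_redis_sku` is a hypothetical helper, and the value lists reflect the CLI's documented options at the time of writing, so verify them with `az redis create --help`.

```shell
# Returns 0 when the tier and size belong together, 1 otherwise.
# Arguments: SKU tier (Basic | Standard | Premium), VM size (c0-c6 | p1-p5).
# Hypothetical helper, not part of Az CLI.
is_valid_redis_sku() {
    sku="$1"
    vm_size="$2"
    case "$sku" in
        Basic|Standard)
            # C family sizes only.
            case "$vm_size" in c[0-6]) return 0 ;; esac ;;
        Premium)
            # P family sizes only.
            case "$vm_size" in p[1-5]) return 0 ;; esac ;;
    esac
    return 1
}

# Usage:
#   is_valid_redis_sku "<REDIS_CACHE_SKU_TIER>" "<REDIS_CACHE_SKU_FAMILY>" || echo "mismatched tier and size"
```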
Azure can automatically create a managed identity for your feature store. Alternatively, you can pre-provision your own, e.g. to follow corporate naming conventions:
az identity create --name <MI_NAME> --resource-group <RESOURCE_GROUP_NAME> --location <AZ_REGION> --subscription <AZ_SUBSCRIPTION_ID>
Note: Replace <MI_NAME>, <RESOURCE_GROUP_NAME>, <AZ_REGION> and <AZ_SUBSCRIPTION_ID> with the required values for the managed identity.
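The feature store specification in the final step asks for the identity's client ID, principal ID and resource ID. As a sketch, these can be read back with `az identity show`; the wrapper name `get_identity_property` is hypothetical, while `clientId`, `principalId` and `id` are the property names in the command's JSON output.

```shell
# Print a single property of a user-assigned managed identity.
# Arguments: identity name, resource group, property name
#            (clientId | principalId | id).
# Hypothetical wrapper around an existing Az CLI command.
get_identity_property() {
    az identity show \
        --name "$1" \
        --resource-group "$2" \
        --query "$3" \
        --output tsv
}

# Usage (placeholders as in the command above):
#   CLIENT_ID=$(get_identity_property "<MI_NAME>" "<RESOURCE_GROUP_NAME>" clientId)
#   PRINCIPAL_ID=$(get_identity_property "<MI_NAME>" "<RESOURCE_GROUP_NAME>" principalId)
#   RESOURCE_ID=$(get_identity_property "<MI_NAME>" "<RESOURCE_GROUP_NAME>" id)
```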
The last step is to create the feature store itself.
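The specification used in this step refers to the storage account, its container, the Redis Cache and the managed identity by their full resource IDs. A minimal sketch for composing those strings from the values used in earlier steps is shown here; the function names are hypothetical, and the ID layouts simply mirror the ones in the template.

```shell
# Compose resource IDs in the layout used by the feature store template.
# All function names are hypothetical; arguments are values you supply.

# Arguments: subscription ID, resource group, storage account name.
storage_account_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' "$1" "$2" "$3"
}

# Arguments: subscription ID, resource group, storage account name, container name.
container_id() {
    printf '%s/blobServices/default/containers/%s\n' "$(storage_account_id "$1" "$2" "$3")" "$4"
}

# Arguments: subscription ID, resource group, Redis Cache name.
redis_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Cache/Redis/%s\n' "$1" "$2" "$3"
}

# Arguments: subscription ID, resource group, managed identity name.
identity_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' "$1" "$2" "$3"
}

# Usage:
#   redis_id "<AZ_SUBSCRIPTION_ID>" "<RESOURCE_GROUP_NAME>" "<REDIS_CACHE_NAME>"
```

Building the IDs by hand keeps the example free of extra CLI calls; reading the `id` property from each resource's `show` command is an equally valid route.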
- Download the provided YAML template and update it with details from your previous steps (storage, Redis Cache and managed identity):
$schema: http://azureml/sdk-2-0/FeatureStore.json
# General configuration of Azure ML feature store
name: feature-store-online-offline
display_name: "Feature store with both online and offline materialisation"
resource_group: <rg>
location: <az-region>
tags:
  author: Laziz_Turakulov
# Apache Spark configuration
compute_runtime:
  spark_runtime_version: "3.4"
# Details of user-assigned managed identity
materialization_identity:
  client_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  principal_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  resource_id: "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<mi-name>"
# Details of Azure Redis Cache (online materialisation)
online_store:
  type: redis
  target: "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Cache/Redis/<redis-name>"
# Details of ADLS Gen2 Storage account (offline materialisation)
offline_store:
  type: azure_data_lake_gen2
  target: "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage_name>/blobServices/default/containers/<container_name>"
# Details of default Storage account
storage_account: "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage_name>"
- Then run the following Az CLI command to create the feature store:
az ml feature-store create --resource-group <RESOURCE_GROUP_NAME> --file FeatureStore_Online_Offline.yaml
- Verify successful creation by checking the Azure ML Studio UI for the newly created feature store.
- Your managed identity should have been assigned the following roles:
  - AzureML Data Scientist to Azure ML feature store,
  - Storage Blob Data Contributor to default Storage account,
  - Storage Blob Data Contributor to offline store's Blob container,
  - Contributor to online store's Redis Cache resource.
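Both checks can also be done from the command line. The sketch below wraps two existing Az CLI commands in hypothetical helper functions: one shows the feature store, the other lists every role assigned to the managed identity so that you can compare the output with the roles listed above. It assumes you pass the identity's principal (object) ID.

```shell
# Show the feature store's details; fails if the store does not exist.
# Arguments: feature store name, resource group name.
show_feature_store() {
    az ml feature-store show --name "$1" --resource-group "$2"
}

# List role names and scopes assigned to the managed identity
# across the whole subscription.
# Argument: principal (object) ID of the managed identity.
list_identity_roles() {
    az role assignment list \
        --assignee "$1" \
        --all \
        --query "[].{role:roleDefinitionName, scope:scope}" \
        --output table
}

# Usage:
#   show_feature_store "<FEATURE_STORE_NAME>" "<RESOURCE_GROUP_NAME>"
#   list_identity_roles "<PRINCIPAL_ID>"
```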
While the Azure ML Studio UI currently doesn't support deleting feature stores directly, you can use the following Az CLI command:
az ml feature-store delete --name <FEATURE_STORE_NAME> --resource-group <RESOURCE_GROUP_NAME> --all-resources