Create Azure machine learning related docs (feathr-ai#574)
* Create AML docs

* update docs

* Update feathr-azure-machine-learning.md

* Update materializing-features.md

* Update azure-deployment-cli.md

* Update product_recommendation_demo_advanced.ipynb

* Update README.md

* move the UDF doc to concept folder

* Update how-to-guides.md

* Update feature-definition.md

* address comments
xiaoyongzhu authored and ahlag committed Aug 26, 2022
1 parent 6eb99c7 commit 057aa25
Showing 13 changed files with 54 additions and 14 deletions.
2 changes: 1 addition & 1 deletion docs/README.md
Original file line number Diff line number Diff line change
@@ -160,7 +160,7 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo.
| Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 |
| Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake |
| Streaming Source | Kafka, EventHub |
| Online store | Azure Cache for Redis |
| Online store | Redis, Azure Cosmos DB (coming soon), Aerospike (coming soon) |
| Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server |
| Compute Engine | Azure Synapse Spark Pools, Databricks |
| Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook |
@@ -1,7 +1,7 @@
---
layout: default
title: Feathr User Defined Functions (UDFs)
parent: How-to Guides
parent: Feathr Concepts
---

# Feathr User Defined Functions (UDFs)
2 changes: 1 addition & 1 deletion docs/concepts/feature-definition.md
@@ -72,7 +72,7 @@ f_trip_time_duration = Feature(name="f_trip_time_duration",
transform="time_duration(lpep_pickup_datetime, lpep_dropoff_datetime, 'minutes')")
```

Note that for `transform` section, you can put a simple expression to transform your features. For more information, please refer to [Feathr User Defined Functions (UDFs)](../how-to-guides/feathr-udfs.md).
Note that for `transform` section, you can put a simple expression to transform your features. For more information, please refer to [Feathr User Defined Functions (UDFs)](./feathr-udfs.md).

### Anchor features with aggregations

8 changes: 4 additions & 4 deletions docs/concepts/materializing-features.md
@@ -107,9 +107,9 @@ client.materialize_features(settings)
This will generate features on the latest date (assuming it's `2022/05/21`) and output data to the following path:
`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2022/05/21`

You can also specify a `BackfillTime` so the features will be generated only for those dates. For example:
You can also specify a `BackfillTime`, which sets the cutoff time for feature materialization. For example:

```Python
```python
backfill_time = BackfillTime(start=datetime(
2020, 5, 10), end=datetime(2020, 5, 20), step=timedelta(days=1))
offline_sink = HdfsSink(output_path="abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/")
@@ -120,8 +120,8 @@ settings = MaterializationSettings("nycTaxiTable",
backfill_time=backfill_time)
```

This will generate features from `2020/05/10` to `2020/05/20` and the output will have 11 folders, from
`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/10` to `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20`. Note that currently Feathr only supports materializing data in daily step (i.e. even if you specify an hourly step, the generated features in offline store will still be presented in a daily hierarchy).
This will materialize features with one cutoff time per day from `2020/05/10` to `2020/05/20`, and the output will have 11 folders, from
`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/10` to `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20`. Note that currently Feathr only supports materializing data in daily step (i.e. even if you specify an hourly step, the generated features in offline store will still be presented in a daily hierarchy). For more details on how `BackfillTime` works, refer to the [BackfillTime section](#feature-backfill) above.
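
To make the folder count concrete, the sketch below enumerates the daily cutoff dates between the start and end dates (both inclusive) and builds the matching output paths. It uses only the Python standard library. `daily_output_paths` is a hypothetical helper written for illustration, not part of the Feathr API, and the `df0/daily/yyyy/MM/dd` layout is copied from the paths shown above rather than taken from Feathr's implementation.

```python
from datetime import datetime, timedelta


def daily_output_paths(base_path, start, end, step=timedelta(days=1)):
    """Hypothetical helper (not a Feathr API): one output folder per cutoff date."""
    paths = []
    current = start
    while current <= end:  # the end date is included
        # Layout assumed from the example paths: <base>/df0/daily/yyyy/MM/dd
        paths.append(f"{base_path.rstrip('/')}/df0/daily/{current:%Y/%m/%d}")
        current += step
    return paths


base = "abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/"
paths = daily_output_paths(base, datetime(2020, 5, 10), datetime(2020, 5, 20))

print(len(paths))  # number of daily folders
print(paths[0])    # first folder
print(paths[-1])   # last folder
```

Because both endpoints are included, a ten-day span yields eleven folders, which matches the count stated above.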

You can also specify the format of the materialized features in the offline store by using `execution_configurations` like below. Please refer to the [documentation](../how-to-guides/feathr-job-configuration.md) here for those configuration details.

2 changes: 0 additions & 2 deletions docs/how-to-guides/azure-deployment-cli.md
@@ -160,8 +160,6 @@ az synapse workspace firewall-rule create --name allowAll --workspace-name $syna
# sleep for a few seconds for the change to take effect
sleep 2
az synapse role assignment create --workspace-name $synapse_workspace_name --role "Synapse Contributor" --assignee $service_principal_name


```

Alternatively, you can use your Azure account ("User Principal Name") like below:
38 changes: 38 additions & 0 deletions docs/how-to-guides/feathr-azure-machine-learning.md
@@ -0,0 +1,38 @@
---
layout: default
title: Using Feathr in Azure Machine Learning
parent: How-to Guides
---

# Using Feathr in Azure Machine Learning

Feathr has native integration with Azure Machine Learning (AML). However, due to a few known issues, a few extra steps are needed to use Feathr in Azure Machine Learning.

## Installing Feathr in Azure Machine Learning

1. Switch the Python version. By default, Azure Machine Learning Notebooks use an old Python version (3.6), which is not supported by Feathr. Use the latest Python version available in Azure Machine Learning instead; switch it by using the button below:
![Switch Python Version](../images/aml-environment-switch.png)
2. Install Feathr using the following command. Instead of using `!pip install feathr` in Azure Machine Learning, use the command below so that Feathr is installed into the currently active Python environment:

```python
import pip
pip.main(['install', 'feathr'])
pip.main(['install', 'azure-identity>=1.8.0']) #fixing Azure Machine Learning authentication issue per https://stackoverflow.com/a/72262694/3193073
```
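
As an alternative sketch, the same installs can be run through `sys.executable -m pip`, which targets the interpreter behind the current notebook kernel and avoids calling `pip.main`, which pip does not document as a supported API. `pip_install_command` and `install` are hypothetical helper names introduced here for illustration; the package specifiers are the ones used above.

```python
import subprocess
import sys
from typing import List


def pip_install_command(requirement: str) -> List[str]:
    """Build a pip command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", requirement]


def install(requirement: str) -> None:
    """Run pip in a subprocess; raises CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(requirement))


# Show the commands without running them.
for requirement in ("feathr", "azure-identity>=1.8.0"):
    print(" ".join(pip_install_command(requirement)))
```

Calling `install("feathr")` and `install("azure-identity>=1.8.0")` in a notebook cell then performs the actual installation.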

## Authentication in Azure Machine Learning

Azure Machine Learning has native integration that allows you to authenticate. All the [Feathr sample notebooks](../samples/) can seamlessly use the credentials you have logged in with.

When you are logged into Azure Machine Learning, you will see a prompt like this asking you to log in:

![Azure Machine Learning authentication prompt](../images/aml-authentication.png)

After you have logged in, simply remove the following two lines from any of the [Feathr sample notebooks](../samples/), because they are redundant:

```bash
! pip install feathr azure-cli pandavro scikit-learn
! az login --use-device-code
```

And that's it! Enjoy the rest of the capabilities that Azure Machine Learning brings to you, including distributed machine learning training and managed compute.
4 changes: 2 additions & 2 deletions docs/how-to-guides/feathr-job-configuration.md
@@ -1,10 +1,10 @@
---
layout: default
title: Feathr Job Configuration
title: Feathr Job Configuration during Run Time
parent: How-to Guides
---

# Feathr Job Configuration
# Feathr Job Configuration during Run Time

Since Feathr uses Spark as the underlying execution engine, there's a way to override Spark configuration by `FeathrClient.get_offline_features()` with `execution_configurations` parameters. The complete list of the available spark configuration is located in [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) (though not all of those are honored for cloud hosted Spark platforms such as Databricks), and there are a few Feathr specific ones that are documented here:

2 changes: 1 addition & 1 deletion docs/how-to-guides/how-to-guides.md
@@ -7,4 +7,4 @@ permalink: docs/how-to-guides

# How-to Guides

This folder includes important Feathr how-to guides.
This folder includes important Feathr how-to guides and will help hands-on customers with detailed step-by-step guides.
2 changes: 2 additions & 0 deletions docs/how-to-guides/local-feature-testing.md
@@ -6,6 +6,8 @@ parent: How-to Guides

# Local Feature Testing Guide

> :warning: This document is out of date and will be updated in the future.
> **Local testing supports .csv and .parquet source format.**
# What's Local Feature Testing
2 changes: 2 additions & 0 deletions docs/how-to-guides/troubleshoot-feature-definition.md
@@ -6,6 +6,8 @@ parent: How-to Guides

# Feature Definition Troubleshooting Guide

> :warning: This document is out of date and will be updated in the future.
You may come across some errors while creating your feature definition config. This guide will help you troubleshoot those errors.

## Prerequisite
Binary file added docs/images/aml-authentication.png
Binary file added docs/images/aml-environment-switch.png
4 changes: 2 additions & 2 deletions feathr_project/setup.py
@@ -23,7 +23,7 @@
'Click',
"azure-storage-file-datalake>=12.5.0",
"azure-synapse-spark",
"azure-identity",
"azure-identity>=1.8.0", #fixing Azure Machine Learning authentication issue per https://stackoverflow.com/a/72262694/3193073
"py4j",
"loguru",
"pandas",
@@ -49,7 +49,7 @@
# In 1.23.0, azure-core is using ParamSpec which might cause issues in some of the databricks runtime.
# see this for more details:
# https://github.com/Azure/azure-sdk-for-python/pull/22891
# using a version lower than that to workaround this issue
# using a version lower than that to workaround this issue.
"azure-core<=1.22.1",
"typing_extensions>=4.2.0"
],