diff --git a/docs/README.md b/docs/README.md
index 9fcf91ece..718288668 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -160,7 +160,7 @@ Follow the [quick start Jupyter Notebook](./samples/product_recommendation_demo.
 | Offline store – Object Store | Azure Blob Storage, Azure ADLS Gen2, AWS S3 |
 | Offline store – SQL | Azure SQL DB, Azure Synapse Dedicated SQL Pools, Azure SQL in VM, Snowflake |
 | Streaming Source | Kafka, EventHub |
-| Online store | Azure Cache for Redis |
+| Online store | Redis, Azure Cosmos DB (coming soon), Aerospike (coming soon) |
 | Feature Registry and Governance | Azure Purview, ANSI SQL such as Azure SQL Server |
 | Compute Engine | Azure Synapse Spark Pools, Databricks |
 | Machine Learning Platform | Azure Machine Learning, Jupyter Notebook, Databricks Notebook |
diff --git a/docs/how-to-guides/feathr-udfs.md b/docs/concepts/feathr-udfs.md
similarity index 99%
rename from docs/how-to-guides/feathr-udfs.md
rename to docs/concepts/feathr-udfs.md
index a28ae5aac..06f892d18 100644
--- a/docs/how-to-guides/feathr-udfs.md
+++ b/docs/concepts/feathr-udfs.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Feathr User Defined Functions (UDFs)
-parent: How-to Guides
+parent: Feathr Concepts
 ---
 
 # Feathr User Defined Functions (UDFs)
diff --git a/docs/concepts/feature-definition.md b/docs/concepts/feature-definition.md
index 1aca4af89..51ddb6742 100644
--- a/docs/concepts/feature-definition.md
+++ b/docs/concepts/feature-definition.md
@@ -72,7 +72,7 @@ f_trip_time_duration = Feature(name="f_trip_time_duration",
                                transform="time_duration(lpep_pickup_datetime, lpep_dropoff_datetime, 'minutes')")
 ```
 
-Note that for `transform` section, you can put a simple expression to transform your features. For more information, please refer to [Feathr User Defined Functions (UDFs)](../how-to-guides/feathr-udfs.md).
+Note that for `transform` section, you can put a simple expression to transform your features. For more information, please refer to [Feathr User Defined Functions (UDFs)](./feathr-udfs.md).
 
 ### Anchor features with aggregations
 
diff --git a/docs/concepts/materializing-features.md b/docs/concepts/materializing-features.md
index a1d714730..13466427c 100644
--- a/docs/concepts/materializing-features.md
+++ b/docs/concepts/materializing-features.md
@@ -107,9 +107,9 @@ client.materialize_features(settings)
 This will generate features on latest date(assuming it's `2022/05/21`) and output data to the following path:
 `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2022/05/21`
 
-You can also specify a `BackfillTime` so the features will be generated only for those dates. For example:
+You can also specify a `BackfillTime` which will specify a cutoff time for feature materialization. For example:
 
-```Python
+```python
 backfill_time = BackfillTime(start=datetime(
     2020, 5, 10), end=datetime(2020, 5, 20), step=timedelta(days=1))
 offline_sink = HdfsSink(output_path="abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/")
@@ -120,8 +120,8 @@ settings = MaterializationSettings("nycTaxiTable",
                                    backfill_time=backfill_time)
 ```
 
-This will generate features from `2020/05/10` to `2020/05/20` and the output will have 11 folders, from
-`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/10` to `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20`. Note that currently Feathr only supports materializing data in daily step (i.e. even if you specify an hourly step, the generated features in offline store will still be presented in a daily hierarchy).
+This will materialize features with cutoff time from `2020/05/10` to `2020/05/20` correspondingly, and the output will have 11 folders, from
+`abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/10` to `abfss://feathrazuretest3fs@feathrazuretest3storage.dfs.core.windows.net/materialize_offline_test_data/df0/daily/2020/05/20`. Note that currently Feathr only supports materializing data in daily step (i.e. even if you specify an hourly step, the generated features in offline store will still be presented in a daily hierarchy). For more details on how `BackfillTime` works, refer to the [BackfillTime section](#feature-backfill) above.
 
 You can also specify the format of the materialized features in the offline store by using `execution_configurations` like below. Please refer to the [documentation](../how-to-guides/feathr-job-configuration.md) here for those configuration details.
 
diff --git a/docs/how-to-guides/azure-deployment-cli.md b/docs/how-to-guides/azure-deployment-cli.md
index 462b09fd6..3762f7b3f 100644
--- a/docs/how-to-guides/azure-deployment-cli.md
+++ b/docs/how-to-guides/azure-deployment-cli.md
@@ -160,8 +160,6 @@ az synapse workspace firewall-rule create --name allowAll --workspace-name $syna
 # sleep for a few seconds for the change to take effect
 sleep 2
 az synapse role assignment create --workspace-name $synapse_workspace_name --role "Synapse Contributor" --assignee $service_principal_name
-
-
 ```
 
 Alternatively, you can use your Azure account ("User Principal Name") like below:
diff --git a/docs/how-to-guides/feathr-azure-machine-learning.md b/docs/how-to-guides/feathr-azure-machine-learning.md
new file mode 100644
index 000000000..a0587979c
--- /dev/null
+++ b/docs/how-to-guides/feathr-azure-machine-learning.md
@@ -0,0 +1,38 @@
+---
+layout: default
+title: Using Feathr in Azure Machine Learning
+parent: How-to Guides
+---
+
+# Using Feathr in Azure Machine Learning
+
+Feathr has native integration with Azure Machine Learning (AML). However due to a few known issues, users have to do a little bit more on using Feathr in Azure Machine Learning.
+
+## Installing Feathr in Azure Machine Learning
+
+1. Switch python version. By default, Azure Machine Learning Notebooks uses an old Python version (3.6) which is not supported by Feathr. You should use the latest Python version in Azure Machine Learning. Switch it by using the button below:
+    ![Switch Python Version](../images/aml-environment-switch.png)
+2. Install Feathr using the following command. Instead using `!pip install feathr` in Azure Machine Learning, you should use the following command to install Feathr, to make sure that Feathr is available in the current active Python environment:
+
+    ```python
+    import pip
+    pip.main(['install', 'feathr'])
+    pip.main(['install', 'azure-identity>=1.8.0']) #fixing Azure Machine Learning authentication issue per https://stackoverflow.com/a/72262694/3193073
+    ```
+
+## Authentication in Azure Machine Learning
+
+Azure Machine Learning has native integration to allow you authenticate. All the [Feathr sample notebooks](../samples/) will be able to seamlessly use the credentials that you have logged in.
+
+When logged into Azure Machine Learning, you will see a prompt like this to ask you to login:
+
+![Switch Python Version](../images/aml-authentication.png)
+
+And after you have logged in, for all [Feathr sample notebooks](../samples/), simply remove those two lines because they are duplicated:
+
+```bash
+! pip install feathr azure-cli pandavro scikit-learn
+! az login --use-device-code
+```
+
+And that's it! enjoy the rest of the capabilities that Azure Machine Learning brings to you, include distributed machine learning training and managed compute, etc.
diff --git a/docs/how-to-guides/feathr-job-configuration.md b/docs/how-to-guides/feathr-job-configuration.md
index 70a2e9505..84a0b4d4d 100644
--- a/docs/how-to-guides/feathr-job-configuration.md
+++ b/docs/how-to-guides/feathr-job-configuration.md
@@ -1,10 +1,10 @@
 ---
 layout: default
-title: Feathr Job Configuration
+title: Feathr Job Configuration during Run Time
 parent: How-to Guides
 ---
 
-# Feathr Job Configuration
+# Feathr Job Configuration during Run Time
 
 Since Feathr uses Spark as the underlying execution engine, there's a way to override Spark configuration by `FeathrClient.get_offline_features()` with `execution_configurations` parameters. The complete list of the available spark configuration is located in [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html) (though not all of those are honored for cloud hosted Spark platforms such as Databricks), and there are a few Feathr specific ones that are documented here:
 
diff --git a/docs/how-to-guides/how-to-guides.md b/docs/how-to-guides/how-to-guides.md
index 81ddab6b4..7239a50ab 100644
--- a/docs/how-to-guides/how-to-guides.md
+++ b/docs/how-to-guides/how-to-guides.md
@@ -7,4 +7,4 @@ permalink: docs/how-to-guides
 
 # How-to Guides
 
-This folder includes important Feathr how-to guides.
\ No newline at end of file
+This folder includes important Feathr how-to guides and will help hands-on customers with detailed step-by-step guide.
diff --git a/docs/how-to-guides/local-feature-testing.md b/docs/how-to-guides/local-feature-testing.md
index e473a9c86..6b3713984 100644
--- a/docs/how-to-guides/local-feature-testing.md
+++ b/docs/how-to-guides/local-feature-testing.md
@@ -6,6 +6,8 @@ parent: How-to Guides
 
 # Local Feature Testing Guide
 
+> :warning: This document is out of date and will be updated in the future.
+
 > **Local testing supports .csv and .parquet source format.**
 
 # What's Local Feature Testing
diff --git a/docs/how-to-guides/troubleshoot-feature-definition.md b/docs/how-to-guides/troubleshoot-feature-definition.md
index 1e60ce205..c42b7ef87 100644
--- a/docs/how-to-guides/troubleshoot-feature-definition.md
+++ b/docs/how-to-guides/troubleshoot-feature-definition.md
@@ -6,6 +6,8 @@ parent: How-to Guides
 
 # Feature Definition Troubleshooting Guide
 
+> :warning: This document is out of date and will be updated in the future.
+
 You may come across some errors while creating your feature definition config. This guide will help you troubleshoot those errors.
 
 ## Prerequisite
diff --git a/docs/images/aml-authentication.png b/docs/images/aml-authentication.png
new file mode 100644
index 000000000..e7f3bb462
Binary files /dev/null and b/docs/images/aml-authentication.png differ
diff --git a/docs/images/aml-environment-switch.png b/docs/images/aml-environment-switch.png
new file mode 100644
index 000000000..244b56f84
Binary files /dev/null and b/docs/images/aml-environment-switch.png differ
diff --git a/feathr_project/setup.py b/feathr_project/setup.py
index fa01b46af..6076d75ba 100644
--- a/feathr_project/setup.py
+++ b/feathr_project/setup.py
@@ -23,7 +23,7 @@
         'Click',
         "azure-storage-file-datalake>=12.5.0",
         "azure-synapse-spark",
-        "azure-identity",
+        "azure-identity>=1.8.0", #fixing Azure Machine Learning authentication issue per https://stackoverflow.com/a/72262694/3193073
         "py4j",
         "loguru",
         "pandas",
@@ -49,7 +49,7 @@
         # In 1.23.0, azure-core is using ParamSpec which might cause issues in some of the databricks runtime.
         # see this for more details:
         # https://github.com/Azure/azure-sdk-for-python/pull/22891
-        # using a version lower than that to workaround this issue
+        # using a version lower than that to workaround this issue.
         "azure-core<=1.22.1",
         "typing_extensions>=4.2.0"
     ],