From 41070cb8339b0971574375ef418301fe4d3ad94e Mon Sep 17 00:00:00 2001
From: Achal Shah
Date: Tue, 2 Aug 2022 12:17:32 -0700
Subject: [PATCH] docs: Add docs for repo-upgrade and update architecture stuff (#2989)

* docs: Add docs for repo-upgrade and update architecture stuff

Signed-off-by: Achal Shah

* describe

Signed-off-by: Achal Shah

* Fixes

Signed-off-by: Felix Wang

* Update docs

Signed-off-by: Felix Wang

Co-authored-by: Felix Wang
---
 docs/SUMMARY.md                                |  2 +-
 .../batch-materialization-engine.md            |  2 +-
 .../architecture-and-components/overview.md    |  5 +-
 .../stream-processor.md                        |  8 ++
 docs/how-to-guides/automated-feast-upgrade.md  | 78 +++++++++++++++++++
 5 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 docs/getting-started/architecture-and-components/stream-processor.md
 create mode 100644 docs/how-to-guides/automated-feast-upgrade.md

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 88691d82f9..b0e88b413f 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -52,7 +52,7 @@
 * [Read features from the online store](how-to-guides/feast-snowflake-gcp-aws/read-features-from-the-online-store.md)
 * [Running Feast in production](how-to-guides/running-feast-in-production.md)
 * [Upgrading from Feast 0.9](https://docs.google.com/document/u/1/d/1AOsr\_baczuARjCpmZgVd8mCqTF4AZ49OEyU4Cn-uTT0/edit)
-* [Adding a custom provider](how-to-guides/creating-a-custom-provider.md)
+* [Upgrading for Feast 0.20+](how-to-guides/automated-feast-upgrade.md)
 * [Adding a custom batch materialization engine](how-to-guides/creating-a-custom-materialization-engine.md)
 * [Adding a new online store](how-to-guides/adding-support-for-a-new-online-store.md)
 * [Adding a new offline store](how-to-guides/adding-a-new-offline-store.md)
diff --git a/docs/getting-started/architecture-and-components/batch-materialization-engine.md b/docs/getting-started/architecture-and-components/batch-materialization-engine.md
index da21bd4c59..fb3c83ccb4 100644
--- a/docs/getting-started/architecture-and-components/batch-materialization-engine.md
+++ b/docs/getting-started/architecture-and-components/batch-materialization-engine.md
@@ -6,5 +6,5 @@ A materialization engine abstracts over specific technologies or frameworks tha
 
 If the built-in engines are not sufficient, you can create your own custom materialization engine. Please see [this guide](../../how-to-guides/creating-a-custom-materialization-engine.md) for more details.
 
-Please see [feature\_store.yaml](../../reference/feature-repository/feature-store-yaml.md#overview) for configuring providers.
+Please see [feature\_store.yaml](../../reference/feature-repository/feature-store-yaml.md#overview) for configuring engines.
diff --git a/docs/getting-started/architecture-and-components/overview.md b/docs/getting-started/architecture-and-components/overview.md
index 0c47fb2753..97bd779503 100644
--- a/docs/getting-started/architecture-and-components/overview.md
+++ b/docs/getting-started/architecture-and-components/overview.md
@@ -5,6 +5,7 @@
 ## Functionality
 
 * **Create Batch Features:** ELT/ETL systems like Spark and SQL are used to transform data in the batch store.
+* **Create Stream Features:** Stream features are created from streaming services such as Kafka or Kinesis, and can be pushed directly into Feast.
 * **Feast Apply:** The user (or CI) publishes version controlled feature definitions using `feast apply`. This CLI command updates infrastructure and persists definitions in the object store registry.
 * **Feast Materialize:** The user (or scheduler) executes `feast materialize` which loads features from the offline store into the online store.
 * **Model Training:** A model training pipeline is launched. It uses the Feast Python SDK to retrieve a training dataset and trains a model.
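As a quick illustration of the `feast apply` and `feast materialize` steps in the list above, a minimal CLI sketch is shown below. The time window is an illustrative placeholder and is not part of this patch:

```bash
# Publish the version controlled feature definitions found in the current repo
# and update any infrastructure they require (e.g. online store tables).
feast apply

# Load feature values for an explicit time window from the offline store
# into the online store.
feast materialize 2022-01-01T00:00:00 2022-08-01T00:00:00

# Or only load whatever is new since the last materialization run.
feast materialize-incremental 2022-08-01T00:00:00
```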
@@ -23,8 +24,10 @@ A complete Feast deployment contains the following components:
   * Materialize (load) feature values into the online store.
   * Build and retrieve training datasets from the offline store.
   * Retrieve online features.
+* **Stream Processor:** The Stream Processor can be used to ingest feature data from streams and write it into the online or offline stores. Currently, there's an experimental Spark processor that's able to consume data from Kafka.
+* **Batch Materialization Engine:** The [Batch Materialization Engine](batch-materialization-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
 * **Online Store:** The online store is a database that stores only the latest feature values for each entity. The online store is populated by materialization jobs and from [stream ingestion](../../reference/data-sources/push.md).
-* **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. Feast does not manage the offline store directly, but runs queries against it.
+* **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, Feast can also write to the offline store if it is configured to log served features and the offline store supports this functionality.
 
 {% hint style="info" %}
 Java and Go Clients are also available for online feature retrieval.
diff --git a/docs/getting-started/architecture-and-components/stream-processor.md b/docs/getting-started/architecture-and-components/stream-processor.md
new file mode 100644
index 0000000000..13b6e5b304
--- /dev/null
+++ b/docs/getting-started/architecture-and-components/stream-processor.md
@@ -0,0 +1,8 @@
+# Stream Processor
+
+A Stream Processor is responsible for consuming data from stream sources (such as Kafka, Kinesis, etc.) and loading it directly into the online store (and optionally the offline store).
+
+A Stream Processor abstracts over the specific technologies or frameworks that are used to ingest data from stream sources. An experimental Spark Processor for Kafka is available in Feast.
+
+If the built-in processor is not sufficient, you can create your own custom processor. Please see [this tutorial](../../tutorials/building-streaming-features.md) for more details.
+
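The stream processor described above is one way to get stream data into Feast; as the overview notes, stream features can also be pushed directly into Feast from a consumer you operate yourself. A minimal sketch of that push path follows — the push source name (`driver_stats_push_source`) and the columns are hypothetical and would need to match a push source defined in your own feature repo:

```python
import pandas as pd

from feast import FeatureStore

# Load the feature store defined by feature_store.yaml in the current directory.
store = FeatureStore(repo_path=".")

# A single event, e.g. read from a Kafka or Kinesis consumer that you run yourself.
# Column names must match the schema of the push source defined in the repo.
event_df = pd.DataFrame(
    {
        "driver_id": [1001],
        "event_timestamp": [pd.Timestamp.now(tz="UTC")],
        "conv_rate": [0.85],
        "acc_rate": [0.91],
        "avg_daily_trips": [15],
    }
)

# Write the event to the online store through the push source.
store.push("driver_stats_push_source", event_df)
```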
diff --git a/docs/how-to-guides/automated-feast-upgrade.md b/docs/how-to-guides/automated-feast-upgrade.md
new file mode 100644
index 0000000000..ff17748537
--- /dev/null
+++ b/docs/how-to-guides/automated-feast-upgrade.md
@@ -0,0 +1,78 @@
+# Automated upgrades for Feast 0.20+
+
+## Overview
+
+Starting with Feast 0.20, the APIs of many core objects (e.g. feature views and entities) have been changed.
+For example, many parameters have been renamed.
+These changes were made in a backwards-compatible fashion; existing Feast repositories will continue to work through Feast 0.23 without any changes required.
+However, Feast 0.24 will fully deprecate all of the old parameters, so in order to use Feast 0.24+, users must modify their Feast repositories.
+
+There are currently deprecation warnings that indicate to users exactly how to modify their repos.
+In order to make the process somewhat easier, Feast 0.23 also introduces a new CLI command, `repo-upgrade`, that partially automates the process of upgrading Feast repositories.
+
+The upgrade command aims to automatically modify the object definitions in a feature repo to match the API required by Feast 0.24+. When running the command, the Feast CLI analyzes the source code in the feature repo files using [bowler](https://pybowler.io/), and attempts to rewrite the files in a best-effort way. Some parts of the API may not be upgraded automatically.
+
+The `repo-upgrade` command is specifically meant for upgrading Feast repositories that were initially created in versions 0.23 and below to be compatible with versions 0.24 and above.
+It is not intended to work for any future upgrades.
+
+## Usage
+
+At the root of a feature repo, you can run `feast repo-upgrade`. By default, the CLI only echoes the changes it plans to make, and does not modify any files in place. If the changes look reasonable, you can specify the `--write` flag to have the changes written out to disk.
+
+An example:
+```bash
+$ feast repo-upgrade
+--- /Users/achal/feast/prompt_dory/example.py
++++ /Users/achal/feast/prompt_dory/example.py
+@@ -13,7 +13,6 @@
+     path="/Users/achal/feast/prompt_dory/data/driver_stats.parquet",
+     event_timestamp_column="event_timestamp",
+     created_timestamp_column="created",
+-    date_partition_column="created"
+ )
+ 
+ # Define an entity for the driver. You can think of entity as a primary key used to
+--- /Users/achal/feast/prompt_dory/example.py
++++ /Users/achal/feast/prompt_dory/example.py
+@@ -3,7 +3,7 @@
+ from google.protobuf.duration_pb2 import Duration
+ import pandas as pd
+ 
+-from feast import Entity, Feature, FeatureView, FileSource, ValueType, FeatureService, OnDemandFeatureView
++from feast import Entity, FeatureView, FileSource, ValueType, FeatureService, OnDemandFeatureView
+ 
+ # Read data from parquet files. Parquet is convenient for local development mode. For
+ # production, you can use your favorite DWH, such as BigQuery. See Feast documentation
+--- /Users/achal/feast/prompt_dory/example.py
++++ /Users/achal/feast/prompt_dory/example.py
+@@ -4,6 +4,7 @@
+ import pandas as pd
+ 
+ from feast import Entity, Feature, FeatureView, FileSource, ValueType, FeatureService, OnDemandFeatureView
++from feast import Field
+ 
+ # Read data from parquet files. Parquet is convenient for local development mode. For
+ # production, you can use your favorite DWH, such as BigQuery. See Feast documentation
+--- /Users/achal/feast/prompt_dory/example.py
++++ /Users/achal/feast/prompt_dory/example.py
+@@ -28,9 +29,9 @@
+     entities=["driver_id"],
+     ttl=Duration(seconds=86400 * 365),
+     features=[
+-        Feature(name="conv_rate", dtype=ValueType.FLOAT),
+-        Feature(name="acc_rate", dtype=ValueType.FLOAT),
+-        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
++        Field(name="conv_rate", dtype=ValueType.FLOAT),
++        Field(name="acc_rate", dtype=ValueType.FLOAT),
++        Field(name="avg_daily_trips", dtype=ValueType.INT64),
+     ],
+     online=True,
+     batch_source=driver_hourly_stats,
+```
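For readability, here is roughly what the rewritten parts of `example.py` would look like once the hunks above are applied. This is assembled by hand from the output shown; the feature view's variable name and its `name=` argument do not appear in the hunks and are assumptions:

```python
from google.protobuf.duration_pb2 import Duration

from feast import Entity, FeatureView, FileSource, ValueType, FeatureService, OnDemandFeatureView
from feast import Field

# The batch source, with the date_partition_column argument removed.
driver_hourly_stats = FileSource(
    path="/Users/achal/feast/prompt_dory/data/driver_stats.parquet",
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# The feature view now uses Field instead of Feature, matching the rewritten import.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",  # assumed; not shown in the hunks above
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 365),
    features=[
        Field(name="conv_rate", dtype=ValueType.FLOAT),
        Field(name="acc_rate", dtype=ValueType.FLOAT),
        Field(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
)
```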
+---
+To write these changes out, you can run the same command with the `--write` flag:
+```bash
+$ feast repo-upgrade --write
+```
+
+You should see the same output, but also see the changes reflected in your feature repo on disk.
\ No newline at end of file
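One additional suggestion that is not part of the patch: after running the command with `--write`, it is worth confirming that the rewritten definitions still parse and register cleanly. Assuming you are still at the repo root, something like the following works (the dry-run step relies on `feast plan` being available in your Feast version):

```bash
# Rewrite the feature repo in place.
feast repo-upgrade --write

# Dry-run the registration to check that the rewritten definitions still parse.
feast plan

# Re-register the upgraded definitions.
feast apply
```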