We use RFCs and GitHub issues to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our RFCs in the Feast Google Drive or our GitHub issues.
Please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.
Once you are ready to contribute, please submit a PR to the master branch of the Feast repository. Code submissions to Feast (including submissions from project maintainers) require review and approval from maintainers or code owners.
PRs that are submitted by the general public need to be identified as `ok-to-test`. Once enabled, Prow will run a range of tests to verify the submission, after which community members will help to review the pull request.
{% hint style="success" %} Please sign the Google CLA in order to have your code merged into the Feast repository. {% endhint %}
The following guide will help you quickly run Feast on your local machine.
The main components of Feast are:
- Feast Core: Handles feature registration, starts and manages ingestion jobs, and ensures that Feast internal metadata is consistent.
- Feast Ingestion Jobs: Subscribe to streams of FeatureRows and write these as feature values to registered databases (online, historical) that can be read by Feast Serving.
- Feast Serving: A service that handles requests for feature values, either online or batch.
The following software is required for Feast development:
- Java SE Development Kit 11
- Python version 3.6 (or above) and pip
- Maven version 3.6.x
Additionally, grpc_cli is useful for debugging and quick testing of gRPC endpoints.
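For example, once Feast Core is running (it listens on `localhost:6565` later in this guide), you can use server reflection to list the gRPC services it exposes:

grpc_cli ls localhost:6565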
The following components/services are required to develop Feast:
- Feast Core: Requires PostgreSQL (version 11 and above) to store state, and a Kafka setup (tested on version 2.x) to allow for ingestion of FeatureRows.
- Feast Serving: Requires Redis (tested on version 5.x).
These services should be running before starting development. The following snippet will start the services using Docker.
# Start Postgres
docker run --name postgres --rm -it -d --net host -e POSTGRES_DB=postgres -e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=password postgres:12-alpine
# Start Redis
docker run --name redis --rm -it --net host -d redis:5-alpine
# Start Zookeeper (needed by Kafka)
docker run --rm \
--net=host \
--name=zookeeper \
--env=ZOOKEEPER_CLIENT_PORT=2181 \
--detach confluentinc/cp-zookeeper:5.2.1
# Start Kafka
docker run --rm \
--net=host \
--name=kafka \
--env=KAFKA_ZOOKEEPER_CONNECT=localhost:2181 \
--env=KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
--env=KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
--detach confluentinc/cp-kafka:5.2.1
$ mvn test
Note: the integration test suite isn't yet separated from the unit test suite.
$ mvn verify
The `core` and `serving` modules are Spring Boot applications. These may be run as usual for the Spring Boot Maven plugin:
$ mvn --projects core spring-boot:run
# Or for short:
$ mvn -pl core spring-boot:run
Note that you should execute `mvn` from the Feast repository root directory, as there are intermodule dependencies that Maven will not resolve if you `cd` into a subdirectory and run from there.
Compiling and running tests in IntelliJ should work as usual.
Running the Spring Boot apps may work out of the box in IDEA Ultimate, which has built-in support for Spring Boot projects, but the Community Edition needs a bit of help:
The Spring Boot Maven plugin automatically puts dependencies with `provided` scope on the runtime classpath when using `spring-boot:run`, such as its embedded Tomcat server. The "Play" buttons in the gutter or right-click menu of a `main()` method do not do this.
A solution to this is:

- Open `View > Tool Windows > Maven`
- Drill down to e.g. `Feast Core > Plugins > spring-boot:run`, right-click and `Create 'feast-core [spring-boot'…`
- In the dialog that pops up, check the `Resolve Workspace artifacts` box
- Click `OK`. You should now be able to select this run configuration for the Play button in the main toolbar, keyboard shortcuts, etc.
The following section is a quick walk-through to test whether your local Feast deployment is functional for development purposes.
2.4.1 Assumptions
- PostgreSQL is running on `localhost:5432` and has a database called `postgres` which can be accessed with user `postgres` and password `password`. Different database configurations can be supplied in `/core/src/main/resources/application.yml` (see the sketch after this list).
- Redis is running locally and accessible from `localhost:6379`.
- (optional) The local environment has been authenticated with Google Cloud Platform and has full access to BigQuery. This is only necessary for BigQuery testing/development.
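As a rough sketch of what overriding the database connection might look like: Feast Core is a Spring Boot application, so it reads standard Spring datasource keys, but the exact structure and defaults in Feast's `application.yml` may differ by version.

# Illustrative only; check core/src/main/resources/application.yml for the
# actual keys and defaults in your Feast version.
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/postgres
    username: postgres
    password: password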
git clone https://github.com/gojek/feast.git && cd feast && \
export FEAST_HOME_DIR=$(pwd)
To run Feast Core locally using Maven:
# Feast Core can be configured from the following .yml file
# $FEAST_HOME_DIR/core/src/main/resources/application.yml
mvn --projects core spring-boot:run
Test whether Feast Core is running
grpc_cli call localhost:6565 ListStores ''
The output should list no stores since no Feast Serving has registered its stores to Feast Core:
connecting to localhost:6565
Rpc succeeded with OK status
Feast Serving is configured through `$FEAST_HOME_DIR/serving/src/main/resources/application.yml`. Each Serving deployment must be configured with a store. The default store is Redis (used for online serving). The configuration for this default store is located in a separate `.yml` file; the default location is `$FEAST_HOME_DIR/serving/sample_redis_config.yml`:
name: serving
type: REDIS
redis_config:
  host: localhost
  port: 6379
subscriptions:
  - name: "*"
    project: "*"
    version: "*"
Once Feast Serving is started, it will register its store with Feast Core (by name) and start to subscribe to feature sets based on its subscriptions.
Start the Feast Serving gRPC server on `localhost:6566` with store name `serving`:
mvn --projects serving spring-boot:run
Test connectivity to Feast Serving
grpc_cli call localhost:6566 GetFeastServingInfo ''
connecting to localhost:6566
version: "0.4.2-SNAPSHOT"
type: FEAST_SERVING_TYPE_ONLINE
Rpc succeeded with OK status
Test Feast Core to see whether it is aware of the Feast Serving deployment
grpc_cli call localhost:6565 ListStores ''
connecting to localhost:6565
store {
name: "serving"
type: REDIS
subscriptions {
name: "*"
version: "*"
project: "*"
}
redis_config {
host: "localhost"
port: 6379
}
}
Rpc succeeded with OK status
In order to use BigQuery as a historical store, it is necessary to start Feast Serving with a different store type. Copy `$FEAST_HOME_DIR/serving/sample_redis_config.yml` to `$FEAST_HOME_DIR/serving/my_bigquery_config.yml` and update the configuration as follows:
name: bigquery
type: BIGQUERY
bigquery_config:
  project_id: YOUR_GCP_PROJECT_ID
  dataset_id: YOUR_GCP_DATASET
subscriptions:
  - name: "*"
    version: "*"
    project: "*"
Then, inside `serving/src/main/resources/application.yml`, modify the key `feast.store.config-path` to point to the new store configuration.
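For example, the relevant part of `application.yml` might look like this (a minimal sketch; the surrounding keys are omitted and may differ between Feast versions):

feast:
  store:
    # Absolute path to the BigQuery store configuration created above
    config-path: /path/to/feast/serving/my_bigquery_config.yml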
After making these changes, restart Feast Serving:
mvn --projects serving spring-boot:run
Call `ListStores` on Feast Core again; you should now see two stores registered:
store {
name: "serving"
type: REDIS
subscriptions {
name: "*"
version: "*"
project: "*"
}
redis_config {
host: "localhost"
port: 6379
}
}
store {
name: "bigquery"
type: BIGQUERY
subscriptions {
name: "*"
version: "*"
project: "*"
}
bigquery_config {
project_id: "my_project"
dataset_id: "my_bq_dataset"
}
}
Before registering a new FeatureSet, a project is required.
grpc_cli call localhost:6565 CreateProject '
name: "your_project_name"
'
When a feature set is successfully registered, Feast Core will start an ingestion job that listens for incoming FeatureRows for that feature set.
{% hint style="info" %}
Note that Feast currently only supports sources of type `KAFKA`, so you must have access to a running Kafka broker to register a FeatureSet successfully. It is possible to omit the `source` from a FeatureSet, but Feast Core will still use Kafka behind the scenes; it is simply abstracted away from the user.
{% endhint %}
Create a new FeatureSet in Feast by sending a request to Feast Core:
# Example of registering a new driver feature set
# Note the source value, it assumes that you have access to a Kafka broker
# running on localhost:9092
grpc_cli call localhost:6565 ApplyFeatureSet '
feature_set {
spec {
project: "your_project_name"
name: "driver"
version: 1
entities {
name: "driver_id"
value_type: INT64
}
features {
name: "city"
value_type: STRING
}
source {
type: KAFKA
kafka_source_config {
bootstrap_servers: "localhost:9092"
topic: "your-kafka-topic"
}
}
}
}
'
Verify that the FeatureSet has been registered correctly.
# To check that the FeatureSet has been registered correctly.
# You should also see logs from Feast Core of the ingestion job being started
grpc_cli call localhost:6565 GetFeatureSet '
project: "your_project_name"
name: "driver"
'
Or alternatively, list all feature sets
grpc_cli call localhost:6565 ListFeatureSets '
filter {
project: "your_project_name"
feature_set_name: "driver"
feature_set_version: "1"
}
'
# Install the Python SDK to help with writing FeatureRow messages to Kafka
cd $FEAST_HOME_DIR/sdk/python
pip3 install -e .
pip3 install pendulum
pip3 install kafka-python  # provides the KafkaProducer used below, if not already installed
# Produce a FeatureRow message to Kafka so that it is ingested by Feast
# and written to the registered store.
# Make sure the topic used here is the one assigned to the feature set,
# i.e. producer.send("your-kafka-topic", ...)
python3 - <<EOF
import logging
import pendulum
from google.protobuf.timestamp_pb2 import Timestamp
from kafka import KafkaProducer
from feast.types.FeatureRow_pb2 import FeatureRow
from feast.types.Field_pb2 import Field
from feast.types.Value_pb2 import Value, Int32List, BytesList
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
producer = KafkaProducer(bootstrap_servers="localhost:9092")
row = FeatureRow()
fields = [
Field(name="driver_id", value=Value(int64_val=1234)),
Field(name="city", value=Value(string_val="JAKARTA")),
]
row.fields.MergeFrom(fields)
timestamp = Timestamp()
timestamp.FromJsonString(
pendulum.now("UTC").to_iso8601_string()
)
row.event_timestamp.CopyFrom(timestamp)
# The format is [PROJECT_NAME]/[FEATURE_SET_NAME]:[VERSION]
row.feature_set = "your_project_name/driver:1"
producer.send("your-kafka-topic", row.SerializeToString())
producer.flush()
logger.info(row)
EOF
Ensure that Feast Serving returns the feature value for the specific driver:
grpc_cli call localhost:6566 GetOnlineFeatures '
features {
project: "your_project_name"
name: "city"
version: 1
max_age {
seconds: 3600
}
}
entity_rows {
fields {
key: "driver_id"
value {
int64_val: 1234
}
}
}
'
field_values {
fields {
key: "driver_id"
value {
int64_val: 1234
}
}
fields {
key: "your_project_name/city:1"
value {
string_val: "JAKARTA"
}
}
}
If you have made it to this point successfully, you should have a functioning Feast deployment, at the very least using the Apache Beam DirectRunner for ingestion jobs and Redis for online serving.
It is important to note that most of the functionality demonstrated above is already available in a more abstracted form in the Python SDK (Feast management, data ingestion, feature retrieval) and the Java/Go SDKs (feature retrieval). However, it is useful to understand these internals from a development standpoint.
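For comparison, here is a minimal sketch of the same flow using the Python SDK. Class and method names follow the 0.4.x-era SDK and may differ in other versions (project handling in particular varies), so treat this as illustrative rather than authoritative:

import pandas as pd
from datetime import datetime

from feast import Client, Entity, Feature, FeatureSet, ValueType

# Connect to the locally running Feast Core and Feast Serving
client = Client(core_url="localhost:6565", serving_url="localhost:6566")

# Project handling is version-dependent; newer SDKs expose set_project()
client.set_project("your_project_name")

# Register the same "driver" feature set that was applied via grpc_cli above
driver_fs = FeatureSet(
    name="driver",
    entities=[Entity(name="driver_id", dtype=ValueType.INT64)],
    features=[Feature(name="city", dtype=ValueType.STRING)],
)
client.apply(driver_fs)

# Ingest a small DataFrame; the SDK converts each row to a FeatureRow and
# produces it to the feature set's Kafka topic
df = pd.DataFrame(
    {
        "datetime": [datetime.utcnow()],
        "driver_id": [1234],
        "city": ["JAKARTA"],
    }
)
client.ingest(driver_fs, df)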
We conform to the Google Java Style Guide. Maven can helpfully take care of that for you before you commit:
$ mvn spotless:apply
Formatting will be checked automatically during the `verify` phase. This can be skipped temporarily:
$ mvn spotless:check # Check is automatic upon `mvn verify`
$ mvn verify -Dspotless.check.skip
If you're using IntelliJ, you can import these code style settings if you'd like to use the IDE's reformat function as you develop.
Make sure you apply `go fmt`.
We use Python Black to format our Python code prior to submission.
Feast uses semantic versioning.
- Major and minor releases are cut from the `master` branch.
- Whenever a major or minor release is cut, a branch is created for that release. This is called a "release branch". For example, if `0.3` is released from `master`, a branch named `v0.3-branch` is created.
- You can create a release branch via the GitHub UI.
- From this branch a git tag is created for the specific release, for example `v0.3.0` (see the example after this list).
- Tagging a release will automatically build and push the relevant artifacts to their repositories or package managers (docker images, Python wheels, etc).
- A release branch should be substantially feature complete with respect to the intended release. Code that is committed to `master` may be merged or cherry-picked onto a release branch, but code that is directly committed to the release branch should be solely applicable to that release (and should not be committed back to master).
- In general, unless you're committing code that only applies to the release stream (for example, temporary hotfixes, backported security fixes, or image hashes), you should commit to `master` and then merge or cherry-pick to the release branch.
- It is also important to update CHANGELOG.md when submitting a new release. This can be done in the same PR or a separate PR.
- Finally, it is also important to create a GitHub release which includes a summary of important changes as well as any artifacts associated with that release.
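As an illustration of the branching and tagging steps (a sketch only; the remote name `upstream` and the version numbers are placeholders, and the release branch can equally be created via the GitHub UI as noted above):

# Create the release branch from master and push it
git checkout -b v0.3-branch master
git push upstream v0.3-branch

# Tag the release on the release branch and push the tag,
# which triggers the automated artifact builds
git tag v0.3.0 v0.3-branch
git push upstream v0.3.0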