Contributing

1. Contribution process

We use RFCs and GitHub issues to communicate development ideas. The simplest way to contribute to Feast is to leave comments in our RFCs in the Feast Google Drive or our GitHub issues.

Please communicate your ideas through a GitHub issue or through our Slack Channel before starting development.

Please submit a PR to the master branch of the Feast repository once you are ready to submit your contribution. Code submissions to Feast (including submissions from project maintainers) require review and approval from maintainers or code owners.

PRs that are submitted by the general public need to be identified as ok-to-test. Once enabled, Prow will run a range of tests to verify the submission, after which community members will help to review the pull request.

{% hint style="success" %} Please sign the Google CLA in order to have your code merged into the Feast repository. {% endhint %}

2. Development guide

2.1 Overview

The following guide will help you quickly run Feast on your local machine.

The main components of Feast are:

  • Feast Core: Handles feature registration, starts and manages ingestion jobs, and ensures that Feast internal metadata is consistent.

  • Feast Ingestion Jobs: Subscribe to streams of FeatureRows and write them as feature values to the registered databases (online, historical) that can be read by Feast Serving.

  • Feast Serving: Service that handles requests for feature values, either online or batch.

2.2 Requirements

2.2.1 Development environment

The following software is required for Feast development:

  • Java SE Development Kit 11
  • Python version 3.6 (or above) and pip
  • Maven version 3.6.x

Additionally, grpc_cli is useful for debugging and quick testing of gRPC endpoints.
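
As a quick illustration, assuming Feast Core is already running on localhost:6565 (see section 2.4.3 below) and exposes gRPC server reflection, grpc_cli can list and call its endpoints directly:

# List the gRPC services exposed by Feast Core
grpc_cli ls localhost:6565

# Call a method directly; the same pattern is used throughout the walk-through below
grpc_cli call localhost:6565 ListStores ''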

2.2.2 Services

The following components/services are required to develop Feast:

  • Feast Core: Requires PostgreSQL (version 11 and above) to store state, and requires a Kafka (tested on version 2.x) setup to allow for ingestion of FeatureRows.
  • Feast Serving: Requires Redis (tested on version 5.x).

These services should be running before starting development. The following snippet will start the services using Docker.

# Start Postgres
docker run --name postgres --rm -it -d --net host -e POSTGRES_DB=postgres -e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=password postgres:12-alpine

# Start Redis
docker run --name redis --rm -it --net host -d redis:5-alpine

# Start Zookeeper (needed by Kafka)
docker run --rm \
  --net=host \
  --name=zookeeper \
  --env=ZOOKEEPER_CLIENT_PORT=2181 \
  --detach confluentinc/cp-zookeeper:5.2.1

# Start Kafka
docker run --rm \
  --net=host \
  --name=kafka \
  --env=KAFKA_ZOOKEEPER_CONNECT=localhost:2181 \
  --env=KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
  --env=KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  --detach confluentinc/cp-kafka:5.2.1

2.3 Testing and development

2.3.1 Running unit tests

$ mvn test

2.3.2 Running integration tests

Note: the integration test suite is not yet separated from the unit tests.

$ mvn verify

2.3.3 Running components locally

The core and serving modules are Spring Boot applications. These may be run as usual for the Spring Boot Maven plugin:

$ mvn --projects core spring-boot:run

# Or for short:
$ mvn -pl core spring-boot:run

Note that you should execute mvn from the Feast repository root directory; there are intermodule dependencies that Maven will not resolve if you cd into a subdirectory before running it.

2.3.4 Running from IntelliJ

Compiling and running tests in IntelliJ should work as usual.

Running the Spring Boot apps may work out of the box in IDEA Ultimate, which has built-in support for Spring Boot projects, but the Community Edition needs a bit of help:

The Spring Boot Maven plugin automatically puts dependencies with provided scope on the runtime classpath when using spring-boot:run, such as its embedded Tomcat server. The "Play" buttons in the gutter or right-click menu of a main() method do not do this.

A solution to this is:

  1. Open View > Tool Windows > Maven
  2. Drill down to e.g. Feast Core > Plugins > spring-boot:run, right-click and Create 'feast-core [spring-boot'…
  3. In the dialog that pops up, check the Resolve Workspace artifacts box
  4. Click OK. You should now be able to select this run configuration for the Play button in the main toolbar, keyboard shortcuts, etc.

2.4 Validating your setup

The following section is a quick walk-through to test whether your local Feast deployment is functional for development purposes.

2.4.1 Assumptions

  • PostgreSQL is running on localhost:5432 and has a database called postgres which can be accessed with the user postgres and password password. A different database configuration can be supplied in /core/src/main/resources/application.yml.

  • Redis is running locally and is accessible on localhost:6379.

  • (optional) The local environment has been authenticated with Google Cloud Platform and has full access to BigQuery. This is only necessary for BigQuery testing/development; see the snippet after this list.
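
If you have the gcloud CLI installed, one way to satisfy the last (optional) assumption is to create application default credentials and point the SDK at the GCP project you intend to use (the project ID below is a placeholder):

# Create application default credentials for local development
gcloud auth application-default login

# Set the default project used for BigQuery testing/development
gcloud config set project YOUR_GCP_PROJECT_ID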

2.4.2 Clone Feast

git clone https://github.com/gojek/feast.git && cd feast && \
export FEAST_HOME_DIR=$(pwd)

2.4.3 Starting Feast Core

To run Feast Core locally using Maven:

# Feast Core can be configured from the following .yml file
# $FEAST_HOME_DIR/core/src/main/resources/application.yml
mvn --projects core spring-boot:run

Test whether Feast Core is running

grpc_cli call localhost:6565 ListStores ''

The output should list no stores, since no Feast Serving deployment has registered its store with Feast Core:

connecting to localhost:6565

Rpc succeeded with OK status

2.4.4 Starting Feast Serving

Feast Serving is configured through the $FEAST_HOME_DIR/serving/src/main/resources/application.yml file. Each Serving deployment must be configured with a store. The default store is Redis (used for online serving).

The configuration for this default store is located in a separate .yml file. The default location is $FEAST_HOME_DIR/serving/sample_redis_config.yml:

name: serving
type: REDIS
redis_config:
  host: localhost
  port: 6379
subscriptions:
  - name: "*"
    project: "*"
    version: "*"

Once Feast Serving is started, it will register its store with Feast Core (by name) and start subscribing to feature sets based on its subscriptions.

Start the Feast Serving gRPC server on localhost:6566 with the store name serving:

mvn --projects serving spring-boot:run

Test connectivity to Feast Serving

grpc_cli call localhost:6566 GetFeastServingInfo ''
connecting to localhost:6566
version: "0.4.2-SNAPSHOT"
type: FEAST_SERVING_TYPE_ONLINE

Rpc succeeded with OK status

Test Feast Core to see whether it is aware of the Feast Serving deployment

grpc_cli call localhost:6565 ListStores ''
connecting to localhost:6565
store {
  name: "serving"
  type: REDIS
  subscriptions {
    name: "*"
    version: "*"
    project: "*"
  }
  redis_config {
    host: "localhost"
    port: 6379
  }
}

Rpc succeeded with OK status

In order to use BigQuery as a historical store, it is necessary to start Feast Serving with a different store type.

Copy $FEAST_HOME_DIR/serving/sample_redis_config.yml to $FEAST_HOME_DIR/serving/my_bigquery_config.yml and update the configuration as shown below:

name: bigquery
type: BIGQUERY
bigquery_config:
  project_id: YOUR_GCP_PROJECT_ID
  dataset_id: YOUR_GCP_DATASET
subscriptions:
  - name: "*"
    version: "*"
    project: "*"

Then, inside serving/src/main/resources/application.yml, modify the key feast.store.config-path to point to the new store configuration, for example as sketched below.
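
The relevant part of application.yml might then look roughly like this. The surrounding keys and the default value differ between Feast versions, and the path value below is only a placeholder, so adjust it for your checkout:

feast:
  store:
    # Illustrative value only; point this at the BigQuery store config created above
    config-path: /path/to/feast/serving/my_bigquery_config.yml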

After making these changes, restart Feast Serving:

mvn --projects serving spring-boot:run

Run ListStores against Feast Core again; you should now see two stores registered:

store {
  name: "serving"
  type: REDIS
  subscriptions {
    name: "*"
    version: "*"
    project: "*"
  }
  redis_config {
    host: "localhost"
    port: 6379
  }
}
store {
  name: "bigquery"
  type: BIGQUERY
  subscriptions {
    name: "*"
    version: "*"
    project: "*"
  }
  bigquery_config {
    project_id: "my_project"
    dataset_id: "my_bq_dataset"
  }
}

2.4.5 Registering a FeatureSet

Before registering a new FeatureSet, a project is required. Create one as follows:

grpc_cli call localhost:6565 CreateProject '
  name: "your_project_name"
'

When a feature set is successfully registered, Feast Core will start an ingestion job that listens for new features in the feature set.

{% hint style="info" %} Note that Feast currently only supports source of type KAFKA, so you must have access to a running Kafka broker to register a FeatureSet successfully. It is possible to omit the source from a Feature Set, but Feast Core will still use Kafka behind the scenes, it is simply abstracted away from the user. {% endhint %}
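
Since the hint above assumes a reachable Kafka broker, a quick sanity check, assuming the confluentinc/cp-kafka container from section 2.2.2 is still running under the name kafka, is to list the topics on the broker:

# The command only succeeds if the broker at localhost:9092 is reachable
docker exec kafka kafka-topics --bootstrap-server localhost:9092 --list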

Create a new FeatureSet in Feast by sending a request to Feast Core:

# Example of registering a new driver feature set
# Note the source value, it assumes that you have access to a Kafka broker
# running on localhost:9092

grpc_cli call localhost:6565 ApplyFeatureSet '
feature_set {
  spec {
    project: "your_project_name"
    name: "driver"
    version: 1

    entities {
      name: "driver_id"
      value_type: INT64
    }

    features {
      name: "city"
      value_type: STRING
    }

    source {
      type: KAFKA
      kafka_source_config {
        bootstrap_servers: "localhost:9092"
        topic: "your-kafka-topic"
      }
    }
  }
}
'

Verify that the FeatureSet has been registered correctly.

# To check that the FeatureSet has been registered correctly.
# You should also see logs from Feast Core of the ingestion job being started
grpc_cli call localhost:6565 GetFeatureSet '
  project: "your_project_name"
  name: "driver"
'

Alternatively, list feature sets matching a filter:

grpc_cli call localhost:6565 ListFeatureSets '
  filter {
    project: "your_project_name"
    feature_set_name: "driver"
    feature_set_version: "1"
  }
'

2.4.6 Ingestion and Population of Feature Values

# Install the Python SDK, which provides the protobuf classes used below,
# along with the helper libraries used by this snippet
cd $FEAST_HOME_DIR/sdk/python
pip3 install -e .
pip3 install pendulum kafka-python  # kafka-python may already be installed by the SDK

# Produce a FeatureRow message to Kafka so it is ingested by Feast
# and written to the registered stores.
# Make sure the topic passed to producer.send() below is the topic
# assigned to the feature set source (in this example, "your-kafka-topic").
python3 - <<EOF
import logging
import pendulum
from google.protobuf.timestamp_pb2 import Timestamp
from kafka import KafkaProducer
from feast.types.FeatureRow_pb2 import FeatureRow
from feast.types.Field_pb2 import Field
from feast.types.Value_pb2 import Value

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

producer = KafkaProducer(bootstrap_servers="localhost:9092")

row = FeatureRow()

fields = [
    Field(name="driver_id", value=Value(int64_val=1234)),
    Field(name="city", value=Value(string_val="JAKARTA")),
]
row.fields.MergeFrom(fields)

timestamp = Timestamp()
timestamp.FromJsonString(
    pendulum.now("UTC").to_iso8601_string()
)
row.event_timestamp.CopyFrom(timestamp)

# The format is [PROJECT_NAME]/[FEATURE_SET_NAME]:[VERSION]
row.feature_set = "your_project_name/driver:1"

producer.send("your-kafka-topic", row.SerializeToString())
producer.flush()
logger.info(row)
EOF
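
Feast does not acknowledge the produced message directly, but as a rough check that the ingestion job has written feature values to the online store, you can inspect the Redis container started earlier (Feast manages its own key layout, so the key count is enough of a signal here):

# A non-zero key count suggests feature values have been written to Redis
docker exec redis redis-cli dbsize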

2.4.7 Retrieval from Feast Serving

Ensure that Feast Serving returns the feature value for the specific driver:

grpc_cli call localhost:6566 GetOnlineFeatures '
features {
  project: "your_project_name"
  name: "city"
  version: 1
  max_age {
    seconds: 3600
  }
}
entity_rows {
  fields {
    key: "driver_id"
    value {
      int64_val: 1234
    }
  }
}
'

The response should contain the ingested feature value:

field_values {
  fields {
    key: "driver_id"
    value {
      int64_val: 1234
    }
  }
  fields {
    key: "your_project_name/city:1"
    value {
      string_val: "JAKARTA"
    }
  }
}

2.4.8 Summary

If you have made it to this point successfully you should have a functioning Feast deployment, at the very least using the Apache Beam DirectRunner for ingestion jobs and Redis for online serving.

It is important to note that most of the functionality demonstrated above is already available in a more abstracted form in the Python SDK (Feast management, data ingestion, feature retrieval) and the Java/Go SDKs (feature retrieval). However, it is useful to understand these internals from a development standpoint.
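
As a rough illustration of that abstraction, a minimal Python SDK session against this local deployment might look like the sketch below. The method names and signatures are assumptions that change between Feast versions, so check sdk/python for the exact API before relying on it:

from feast import Client

# Connect to the locally running Feast Core and Feast Serving instances
client = Client(core_url="localhost:6565", serving_url="localhost:6566")

# List the feature sets registered earlier; verify this method name against
# the SDK version in your checkout
print(client.list_feature_sets())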

3. Style guide

3.1 Java

We conform to the Google Java Style Guide. Maven can helpfully take care of that for you before you commit:

$ mvn spotless:apply

Formatting will be checked automatically during the verify phase. This can be skipped temporarily:

$ mvn spotless:check  # Check is automatic upon `mvn verify`
$ mvn verify -Dspotless.check.skip

If you're using IntelliJ, you can import these code style settings if you'd like to use the IDE's reformat function as you develop.

3.2 Go

Make sure you apply go fmt.
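
For example, assuming the Go SDK lives under sdk/go in your checkout:

# Format all Go packages under the Go SDK directory
cd $FEAST_HOME_DIR/sdk/go && go fmt ./...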

3.3 Python

We use Python Black to format our Python code prior to submission.
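
For example, to format the Python SDK (sdk/python, as used earlier in this guide) before submitting:

pip3 install black
black $FEAST_HOME_DIR/sdk/python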

4. Release process

Feast uses semantic versioning.

  • Major and minor releases are cut from the master branch.
  • Whenever a major or minor release is cut, a branch is created for that release. This is called a "release branch". For example if 0.3 is released from master, a branch named v0.3-branch is created.
  • You can create a release branch via the GitHub UI.
  • From this branch a git tag is created for the specific release, for example v0.3.0 (see the sketch after this list).
  • Tagging a release will automatically build and push the relevant artifacts to their repositories or package managers (docker images, Python wheels, etc).
  • A release branch should be substantially feature complete with respect to the intended release. Code that is committed to master may be merged or cherry-picked on to a release branch, but code that is directly committed to the release branch should be solely applicable to that release (and should not be committed back to master).
  • In general, unless you're committing code that only applies to the release stream (for example, temporary hotfixes, backported security fixes, or image hashes), you should commit to master and then merge or cherry-pick to the release branch.
  • It is also important to update the CHANGELOG.md when submitting a new release. This can be in the same PR or a separate PR.
  • Finally it is also important to create a GitHub release which includes a summary of important changes as well as any artifacts associated with that release.
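
As an illustration of the branching and tagging steps above (the branch and tag names follow the 0.3 example; adjust them for the actual release):

# Create the release branch from master (this can also be done via the GitHub UI)
git checkout master && git pull
git checkout -b v0.3-branch
git push origin v0.3-branch

# Tag the release on the release branch; pushing the tag triggers the release artifacts to be built and published
git tag v0.3.0
git push origin v0.3.0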