This repository contains an example job that emulates writing events to and reading events from Kafka with Protobuf SerDe, using Spark Streaming.
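A minimal sketch of the pattern the job demonstrates is shown below: a Spark Structured Streaming query that reads raw Protobuf bytes from a Kafka topic and decodes them with a ScalaPB-generated class. This is an illustration, not the repository's actual code; `Event`, the topic name `events`, and the broker address are all hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
// Hypothetical ScalaPB-generated message; substitute the class generated from
// the .proto files in this repo, e.g.:
// import net.renarde.dbx.demos.proto.Event

object ProtobufStreamSketch {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka connector on the classpath.
    val spark = SparkSession.builder().appName("protobuf-stream-sketch").getOrCreate()
    import spark.implicits._

    // Decode the Kafka `value` column (raw Protobuf bytes) into a readable string.
    val decode = udf((bytes: Array[Byte]) => Event.parseFrom(bytes).toProtoString)

    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "events")                       // hypothetical topic
      .load()
      .select(decode($"value").as("event"))
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```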
- Clone the repository (or open it in IntelliJ IDEA)
- Generate the Protobuf specs via the command below (a sketch of the ScalaPB build wiring follows):

```bash
sbt clean compile
```
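The `sbt clean compile` step drives ScalaPB code generation. As a rough sketch (an assumption about this repo's build; check `build.sbt` and `project/plugins.sbt` for the exact versions), the wiring typically looks like this, and it is what produces the `target/scala-2.12/src_managed/main/scalapb` directory referenced below:

```scala
// project/plugins.sbt (versions are illustrative):
//   addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
//   libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.13"

// build.sbt: generate Scala sources from the .proto files into src_managed.
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
```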
- In IntelliJ IDEA, mark `target/scala-2.12/src_managed/main` as a generated sources root. Important: un-mark the nested `main/scalapb` directory as a generated sources root, otherwise you'll run into issues while compiling the project in IntelliJ.
- Configure the Python environment and the Databricks CLI.
- Install and configure `dbx`:

```bash
pip install dbx
dbx configure --profile-name=<your-databricks-cli-profile-name>
```
- Provide the required properties in the `.env` file:

```bash
INSTANCE_PROFILE_NAME="your-instance-profile" # instance profile used to access the MSK instance
DATABRICKS_CONFIG_PROFILE="your-databricks-cli-profile-name"
KAFKA_BOOTSTRAP_SERVERS_TO_SECRETS="" # Kafka bootstrap servers string
```
- Create the secret scope:

```bash
make create-scope
```
- Add the secrets (a sketch of reading them back inside a job follows):

```bash
make add-secrets
```
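Here is a minimal sketch (an assumption, not the repository's actual code) of how the deployed job can read the Kafka bootstrap servers back out of the secret scope at runtime. The scope and key names are hypothetical, so use whatever `make create-scope` and `make add-secrets` actually register:

```scala
// Requires the com.databricks:dbutils-api dependency at compile time; the real
// DBUtils implementation is injected when the job runs on a Databricks cluster.
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

object SecretsSketch {
  // Hypothetical scope/key names -- align them with the Makefile targets.
  def kafkaBootstrapServers: String =
    dbutils.secrets.get(scope = "dbx-demo-scope", key = "kafka-bootstrap-servers")
}
```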
- Create a new instance pool named `dbx-pool` in your Databricks environment.
- To deploy and launch the jobs in dev mode (the jobs won't be created or updated; an ephemeral job run is used instead):
```bash
make dev-launch-generator
make dev-launch-processor
```
- To deploy the jobs so they'll be reflected in the Jobs UI:

```bash
make jobs-deploy
```
The local testing suite requires `sbt` and Docker, since testcontainers is used to spin up a Kafka environment for the unit tests. See the example test in `src/test/scala/net/renarde/dbx/demos/app/UnifiedAppTest.scala`; a simplified sketch of the testcontainers setup follows.
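Below is a minimal sketch (not the repository's `UnifiedAppTest`) of the testcontainers pattern: start a throwaway Kafka broker, point the code under test at it, and tear it down afterwards. It assumes `org.testcontainers:kafka` and ScalaTest are on the test classpath; the image tag is illustrative.

```scala
import org.scalatest.funsuite.AnyFunSuite
import org.testcontainers.containers.KafkaContainer
import org.testcontainers.utility.DockerImageName

class KafkaContainerSketch extends AnyFunSuite {
  test("an ephemeral Kafka broker is reachable from the test") {
    // Requires a running Docker daemon; testcontainers manages the container lifecycle.
    val kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))
    kafka.start()
    try {
      // Hand kafka.getBootstrapServers to the code under test, e.g. as the
      // kafka.bootstrap.servers option of a Spark stream.
      assert(kafka.getBootstrapServers.nonEmpty)
    } finally kafka.stop()
  }
}
```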