Example Spark Streaming Job with Amazon MSK & Protobuf on Databricks

This repository contains an example job that emulates writing events to and reading events from Kafka with Protobuf SerDe, using Spark Streaming.
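
At a high level, the generator job serializes events into Protobuf bytes in the Kafka value column, and the processor job parses those bytes back into typed fields. The following is a minimal, illustrative sketch of that SerDe pattern, assuming a hypothetical ScalaPB-generated message class Event(id: Long, payload: String); it is not the exact code of this repository:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ProtobufSerDeSketch {
  // hypothetical ScalaPB-generated class `Event` is assumed to be on the classpath

  // serialize event fields into Protobuf bytes for the Kafka `value` column
  val serialize = udf((id: Long, payload: String) => Event(id = id, payload = payload).toByteArray)

  // parse the Kafka `value` bytes back into a message field
  val deserialize = udf((bytes: Array[Byte]) => Event.parseFrom(bytes).payload)

  // generator side: stream (id, payload) rows into Kafka as Protobuf
  def writeEvents(events: DataFrame, bootstrapServers: String) =
    events
      .select(col("id").cast("string").as("key"), serialize(col("id"), col("payload")).as("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("topic", "events")
      .option("checkpointLocation", "/tmp/checkpoints/protobuf-demo")
      .start()

  // processor side: stream from Kafka and decode the Protobuf payloads
  def readEvents(spark: SparkSession, bootstrapServers: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", "events")
      .load()
      .select(deserialize(col("value")).as("payload"))
}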

  • Clone the repository (or open it in IntelliJ IDEA)
  • Generate the Protobuf specs (a sketch of a typical ScalaPB build setup follows this list) via:
sbt clean compile
  • In IntelliJ IDEA, mark target/scala-2.12/src_managed/main as a generated sources root. Important: un-mark the nested main/scalapb directory as a generated sources root, otherwise you'll run into issues while compiling the project in IntelliJ.
  • Configure a Python environment and the Databricks CLI
  • Install and configure dbx:
pip install dbx
dbx configure --profile-name=<your-databricks-cli-profile-name>
  • Provide the required properties in the .env file:
INSTANCE_PROFILE_NAME="your-instance-profile" # instance profile to access the MSK instance
DATABRICKS_CONFIG_PROFILE="your-databricks-cli-profile-name"
KAFKA_BOOTSTRAP_SERVERS_TO_SECRETS="" # Kafka Bootstrap Servers string
  • Create the secret scope:
make create-scope
  • Add the secrets (the jobs resolve them at runtime; see the sketch after this list):
make add-secrets
  • Create a new instance pool named dbx-pool in your Databricks workspace.
  • To deploy and launch the jobs in dev mode (the job definitions won't be created or updated; an ephemeral job run is used instead):
make dev-launch-generator
make dev-launch-processor
  • To deploy the jobs so they'll be reflected in the Jobs UI:
make jobs-deploy
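
A note on the code-generation step above: ScalaPB projects typically wire generation into sbt via the sbt-protoc plugin, which is also why the generated sources land under target/scala-2.12/src_managed/main/scalapb. A minimal sketch of such a setup (plugin versions are illustrative, and this repository's actual build files may differ):

// project/plugins.sbt
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.11"

// build.sbt: compile .proto files into Scala case classes under src_managed
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
// runtime library required by the generated code
libraryDependencies += "com.thesamet.scalapb" %% "scalapb-runtime" % scalapb.compiler.Version.scalapbVersion % "protobuf"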
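
Once the scope and secrets exist, the jobs can resolve the Kafka bootstrap servers at runtime through the Databricks secrets API. A minimal sketch (the scope and key names here are illustrative, not necessarily the ones created by the Makefile):

import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

// resolve the value stored by `make add-secrets`
val bootstrapServers: String = dbutils.secrets.get(scope = "dbx-kafka-demo", key = "kafka-bootstrap-servers")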

The local testing suite requires sbt and Docker, since we use Testcontainers to run a Kafka environment for unit tests.

Please find a test example in src/test/scala/net/renarde/dbx/demos/app/UnifiedAppTest.scala.
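
For orientation, a Testcontainers-backed Kafka test generally follows this shape (a minimal sketch, not the actual contents of UnifiedAppTest; the Kafka image tag is illustrative):

import org.scalatest.funsuite.AnyFunSuite
import org.testcontainers.containers.KafkaContainer
import org.testcontainers.utility.DockerImageName

class KafkaRoundTripSketch extends AnyFunSuite {
  test("events survive a write/read round trip through Kafka") {
    // spin up a throwaway Kafka broker in Docker
    val kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))
    kafka.start()
    try {
      // point the code under test at the containerized broker
      val bootstrapServers = kafka.getBootstrapServers
      assert(bootstrapServers.nonEmpty)
      // ... exercise the generator and processor logic against `bootstrapServers` ...
    } finally kafka.stop()
  }
}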
