A local dev environment to follow the examples in "Learning Spark" 2nd edition by Damji et al.
The dev environment provides PySpark and Jupyter installations to work with the code examples. The Docker Compose file provides PostgreSQL and Kafka (with the corresponding web UIs) so you can try connecting to them from Spark.
To connect to PostgreSQL, the corresponding JDBC driver must be downloaded from https://jdbc.postgresql.org/download/, renamed to `postgresql.jar`, and placed in the `jars` directory.
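Once the driver is in place, reading a table over JDBC from a notebook or the PySpark shell looks roughly like the sketch below. The host, port, database name, table, and credentials are placeholders; check `docker-compose.yml` for the actual values.

```python
from pyspark.sql import SparkSession

# Point Spark at the downloaded driver jar (adjust the path if needed).
spark = (
    SparkSession.builder
    .appName("postgres-example")
    .config("spark.jars", "jars/postgresql.jar")
    .getOrCreate()
)

# Placeholder connection details -- see docker-compose.yml for the real ones.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/learning_spark")
    .option("dbtable", "public.my_table")
    .option("user", "postgres")
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .load()
)

df.show(5)
```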
To set up the Python environment run

```sh
asdf local python 3.11.2  # or whatever 3.11.x version you have installed
POETRY_VIRTUALENVS_PREFER_ACTIVE_PYTHON=true poetry install
```
To start Jupyter Lab run

```sh
poetry run jupyter-lab
```

Then open a notebook in the `notebooks` directory or create your own.

To stop Jupyter Lab press `Ctrl-C` twice.
To start the PySpark shell run

```sh
poetry run pyspark
```

or start a terminal session inside Jupyter Lab and run `pyspark`.
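Inside the shell a `SparkSession` named `spark` is already available, so a quick smoke test might look like this:

```python
# spark is predefined by the PySpark shell
spark.range(10).selectExpr("id", "id * 2 AS doubled").show()
```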
To bring up PostgreSQL and Kafka run

```sh
docker compose up --detach
```

See `docker-compose.yml` for the corresponding ports and credentials of the services.
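As a sketch of connecting to Kafka from Spark, the snippet below reads a topic with Structured Streaming from the PySpark shell. The broker address, topic name, and connector version are assumptions; check `docker-compose.yml` for the broker's port and match the `spark-sql-kafka` package to your installed PySpark version.

```python
# Start the shell with the Kafka connector, e.g. (version is an assumption):
#   poetry run pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0

# Placeholder broker and topic -- see docker-compose.yml for the real port.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "my-topic")
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka keys and values arrive as bytes; cast them to strings and print to the console.
query = (
    events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream.format("console")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```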
To stop PostgreSQL and Kafka run

```sh
docker compose down --remove-orphans --volumes
```