Added a dockerfile for building and testing the package (#195)
* Added a dockerfile for building and testing the package

This Dockerfile can be used to set up and run the tests in the Python Deequ package, so we do not need to install any dependencies in our local workspaces. Right now, it only builds against Spark version 3.3. Other versions will be added in a future PR.

Verified that the docker run output is the same as that of the PR workflow.

* Locked Poetry version to 1.7.1
rdsharma26 authored Apr 10, 2024
1 parent 7fd0cff commit 4bb727b
Showing 3 changed files with 39 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/base.yml
@@ -33,7 +33,7 @@ jobs:
SPARK_VERSION: ${{matrix.PYSPARK_VERSION}}
run: |
pip install --upgrade pip
- pip install poetry
+ pip install poetry==1.7.1
poetry install
poetry add pyspark==$SPARK_VERSION
poetry run python -m pytest -s tests
27 changes: 27 additions & 0 deletions Dockerfile
@@ -0,0 +1,27 @@
FROM ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update
RUN apt-get install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get install -y python3.8 python3-pip
RUN apt-get install -y python3.8-distutils
RUN apt-get install -y openjdk-11-jdk

# Point the python3 symlink at Python 3.8
RUN rm /usr/bin/python3 && ln -s /usr/bin/python3.8 /usr/bin/python3
RUN python3 --version
RUN pip3 --version
RUN java -version
RUN pip install poetry==1.7.1

COPY . /python-deequ
WORKDIR /python-deequ

RUN poetry lock --no-update
RUN poetry install
RUN poetry add pyspark==3.3

ENV SPARK_VERSION=3.3
CMD poetry run python -m pytest -s tests
12 changes: 11 additions & 1 deletion README.md
@@ -244,4 +244,14 @@ Take a look at tests in `tests/dataquality` and `tests/jobs`

```bash
$ poetry run pytest
```

## Running Tests Locally (Docker)

If you have issues installing the dependencies listed above, another way to run the tests and verify your changes is through Docker. There is a Dockerfile that will install the required dependencies and run the tests in a container.

```
docker build . -t spark-3.3-docker-test
docker run spark-3.3-docker-test
```
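
Because the image's `CMD` runs the whole test suite, a narrower run needs the command overridden at `docker run` time. The following is a minimal sketch of one way to do that, not part of this commit: the helper name `docker_test_args` is hypothetical, and it assumes the image was built with the `spark-3.3-docker-test` tag shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.

# Image tag taken from the README example.
IMAGE_TAG="spark-3.3-docker-test"

# Print the arguments to pass to `docker`.
# With no arguments, the image's default CMD (the full pytest run) is used.
# With one or more test paths, the CMD is overridden to run only those paths.
docker_test_args() {
  if [ "$#" -eq 0 ]; then
    echo "run --rm ${IMAGE_TAG}"
  else
    echo "run --rm ${IMAGE_TAG} poetry run python -m pytest -s $*"
  fi
}
```

After sourcing the helper, `docker $(docker_test_args tests/dataquality)` would run only the data quality tests, and `docker $(docker_test_args)` would run the full suite. The `--rm` flag removes the container once the run finishes, so repeated test runs do not leave stopped containers behind.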
