Build dev images on main (#236)
Fixes #220 

This PR adds a GitHub Actions workflow that builds our reusable images
on every merge to main. This should mitigate some of the issues we're
having with outdated fondant versions in the components.

Each image is tagged with the associated commit and we move the `dev`
tag to always point to the latest images built from main. This is now
the default version included in the `fondant_component.yaml` of the
reusable components. Just note that your runner might not recognize that
the image associated with the `dev` tag has changed, and you might have
to force it to pull the latest version.
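
To illustrate the note about the `dev` tag, here is a minimal shell sketch for refreshing a locally cached image. The image names follow the `fondant_component.yaml` files in this PR; the function names and the `DRY_RUN` switch are hypothetical helpers, not part of fondant.

```shell
#!/usr/bin/env bash
# Sketch: refresh the locally cached `dev` image of a reusable component.

# Registry namespace and tag as they appear in the fondant_component.yaml files.
REGISTRY="ghcr.io/ml6team"
TAG="dev"

# Print the full image reference for a component, e.g. caption_images.
dev_image_ref() {
  local component="$1"
  printf '%s/%s:%s\n' "$REGISTRY" "$component" "$TAG"
}

# Pull the image again so that a moved `dev` tag is picked up.
# With DRY_RUN=1 (a hypothetical switch) the command is printed instead of run.
pull_dev_image() {
  local ref
  ref="$(dev_image_ref "$1")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker pull $ref"
  else
    docker pull "$ref"
  fi
}
```

For example, `pull_dev_image caption_images` would run `docker pull ghcr.io/ml6team/caption_images:dev`, which resolves the tag against the registry again instead of relying on the local copy.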

The workflow uses [registry
caching](https://docs.docker.com/build/cache/backends/registry/) to
cache the docker images, which brings down the build time from
[~40m](https://github.com/ml6team/fondant/actions/runs/5398796173) to
[~20m](https://github.com/ml6team/fondant/actions/runs/5399309438). I
opted for registry caching as the GitHub Actions cache is limited to
10GB, which would be too small for the number and size of images we're
building.
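
As a rough illustration of registry caching, the sketch below prints a `docker buildx build` command that uses the `--cache-from` and `--cache-to` flags with `type=registry`. The `build-cache` tag and the function name are assumptions made for this example; the actual `scripts/build_components.sh` may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch: assemble a buildx command that reads from and writes to a registry cache.

# Arguments: <image without tag> <build context directory>
# The ":build-cache" ref is a hypothetical name for the cache image.
registry_cache_build_cmd() {
  local image="$1"
  local context="$2"
  local cache_ref="${image}:build-cache"
  # mode=max also exports intermediate layers, not only those of the final image.
  echo "docker buildx build" \
    "--cache-from type=registry,ref=${cache_ref}" \
    "--cache-to type=registry,ref=${cache_ref},mode=max" \
    "--tag ${image}:dev --push ${context}"
}
```

Printing the command first makes it easy to review; it could then be executed with `eval "$(registry_cache_build_cmd ghcr.io/ml6team/caption_images components/caption_images)"`.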

We can look into some additional improvements to this:
- [ ] **Direct the runners to always pull images**
This will introduce a small delay, but might be worth it to prevent
hard-to-detect issues caused by outdated images.
- [ ] **Build fondant from local path**
The workflow introduced in this PR still requires changes to be merged
to main before they are built into the images. Especially while
developing on fondant itself, this might still be lacking in usability,
as we'd like to be able to test our local changes. This is possible by
passing an additional
[`--build-context`](https://docs.docker.com/engine/reference/commandline/buildx_build/#build-context),
but to be really useful, we'd have to integrate this into the local
runner without impacting the user experience.
- [ ] **Cache pip**
We can also leverage pip caching on top of / instead of docker caching.
This wouldn't add a lot of benefit on top of the current approach, as
the time spent on downloading packages and building wheels is negligible
compared to the time spent communicating with the registry cache. It
could be useful in allowing us to include the fondant dependency in the
`requirements.txt` again, which would be a bit more transparent. Then
the "Install dependencies" layer would no longer be cached by docker,
since we change the fondant version on each build, but the other
dependencies would be cached using pip. I believe this would still
increase the build time though as the cached packages still need to be
installed. To leverage the pip cache across containers, we can use a
[cache
mount](https://docs.docker.com/build/guide/mounts/#add-a-cache-mount).
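
The "Build fondant from local path" item above could look roughly like the sketch below, which prints a build command passing a local checkout as a named build context. The context name `fondant`, the `:local` tag, and the function name are assumptions; the Dockerfile would also need to reference the named context (for example with `COPY --from=fondant`), which this PR does not do.

```shell
#!/usr/bin/env bash
# Sketch: build a component image against a local fondant checkout.

# Arguments: <component directory> <path to local fondant source>
local_fondant_build_cmd() {
  local component_dir="$1"
  local fondant_src="$2"
  local name
  name="$(basename "$component_dir")"
  # --build-context exposes the local path to the build under the name "fondant".
  echo "docker buildx build" \
    "--build-context fondant=${fondant_src}" \
    "--tag ${name}:local" \
    "${component_dir}"
}
```

Calling `local_fondant_build_cmd components/caption_images ../fondant` prints the command without running it, so it can be inspected before use.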
RobbeSneyders authored Jul 3, 2023
1 parent 7389212 commit d6b8775
Showing 44 changed files with 178 additions and 71 deletions.
29 changes: 29 additions & 0 deletions .github/workflows/build.yaml
@@ -0,0 +1,29 @@
name: Build dev images

on:
  push:
    branches:
      - main

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set buildx alias
        run: docker buildx install

      - name: Build components
        run: ./scripts/build_components.sh --cache -t $GITHUB_SHA -t dev
9 changes: 7 additions & 2 deletions components/caption_images/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src
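
A hedged sketch of how the `FONDANT_VERSION` build argument might be supplied together with the commit and `dev` tags used by the workflow. The function name is hypothetical and the real `scripts/build_components.sh` is not shown in this excerpt, so this only illustrates the idea: layers before the `ARG` stay cached, and only the fondant install layer changes per commit.

```shell
#!/usr/bin/env bash
# Sketch: build one component with a pinned fondant version and several tags.

# Arguments: <component directory> <fondant git ref> <tag> [<tag> ...]
component_build_cmd() {
  local component_dir="$1"
  local version="$2"
  shift 2
  local name cmd tag
  name="$(basename "$component_dir")"
  # The build arg overrides the Dockerfile default `FONDANT_VERSION=main`.
  cmd="docker buildx build --build-arg FONDANT_VERSION=${version}"
  for tag in "$@"; do
    cmd="${cmd} --tag ghcr.io/ml6team/${name}:${tag}"
  done
  echo "${cmd} ${component_dir}"
}
```

In CI this could be called as `component_build_cmd components/caption_images "$GITHUB_SHA" "$GITHUB_SHA" dev`, so the image is tagged with both the commit and `dev`.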

2 changes: 1 addition & 1 deletion components/caption_images/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Caption images
description: Component that captions images using a model from the Hugging Face hub
image: ghcr.io/ml6team/caption_images:latest
image: ghcr.io/ml6team/caption_images:dev

consumes:
images:
1 change: 0 additions & 1 deletion components/caption_images/requirements.txt
@@ -1,4 +1,3 @@
git+https://github.com/ml6team/fondant@main
gcsfs==2023.4.0
Pillow==9.4.0
torch==2.0.1
9 changes: 7 additions & 2 deletions components/download_images/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/download_images/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Download images
description: Component that downloads images based on URLs
image: ghcr.io/ml6team/download_images:latest
image: ghcr.io/ml6team/download_images:dev

consumes:
images:
1 change: 0 additions & 1 deletion components/download_images/requirements.txt
@@ -1,4 +1,3 @@
git+https://github.com/ml6team/fondant@main
albumentations==1.3.0
opencv-python-headless>=4.5.5.62,<5
gcsfs==2023.4.0
9 changes: 7 additions & 2 deletions components/embedding_based_laion_retrieval/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

@@ -1,6 +1,6 @@
name: LAION retrieval
description: A component that retrieves image URLs from LAION-5B based on a set of CLIP embeddings
image: ghcr.io/ml6team/embedding_based_laion_retrieval:latest
image: ghcr.io/ml6team/embedding_based_laion_retrieval:dev

consumes:
embeddings:
@@ -1,2 +1 @@
git+https://github.com/ml6team/fondant@main
gcsfs==2023.4.0
9 changes: 7 additions & 2 deletions components/filter_comments/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/filter_comments/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Filter comments
description: Component that filters code based on the code to comment ratio
image: ghcr.io/ml6team/filter_comments:latest
image: ghcr.io/ml6team/filter_comments:dev

consumes:
code:
1 change: 0 additions & 1 deletion components/filter_comments/requirements.txt
@@ -1,3 +1,2 @@
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
gcsfs==2023.4.00
9 changes: 7 additions & 2 deletions components/filter_image_resolution/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

1 change: 0 additions & 1 deletion components/filter_image_resolution/requirements.txt
@@ -1,3 +1,2 @@
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
gcsfs==2023.4.0
9 changes: 7 additions & 2 deletions components/filter_line_length/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/filter_line_length/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Filter line length
description: Component that filters code based on line length
image: ghcr.io/ml6team/filter_line_length:latest
image: ghcr.io/ml6team/filter_line_length:dev

consumes:
code:
1 change: 0 additions & 1 deletion components/filter_line_length/requirements.txt
@@ -1,3 +1,2 @@
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
gcsfs==2023.4.00
9 changes: 7 additions & 2 deletions components/image_cropping/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/image_cropping/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Image cropping
description: Component that removes single-colored borders around images and crops them appropriately
image: ghcr.io/ml6team/image_cropping:latest
image: ghcr.io/ml6team/image_cropping:dev

consumes:
images:
1 change: 0 additions & 1 deletion components/image_cropping/requirements.txt
@@ -1,4 +1,3 @@
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
gcsfs==2023.4.0
Pillow==9.4.0
9 changes: 7 additions & 2 deletions components/image_embedding/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/image_embedding/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Image embedding
description: Component that embeds images using CLIP
image: ghcr.io/ml6team/image_embedding:latest
image: ghcr.io/ml6team/image_embedding:dev

consumes:
images:
1 change: 0 additions & 1 deletion components/image_embedding/requirements.txt
@@ -1,4 +1,3 @@
git+https://github.com/ml6team/fondant@main
gcsfs==2023.4.0
Pillow==9.4.0
torch==2.0.0
9 changes: 7 additions & 2 deletions components/image_resolution_extraction/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

@@ -1,6 +1,6 @@
name: Image resolution extraction
description: Component that extracts image resolution data from the images
image: ghcr.io/ml6team/image_resolution_extraction:latest
image: ghcr.io/ml6team/image_resolution_extraction:dev

consumes:
images:
1 change: 0 additions & 1 deletion components/image_resolution_extraction/requirements.txt
@@ -1,4 +1,3 @@
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
gcsfs==2023.4.0
imagesize==1.4.1
9 changes: 7 additions & 2 deletions components/load_from_hf_hub/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src

2 changes: 1 addition & 1 deletion components/load_from_hf_hub/fondant_component.yaml
@@ -1,6 +1,6 @@
name: Load from hub
description: Component that loads a dataset from the hub
image: ghcr.io/ml6team/load_from_hf_hub:latest
image: ghcr.io/ml6team/load_from_hf_hub:dev

produces:
dummy_variable: #TODO: fill in here
1 change: 0 additions & 1 deletion components/load_from_hf_hub/requirements.txt
@@ -1,5 +1,4 @@
huggingface_hub==0.14.1
git+https://github.com/ml6team/fondant@main
pyarrow>=7.0
Pillow==9.4.0
gcsfs==2023.4.0
9 changes: 7 additions & 2 deletions components/pii_redaction/Dockerfile
@@ -1,14 +1,19 @@
FROM --platform=linux/amd64 python:3.8-slim

## System dependencies
# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# install requirements
# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component/src
