Update use case for Fondant 0.10.1 with lightweight components #9

Merged 3 commits on Feb 7, 2024
README.md (2 changes: 1 addition & 1 deletion)
@@ -50,7 +50,7 @@ There are 5 components in total, these are:

> ⚠️ **Prerequisites:**
>
> - A Python version between 3.8 and 3.10 installed on your system.
> - A Python version between 3.8 and 3.11 installed on your system.
> - Docker installed and configured on your system.
> - A GPU is recommended to run the model-based components of the pipeline.

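A quick way to sanity-check these prerequisites from a terminal (a rough sketch; the exact commands depend on your OS and Python setup):

```shell
python --version   # should report a Python between 3.8 and 3.11 (python3 on some systems)
docker --version   # the Docker CLI must be installed
docker info        # errors out if the Docker daemon is not running or not configured
```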
requirements.txt (2 changes: 1 addition & 1 deletion)
@@ -1,2 +1,2 @@
fondant==0.8.0
fondant==0.10.1
notebook==7.0.6
src/README.md (27 changes: 15 additions & 12 deletions)
@@ -6,7 +6,8 @@ This example demonstrates an end-to-end fondant pipeline to collect and process

There are 5 components in total, these are:

1. [**Prompt Generation**](components/generate_prompts): This component generates a set of seed prompts using a rule-based approach that combines various rooms and styles together, like “a photo of a {room_type} in the style of {style_type}”. As input, it takes in a list of room types (bedroom, kitchen, laundry room, ..), a list of room styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can be easily adapted to other domains. The output of this component is a list of seed prompts.
1. [**Prompt Generation**](components/generate_prompts.py): This component generates a set of seed
prompts using a rule-based approach that combines various rooms and styles together, like “a photo of a {room_type} in the style of {style_type}”. As input, it takes in a list of room types (bedroom, kitchen, laundry room, ..), a list of room styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can be easily adapted to other domains. The output of this component is a list of seed prompts.

2. [**Image URL Retrieval**](https://github.com/ml6team/fondant/tree/main/components/prompt_based_laion_retrieval): This component retrieves images from the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset based on the seed prompts. The retrieval itself is done based on CLIP embeddings similarity between the prompt sentences and the captions in the LAION dataset. This component doesn’t return the actual images yet, only the URLs. The next component in the pipeline will then download these images.

@@ -16,6 +17,9 @@ There are 5 components in total, these are:

5. [**Add Segmentation Maps**](https://github.com/ml6team/fondant/tree/main/components/segment_images): This component segments the images using the [UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet) model. Each segmentation map contains segments of 150 possible categories listed [here](https://huggingface.co/openmmlab/upernet-convnext-small/blob/main/config.json#L110).

6. [**Write to Hugging Face Hub**](https://github.com/ml6team/fondant/tree/main/components/write_to_hf_hub):
Write the results as a dataset to the Hugging Face Hub.

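To make component 1 concrete, here is a minimal, self-contained sketch of the rule-based prompt generation; the full implementation is the lightweight component added as `src/components/generate_prompts.py` further down in this diff, and the lists below are shortened for readability.

```python
import itertools

# Rule-based seed prompts: every combination of prefix, room and style.
rooms = ["bedroom", "kitchen", "laundry room"]
prefixes = ["comfortable", "luxurious", "simple"]
styles = ["contemporary", "minimalist", "art deco"]

prompts = [
    f"{prefix} {room}, {style} interior design"
    for room, prefix, style in itertools.product(rooms, prefixes, styles)
]
print(prompts[0])  # -> "comfortable bedroom, contemporary interior design"
```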
## Environment

Please check that the following prerequisites are met:
@@ -47,18 +51,15 @@ For more details on the pipeline creation, you can have a look at the

## Running the pipeline

This pipeline will generate prompts, retrieve urls of matching images in the laion dataset, download them
This pipeline will generate prompts, retrieve URLs of matching images in the LAION dataset, download them
and generate corresponding captions and segmentations. If you added the optional `write_to_hf_hub`
component, it will write the resulting dataset to the HF hub.

Fondant provides multiple runners to run our pipeline:
- A Docker runner for local execution
- A Vertex AI runner for managed execution on Google Cloud
- A Kubeflow Pipelines runner for execution anywhere

Fondant provides different runners to run our pipeline.
Here we will use the local runner, which utilizes Docker compose under the hood.
For an overview of all runners, check the [Fondant documentation](https://fondant.ai/en/latest/pipeline/#running-a-pipeline).

The runner will first build the custom component and download the reusable components from the
The runner will first download the reusable components from the
component hub. Afterwards, you will see the components execute one by one.

```shell
fondant run local pipeline.py
```

@@ -78,11 +79,13 @@ fondant explore -b data_dir
To create your own dataset, you can update the generate_prompts component to generate prompts
describing the images you want.

Make the changes you in the
[./components/generate_prompts/src/main.py](./components/generate_prompts/src/main.py) file.
The component is implemented as a
[lightweight component](https://fondant.ai/en/latest/components/lightweight_components/)
at [./components/generate_prompts.py](./components/generate_prompts.py).
You can update it to create your own prompts.

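For instance, a hypothetical adaptation (the domain and names below are invented for illustration and are not part of this repo) only needs different lists and a different template string; the Fondant plumbing of the component (`produces`, `__init__`, `load`) can stay unchanged:

```python
import itertools

# Hypothetical swap: product-photography prompts instead of interior design.
product_types = ["sneaker", "backpack", "wristwatch"]
materials = ["leather", "canvas", "recycled denim"]
shot_styles = ["studio photo", "lifestyle shot", "flat lay"]

def make_product_prompt(product: str, material: str, shot: str) -> str:
    return f"a {shot} of a {material} {product}, product photography"

prompts = [
    make_product_prompt(product, material, shot)
    for product, material, shot in itertools.product(product_types, materials, shot_styles)
]
```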
If you now re-run your pipeline, the new changes will be picked up and Fondant will automatically
re-build the component with the changes included.
execute the component with the changes included.

```shell
fondant run local pipeline.py
```

@@ -98,5 +101,5 @@ fondant explore -b data_dir
## Scaling up

If you're happy with your dataset, it's time to scale up. Check
[our documentation](https://fondant.ai/en/latest/pipeline/#compiling-and-running-a-pipeline) for
[our documentation](https://fondant.ai/en/latest/components/lightweight_components/) for
more information about the available runners.
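
As a rough sketch of what that could look like, assuming the CLI follows the same `fondant run <runner>` pattern as the local run above (the runner name and any required flags such as project or region are assumptions; see the linked documentation for the actual invocation):

```shell
# Hypothetical: submit the same pipeline definition to a managed runner
# instead of executing it locally with Docker.
fondant run vertex pipeline.py
```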
Empty file added src/components/__init__.py
src/components/generate_prompts.py (125 changes: 125 additions & 0 deletions)
@@ -0,0 +1,125 @@
"""
This component generates a set of initial prompts that will be used to retrieve images
from the LAION-5B dataset.
"""
import typing as t

import dask.dataframe as dd
import pandas as pd
import pyarrow as pa

from fondant.component import DaskLoadComponent
from fondant.pipeline import lightweight_component


@lightweight_component(produces={"prompt": pa.string()})
class GeneratePromptsComponent(DaskLoadComponent):
interior_styles = [
"art deco",
"bauhaus",
"bouclé",
"maximalist",
"brutalist",
"coastal",
"minimalist",
"rustic",
"hollywood regency",
"midcentury modern",
"modern organic",
"contemporary",
"modern",
"scandinavian",
"eclectic",
"bohemiam",
"industrial",
"traditional",
"transitional",
"farmhouse",
"country",
"asian",
"mediterranean",
"rustic",
"southwestern",
"coastal",
]

interior_prefix = [
"comfortable",
"luxurious",
"simple",
]

rooms = [
"Bathroom",
"Living room",
"Hotel room",
"Lobby",
"Entrance hall",
"Kitchen",
"Family room",
"Master bedroom",
"Bedroom",
"Kids bedroom",
"Laundry room",
"Guest room",
"Home office",
"Library room",
"Playroom",
"Home Theater room",
"Gym room",
"Basement room",
"Garage",
"Walk-in closet",
"Pantry",
"Gaming room",
"Attic",
"Sunroom",
"Storage room",
"Study room",
"Dining room",
"Loft",
"Studio room",
"Appartement",
]

def __init__(self, *, n_rows_to_load: t.Optional[int]) -> None:
"""
Generate a set of initial prompts that will be used to retrieve images from the
LAION-5B dataset.

Args:
n_rows_to_load: Optional argument that defines the number of rows to load.
Useful for testing pipeline runs on a small scale
"""
self.n_rows_to_load = n_rows_to_load

@staticmethod
def make_interior_prompt(room: str, prefix: str, style: str) -> str:
"""Generate a prompt for the interior design model.

Args:
room: room name
prefix: prefix for the room
style: interior style

Returns:
prompt for the interior design model
"""
return f"{prefix.lower()} {room.lower()}, {style.lower()} interior design"

def load(self) -> dd.DataFrame:
import itertools

room_tuples = itertools.product(
self.rooms, self.interior_prefix, self.interior_styles
)
prompts = map(lambda x: self.make_interior_prompt(*x), room_tuples)

pandas_df = pd.DataFrame(prompts, columns=["prompt"])

if self.n_rows_to_load:
pandas_df = pandas_df.head(self.n_rows_to_load)

df = dd.from_pandas(pandas_df, npartitions=1)

return df
src/components/generate_prompts/Dockerfile (20 changes: 0 additions & 20 deletions)

This file was deleted.

src/components/generate_prompts/README.md (40 changes: 0 additions & 40 deletions)

This file was deleted.

src/components/generate_prompts/fondant_component.yaml (13 changes: 0 additions & 13 deletions)

This file was deleted.

src/components/generate_prompts/requirements.txt (1 change: 0 additions & 1 deletion)

This file was deleted.
