Update use case for Fondant 0.10.1 with lightweight components #9

Merged 3 commits on Feb 7, 2024
README.md (2 changes: 1 addition & 1 deletion)
@@ -50,7 +50,7 @@ There are 5 components in total, these are:

> ⚠️ **Prerequisites:**
>
> - A Python version between 3.8 and 3.10 installed on your system.
> - A Python version between 3.8 and 3.11 installed on your system.
> - Docker installed and configured on your system.
> - A GPU is recommended to run the model-based components of the pipeline.

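A quick way to sanity-check these prerequisites from a terminal (a rough sketch; the exact commands depend on your OS and Python setup):

```shell
python --version   # should report a Python between 3.8 and 3.11 (python3 on some systems)
docker --version   # the Docker CLI must be installed
docker info        # errors out if the Docker daemon is not running or not configured
```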
requirements.txt (2 changes: 1 addition & 1 deletion)
@@ -1,2 +1,2 @@
fondant==0.8.0
fondant==0.10.1
notebook==7.0.6
src/README.md (27 changes: 15 additions & 12 deletions)
@@ -6,7 +6,8 @@ This example demonstrates an end-to-end fondant pipeline to collect and process

There are 5 components in total, these are:

1. [**Prompt Generation**](components/generate_prompts): This component generates a set of seed prompts using a rule-based approach that combines various rooms and styles together, like “a photo of a {room_type} in the style of {style_type}”. As input, it takes in a list of room types (bedroom, kitchen, laundry room, ..), a list of room styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can be easily adapted to other domains. The output of this component is a list of seed prompts.
1. [**Prompt Generation**](components/generate_prompts.py): This component generates a set of seed
prompts using a rule-based approach that combines various rooms and styles together, like “a photo of a {room_type} in the style of {style_type}”. As input, it takes in a list of room types (bedroom, kitchen, laundry room, ..), a list of room styles (contemporary, minimalist, art deco, ...) and a list of prefixes (comfortable, luxurious, simple). These lists can be easily adapted to other domains. The output of this component is a list of seed prompts.

2. [**Image URL Retrieval**](https://github.com/ml6team/fondant/tree/main/components/prompt_based_laion_retrieval): This component retrieves images from the [LAION-5B](https://laion.ai/blog/laion-5b/) dataset based on the seed prompts. The retrieval itself is done based on CLIP embeddings similarity between the prompt sentences and the captions in the LAION dataset. This component doesn’t return the actual images yet, only the URLs. The next component in the pipeline will then download these images.

@@ -16,6 +17,9 @@ There are 5 components in total, these are:

5. [**Add Segmentation Maps**](https://github.com/ml6team/fondant/tree/main/components/segment_images): This component segments the images using the [UPerNet](https://huggingface.co/docs/transformers/model_doc/upernet) model. Each segmentation map contains segments of 150 possible categories listed [here](https://huggingface.co/openmmlab/upernet-convnext-small/blob/main/config.json#L110).

6. [**Write to Hugging Face Hub**](https://github.com/ml6team/fondant/tree/main/components/write_to_hf_hub):
Write the results as a dataset to the Hugging Face Hub.

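To make component 1 concrete, here is a minimal, self-contained sketch of the rule-based prompt generation; the full implementation is the lightweight component added as `src/components/generate_prompts.py` further down in this diff, and the lists below are shortened for readability.

```python
import itertools

# Rule-based seed prompts: every combination of prefix, room and style.
rooms = ["bedroom", "kitchen", "laundry room"]
prefixes = ["comfortable", "luxurious", "simple"]
styles = ["contemporary", "minimalist", "art deco"]

prompts = [
    f"{prefix} {room}, {style} interior design"
    for room, prefix, style in itertools.product(rooms, prefixes, styles)
]
print(prompts[0])  # -> "comfortable bedroom, contemporary interior design"
```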
## Environment

Please check that the following prerequisites are met:
@@ -47,18 +51,15 @@ For more details on the pipeline creation, you can have a look at the

## Running the pipeline

This pipeline will generate prompts, retrieve urls of matching images in the laion dataset, download them
This pipeline will generate prompts, retrieve URLs of matching images in the LAION dataset, download them
and generate corresponding captions and segmentations. If you added the optional `write_to_hf_hub`
component, it will write the resulting dataset to the HF hub.

Fondant provides multiple runners to run our pipeline:
- A Docker runner for local execution
- A Vertex AI runner for managed execution on Google Cloud
- A Kubeflow Pipelines runner for execution anywhere

Fondant provides different runners to run our pipeline.
Here we will use the local runner, which utilizes Docker compose under the hood.
For an overview of all runners, check the [Fondant documentation](https://fondant.ai/en/latest/pipeline/#running-a-pipeline).

The runner will first build the custom component and download the reusable components from the
The runner will first download the reusable components from the
component hub. Afterwards, you will see the components execute one by one.

```shell
fondant run local pipeline.py
```

@@ -78,11 +79,13 @@ fondant explore -b data_dir
To create your own dataset, you can update the generate_prompts component to generate prompts
describing the images you want.

Make the changes you in the
[./components/generate_prompts/src/main.py](./components/generate_prompts/src/main.py) file.
The component is implemented as a
[lightweight component](https://fondant.ai/en/latest/components/lightweight_components/)
at [./components/generate_prompts.py](./components/generate_prompts.py).
You can update it to create your own prompts.

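For instance, a hypothetical adaptation (the domain and names below are invented for illustration and are not part of this repo) only needs different lists and a different template string; the Fondant plumbing of the component (`produces`, `__init__`, `load`) can stay unchanged:

```python
import itertools

# Hypothetical swap: product-photography prompts instead of interior design.
product_types = ["sneaker", "backpack", "wristwatch"]
materials = ["leather", "canvas", "recycled denim"]
shot_styles = ["studio photo", "lifestyle shot", "flat lay"]

def make_product_prompt(product: str, material: str, shot: str) -> str:
    return f"a {shot} of a {material} {product}, product photography"

prompts = [
    make_product_prompt(product, material, shot)
    for product, material, shot in itertools.product(product_types, materials, shot_styles)
]
```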
If you now re-run your pipeline, the new changes will be picked up and Fondant will automatically
re-build the component with the changes included.
execute the component with the changes included.

```shell
fondant run local pipeline.py
```

@@ -98,5 +101,5 @@ fondant explore -b data_dir
## Scaling up

If you're happy with your dataset, it's time to scale up. Check
[our documentation](https://fondant.ai/en/latest/pipeline/#compiling-and-running-a-pipeline) for
[our documentation](https://fondant.ai/en/latest/components/lightweight_components/) for
more information about the available runners.
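
As a rough sketch of what that could look like, assuming the CLI follows the same `fondant run <runner>` pattern as the local run above (the runner name and any required flags such as project or region are assumptions; see the linked documentation for the actual invocation):

```shell
# Hypothetical: submit the same pipeline definition to a managed runner
# instead of executing it locally with Docker.
fondant run vertex pipeline.py
```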
Empty file added src/components/__init__.py
src/components/generate_prompts.py (125 changes: 125 additions & 0 deletions)
@@ -0,0 +1,125 @@
"""
This component generates a set of initial prompts that will be used to retrieve images
from the LAION-5B dataset.
"""
import typing as t

import dask.dataframe as dd
import pandas as pd
import pyarrow as pa

from fondant.component import DaskLoadComponent
from fondant.pipeline import lightweight_component


@lightweight_component(produces={"prompt": pa.string()})
class GeneratePromptsComponent(DaskLoadComponent):
interior_styles = [
"art deco",
"bauhaus",
"bouclé",
"maximalist",
"brutalist",
"coastal",
"minimalist",
"rustic",
"hollywood regency",
"midcentury modern",
"modern organic",
"contemporary",
"modern",
"scandinavian",
"eclectic",
"bohemiam",
"industrial",
"traditional",
"transitional",
"farmhouse",
"country",
"asian",
"mediterranean",
"rustic",
"southwestern",
"coastal",
]

interior_prefix = [
"comfortable",
"luxurious",
"simple",
]

rooms = [
"Bathroom",
"Living room",
"Hotel room",
"Lobby",
"Entrance hall",
"Kitchen",
"Family room",
"Master bedroom",
"Bedroom",
"Kids bedroom",
"Laundry room",
"Guest room",
"Home office",
"Library room",
"Playroom",
"Home Theater room",
"Gym room",
"Basement room",
"Garage",
"Walk-in closet",
"Pantry",
"Gaming room",
"Attic",
"Sunroom",
"Storage room",
"Study room",
"Dining room",
"Loft",
"Studio room",
"Appartement",
]

def __init__(self, *, n_rows_to_load: t.Optional[int]) -> None:
"""
Generate a set of initial prompts that will be used to retrieve images from the
LAION-5B dataset.

Args:
n_rows_to_load: Optional argument that defines the number of rows to load.
Useful for testing pipeline runs on a small scale
"""
self.n_rows_to_load = n_rows_to_load

@staticmethod
def make_interior_prompt(room: str, prefix: str, style: str) -> str:
"""Generate a prompt for the interior design model.

Args:
room: room name
prefix: prefix for the room
style: interior style

Returns:
prompt for the interior design model
"""
return f"{prefix.lower()} {room.lower()}, {style.lower()} interior design"

def load(self) -> dd.DataFrame:
import itertools

room_tuples = itertools.product(
self.rooms, self.interior_prefix, self.interior_styles
)
prompts = map(lambda x: self.make_interior_prompt(*x), room_tuples)

pandas_df = pd.DataFrame(prompts, columns=["prompt"])

if self.n_rows_to_load:
pandas_df = pandas_df.head(self.n_rows_to_load)

df = dd.from_pandas(pandas_df, npartitions=1)

return df
src/components/generate_prompts/Dockerfile (20 changes: 0 additions & 20 deletions)

This file was deleted.

src/components/generate_prompts/README.md (40 changes: 0 additions & 40 deletions)

This file was deleted.

src/components/generate_prompts/fondant_component.yaml (13 changes: 0 additions & 13 deletions)

This file was deleted.

src/components/generate_prompts/requirements.txt (1 change: 0 additions & 1 deletion)

This file was deleted.
