Commit

Add readme generation as pre-commit hook
RobbeSneyders committed Oct 4, 2023
1 parent 8c86a54 commit c566b3b
Showing 20 changed files with 47 additions and 109 deletions.
12 changes: 10 additions & 2 deletions .pre-commit-config.yaml
@@ -18,7 +18,6 @@ repos:
           "--exit-non-zero-on-fix",
         ]
 
-
   - repo: https://github.com/PyCQA/bandit
     rev: 1.7.4
     hooks:
@@ -55,4 +54,13 @@ repos:
           - types-jsonschema
           - types-PyYAML
           - types-requests
-        pass_filenames: false
+        pass_filenames: false
+
+  - repo: local
+    hooks:
+      - id: generate_component_readmes
+        name: Generate component READMEs
+        language: python
+        entry: python scripts/component_readme/generate_readme.py
+        files: ^components/.*/fondant_component.yaml
+        additional_dependencies: ["fondant"]
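
The `files` key of the new hook is a Python regular expression that pre-commit applies to repo-relative paths with `re.search`; only matching staged files are passed to the `entry` command as arguments. A minimal sketch of that selection, assuming a hypothetical list of staged paths (`select_hook_files` and `staged` are illustrative names, not part of pre-commit or this repository):

```python
import re

# Pattern copied verbatim from the hook's `files` key.
FILES_PATTERN = re.compile(r"^components/.*/fondant_component.yaml")


def select_hook_files(changed_paths):
    """Return the paths that would be passed to the hook's entry command.

    Mirrors pre-commit's behaviour of applying re.search to each
    repo-relative path.
    """
    return [path for path in changed_paths if FILES_PATTERN.search(path)]


# Hypothetical staged paths, for illustration only.
staged = [
    "components/caption_images/fondant_component.yaml",
    "components/caption_images/README.md",
    "scripts/component_readme/generate_readme.py",
    "examples/components/foo/fondant_component.yaml",
]

selected = select_hook_files(staged)
print(selected)
```

Note that the pattern has no trailing `$` and its `.` characters are unescaped, so it is slightly more permissive than an exact file-name match; a path such as `components/x/fondant_component.yaml.bak` would also be selected.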
2 changes: 1 addition & 1 deletion components/caption_images/README.md
@@ -48,4 +48,4 @@ pipeline.add_op(caption_images_op, dependencies=[...]) #Add previous component
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
6 changes: 0 additions & 6 deletions components/embed_images/README.md
@@ -41,9 +41,3 @@ embed_images_op = ComponentOp.from_registry(
 pipeline.add_op(embed_images_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
6 changes: 0 additions & 6 deletions components/filter_comments/README.md
@@ -39,9 +39,3 @@ filter_comments_op = ComponentOp.from_registry(
 pipeline.add_op(filter_comments_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
10 changes: 2 additions & 8 deletions components/filter_image_resolution/README.md
@@ -18,8 +18,8 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| min_image_dim | int | Minimum image dimension | None |
-| max_aspect_ratio | float | Maximum aspect ratio | None |
+| min_image_dim | int | Minimum image dimension | / |
+| max_aspect_ratio | float | Maximum aspect ratio | / |
 
 ### Usage
 
@@ -40,9 +40,3 @@ filter_image_resolution_op = ComponentOp.from_registry(
 pipeline.add_op(filter_image_resolution_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
12 changes: 3 additions & 9 deletions components/filter_line_length/README.md
@@ -19,9 +19,9 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| avg_line_length_threshold | int | Threshold for average line length to filter on | None |
-| max_line_length_threshold | int | Threshold for maximum line length to filter on | None |
-| alphanum_fraction_threshold | float | Alphanum fraction to filter on | None |
+| avg_line_length_threshold | int | Threshold for average line length to filter on | / |
+| max_line_length_threshold | int | Threshold for maximum line length to filter on | / |
+| alphanum_fraction_threshold | float | Alphanum fraction to filter on | / |
 
 ### Usage
 
@@ -43,9 +43,3 @@ filter_line_length_op = ComponentOp.from_registry(
 pipeline.add_op(filter_line_length_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
6 changes: 0 additions & 6 deletions components/image_cropping/README.md
@@ -58,9 +58,3 @@ image_cropping_op = ComponentOp.from_registry(
 pipeline.add_op(image_cropping_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
6 changes: 0 additions & 6 deletions components/image_resolution_extraction/README.md
@@ -36,9 +36,3 @@ image_resolution_extraction_op = ComponentOp.from_registry(
 pipeline.add_op(image_resolution_extraction_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
2 changes: 1 addition & 1 deletion components/language_filter/README.md
@@ -42,4 +42,4 @@ pipeline.add_op(language_filter_op, dependencies=[...]) #Add previous component
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
4 changes: 2 additions & 2 deletions components/load_from_files/README.md
@@ -20,7 +20,7 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| directory_uri | str | Local or remote path to the directory containing the files | None |
+| directory_uri | str | Local or remote path to the directory containing the files | / |
 
 ### Usage
 
@@ -45,4 +45,4 @@ pipeline.add_op(load_from_files_op, dependencies=[...]) #Add previous component
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
10 changes: 2 additions & 8 deletions components/load_from_hf_hub/README.md
@@ -17,8 +17,8 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| dataset_name | str | Name of dataset on the hub | None |
-| column_name_mapping | dict | Mapping of the consumed hub dataset to fondant column names | None |
+| dataset_name | str | Name of dataset on the hub | / |
+| column_name_mapping | dict | Mapping of the consumed hub dataset to fondant column names | / |
 | image_column_names | list | Optional argument, a list containing the original image column names in case the dataset on the hub contains them. Used to format the image from HF hub format to a byte string. | None |
 | n_rows_to_load | int | Optional argument that defines the number of rows to load. Useful for testing pipeline runs on a small scale | None |
 | index_column | str | Column to set index to in the load component, if not specified a default globally unique index will be set | None |
@@ -45,9 +45,3 @@ load_from_hf_hub_op = ComponentOp.from_registry(
 pipeline.add_op(load_from_hf_hub_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
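
In the regenerated tables, arguments that declare no default are shown as `/`, while optional arguments whose default is explicitly `None` keep `None`. One way such a distinction could be rendered is with a sentinel value for "no default declared". This is only a sketch under that assumption: `NO_DEFAULT`, `Argument`, `render_default`, and `render_row` are hypothetical names and are not taken from fondant's `ComponentSpec` or from the README template used by the script.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical sentinel marking "no default declared"; distinct from None,
# which is a legitimate default value for optional arguments.
NO_DEFAULT = object()


@dataclass
class Argument:
    """Illustrative stand-in for a component argument definition."""

    name: str
    type: str
    description: str
    default: Any = NO_DEFAULT


def render_default(arg: Argument) -> str:
    """Render the default column: '/' if none was declared, else the value."""
    return "/" if arg.default is NO_DEFAULT else str(arg.default)


def render_row(arg: Argument) -> str:
    """Render one markdown table row in the layout used by the READMEs."""
    return f"| {arg.name} | {arg.type} | {arg.description} | {render_default(arg)} |"


required = Argument("dataset_name", "str", "Name of dataset on the hub")
optional = Argument("index_column", "str", "Column to set index to", default=None)

print(render_row(required))
print(render_row(optional))
```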
8 changes: 1 addition & 7 deletions components/load_from_parquet/README.md
@@ -17,7 +17,7 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| dataset_uri | str | The remote path to the parquet file/folder containing the dataset | None |
+| dataset_uri | str | The remote path to the parquet file/folder containing the dataset | / |
 | column_name_mapping | dict | Mapping of the consumed dataset | None |
 | n_rows_to_load | int | Optional argument that defines the number of rows to load. Useful for testing pipeline runs on a small scale | None |
 | index_column | str | Column to set index to in the load component, if not specified a default globally unique index will be set | None |
@@ -43,9 +43,3 @@ load_from_parquet_op = ComponentOp.from_registry(
 pipeline.add_op(load_from_parquet_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
2 changes: 1 addition & 1 deletion components/minhash_generator/README.md
@@ -44,4 +44,4 @@ pipeline.add_op(minhash_generator_op, dependencies=[...]) #Add previous compone
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
6 changes: 0 additions & 6 deletions components/pii_redaction/README.md
@@ -54,9 +54,3 @@ pii_redaction_op = ComponentOp.from_registry(
 pipeline.add_op(pii_redaction_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
8 changes: 1 addition & 7 deletions components/prompt_based_laion_retrieval/README.md
@@ -24,7 +24,7 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| num_images | int | Number of images to retrieve for each prompt | None |
+| num_images | int | Number of images to retrieve for each prompt | / |
 | aesthetic_score | int | Aesthetic embedding to add to the query embedding, between 0 and 9 (higher is prettier). | 9 |
 | aesthetic_weight | float | Weight of the aesthetic embedding when added to the query, between 0 and 1 | 0.5 |
 | url | str | The url of the backend clip retrieval service, defaults to the public service | https://knn.laion.ai/knn-service |
@@ -50,9 +50,3 @@ prompt_based_laion_retrieval_op = ComponentOp.from_registry(
 pipeline.add_op(prompt_based_laion_retrieval_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
8 changes: 1 addition & 7 deletions components/segment_images/README.md
@@ -20,7 +20,7 @@ The component takes the following arguments to alter its behavior:
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
 | model_id | str | id of the model on the Hugging Face hub | openmmlab/upernet-convnext-small |
-| batch_size | int | batch size to use | None |
+| batch_size | int | batch size to use | / |
 
 ### Usage
 
@@ -41,9 +41,3 @@ segment_images_op = ComponentOp.from_registry(
 pipeline.add_op(segment_images_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
6 changes: 3 additions & 3 deletions components/text_length_filter/README.md
@@ -17,8 +17,8 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| min_characters_length | int | Minimum number of characters | None |
-| min_words_length | int | Mininum number of words | None |
+| min_characters_length | int | Minimum number of characters | / |
+| min_words_length | int | Mininum number of words | / |
 
 ### Usage
 
@@ -44,4 +44,4 @@ pipeline.add_op(text_length_filter_op, dependencies=[...]) #Add previous compon
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
12 changes: 6 additions & 6 deletions components/text_normalization/README.md
@@ -29,11 +29,11 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| remove_additional_whitespaces | bool | If true remove all additional whitespace, tabs. | None |
-| apply_nfc | bool | If true apply nfc normalization | None |
-| normalize_lines | bool | If true analyze documents line-by-line and apply various rules to discard or edit lines. Used to removed common patterns in webpages, e.g. counter | None |
-| do_lowercase | bool | If true apply lowercasing | None |
-| remove_punctuation | str | If true punctuation will be removed | None |
+| remove_additional_whitespaces | bool | If true remove all additional whitespace, tabs. | / |
+| apply_nfc | bool | If true apply nfc normalization | / |
+| normalize_lines | bool | If true analyze documents line-by-line and apply various rules to discard or edit lines. Used to removed common patterns in webpages, e.g. counter | / |
+| do_lowercase | bool | If true apply lowercasing | / |
+| remove_punctuation | str | If true punctuation will be removed | / |
 
 ### Usage
 
@@ -62,4 +62,4 @@ pipeline.add_op(text_normalization_op, dependencies=[...]) #Add previous compon
 You can run the tests using docker with BuildKit. From this directory, run:
 ```
 docker build . --target test
-```
+```
12 changes: 3 additions & 9 deletions components/write_to_hf_hub/README.md
@@ -17,9 +17,9 @@ The component takes the following arguments to alter its behavior:
 
 | argument | type | description | default |
 | -------- | ---- | ----------- | ------- |
-| hf_token | str | The hugging face token used to write to the hub | None |
-| username | str | The username under which to upload the dataset | None |
-| dataset_name | str | The name of the dataset to upload | None |
+| hf_token | str | The hugging face token used to write to the hub | / |
+| username | str | The username under which to upload the dataset | / |
+| dataset_name | str | The name of the dataset to upload | / |
 | image_column_names | list | A list containing the image column names. Used to format to image to HF hub format | None |
 | column_name_mapping | dict | Mapping of the consumed fondant column names to the written hub column names | None |
 
@@ -45,9 +45,3 @@ write_to_hf_hub_op = ComponentOp.from_registry(
 pipeline.add_op(write_to_hf_hub_op, dependencies=[...]) #Add previous component as dependency
 ```
 
-### Testing
-
-You can run the tests using docker with BuildKit. From this directory, run:
-```
-docker build . --target test
-```
18 changes: 10 additions & 8 deletions scripts/component_readme/generate_readme.py
@@ -1,13 +1,12 @@
 import argparse
-import ast
 from pathlib import Path
 
 import jinja2
 from fondant.component_spec import ComponentSpec
 
 
-def read_component_spec(component_dir: Path) -> ComponentSpec:
-    return ComponentSpec.from_file(component_dir / "fondant_component.yaml")
+def read_component_spec(component_spec_path: Path) -> ComponentSpec:
+    return ComponentSpec.from_file(component_spec_path)
 
 
 def generate_readme(component_spec: ComponentSpec, *, component_dir: Path) -> str:
@@ -35,17 +34,20 @@ def write_readme(readme: str, component_dir: Path) -> None:
         f.write(readme)
 
 
-def main(component_dir: Path):
-    component_spec = read_component_spec(component_dir)
+def main(component_spec_path: Path):
+    component_spec = read_component_spec(component_spec_path)
+    component_dir = component_spec_path.parent
     readme = generate_readme(component_spec, component_dir=component_dir)
     write_readme(readme, component_dir=component_dir)
 
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
-    parser.add_argument("-d", "--component_dir",
+    parser.add_argument("component_specs",
+                        nargs="+",
                         type=Path,
-                        help="Path to the component to generate a readme for")
+                        help="Path to the component spec to generate a readme from")
     args = parser.parse_args()
 
-    main(args.component_dir)
+    for spec in args.component_specs:
+        main(spec)
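
The switch from a single `--component_dir` option to positional `component_specs` with `nargs="+"` is what lets pre-commit append any number of changed spec files to the `entry` command. A minimal, self-contained sketch of the new argument handling, leaving out the fondant-specific parsing and rendering; `build_parser` and `readme_targets` are hypothetical helper names, and the `README.md` output name is an assumption inferred from the file paths changed in this commit:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same positional argument as the updated script."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "component_specs",
        nargs="+",  # one or more paths, as appended by pre-commit
        type=Path,
        help="Path to the component spec to generate a readme from",
    )
    return parser


def readme_targets(argv):
    """Map each spec path to the README expected in the same directory.

    The README.md file name is an assumption; the real script delegates
    writing to write_readme(readme, component_dir=...).
    """
    args = build_parser().parse_args(argv)
    return [(spec, spec.parent / "README.md") for spec in args.component_specs]


# Hypothetical invocation with two changed spec files.
targets = readme_targets([
    "components/caption_images/fondant_component.yaml",
    "components/embed_images/fondant_component.yaml",
])
for spec, readme in targets:
    print(f"{spec} -> {readme}")
```

Because `nargs="+"` requires at least one value, calling the script with no paths exits with a usage error rather than silently doing nothing.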
