0.6.0
Highlights
- Vertex AI is now supported as a backend for pipeline execution. Run `fondant run vertex <pipeline.py>` to submit your pipeline, and run `fondant run vertex --help` to see the available configuration options.
- The reusable components are now available on DockerHub under the `fndnt` organization. DockerHub is more broadly supported than the GitHub Container Registry, which we were using before.
- Previously executed components are now cached when re-executed with the same arguments.
  - This makes it easier to iterate on the development of downstream components.
  - This allows you to resume failed pipelines from the step that failed.
- Added a `fondant build` command which lets you build fondant components easily. Run `fondant build <component_dir>`, and check `fondant build -h` for options. The command also updates the image reference in the `fondant_component.yaml` to the newly built one.
- We migrated from KfP v1 to KfP v2. This means:
  - We now benefit from the latest KfP developments.
  - We compile fondant pipelines to the IR YAML format, which is supported by other execution engines such as Vertex.
  - You need a KfP v2 cluster to run fondant pipelines.
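To picture the component caching described in the highlights above, here is a minimal sketch in plain Python. It is an illustration only, not fondant's actual implementation: the `cache_key` and `run_pipeline` helpers, the in-memory `cache` dict, and the idea of chaining each key to the upstream key are all assumptions made for this example.

```python
import hashlib
import json


def cache_key(name, arguments, upstream_key=""):
    """Derive a deterministic key from a step's name, arguments and upstream key."""
    payload = json.dumps(
        {"name": name, "arguments": arguments, "upstream": upstream_key},
        sort_keys=True,  # argument order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, cache):
    """Run steps in order, skipping any step whose key is already cached.

    Returns the names of the steps that were actually executed.
    """
    executed = []
    upstream_key = ""
    for name, arguments, func in steps:
        key = cache_key(name, arguments, upstream_key)
        if key not in cache:
            cache[key] = func(**arguments)
            executed.append(name)
        # Chain the keys so a change upstream invalidates everything downstream.
        upstream_key = key
    return executed


# Hypothetical stand-ins for real components.
def load(path):
    return f"loaded:{path}"


def embed(model):
    return f"embedded:{model}"


cache = {}
steps = [
    ("load", {"path": "data.parquet"}, load),
    ("embed", {"model": "small"}, embed),
]
first_run = run_pipeline(steps, cache)   # ['load', 'embed']

# Only the downstream step changes, so only that step runs again.
steps[1] = ("embed", {"model": "large"}, embed)
second_run = run_pipeline(steps, cache)  # ['embed']
```

In this sketch, a step that raises an exception leaves the results of earlier steps in the cache, so a rerun picks up at the step that failed.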
Fixes
- Fix data explorer for usage on Windows
- Fix propagation of `client_kwargs` argument to configure Dask Client
Components
- Every reusable component now has a clear README describing its usage
- Add `load_from_parquet` component to load parquet files as input data
- Add `embed_text` component to embed documents and other text
- Add `chunk_text` component to chunk documents into passages
- Add `index_weaviate` component to index data in a Weaviate vector store
- Fix issue with mixed-type ids in LAION retrieval components
- Improve success rate of `download_images` component
- Fix OOM issues for inference components using GPU
- Limit data read by `load_from_hub` component to used columns
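As a rough idea of what chunking documents into passages involves, the sketch below splits text into fixed-size character windows with a small overlap. It is a simplified stand-in, not the `chunk_text` component's code: the function names, the `chunk_size` and `chunk_overlap` parameters, and the `<doc_id>_<n>` passage id scheme are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=20):
    """Split text into character windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters so that a sentence
    cut at a boundary still appears whole in one of the two passages.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def chunk_document(doc_id, text, chunk_size=200, chunk_overlap=20):
    """Return (passage_id, passage) pairs, with ids derived from the document id."""
    return [
        (f"{doc_id}_{i}", chunk)
        for i, chunk in enumerate(chunk_text(text, chunk_size, chunk_overlap))
    ]


passages = chunk_document("doc1", "abcdefghij", chunk_size=4, chunk_overlap=1)
# [('doc1_0', 'abcd'), ('doc1_1', 'defg'), ('doc1_2', 'ghij')]
```

Passages produced this way could then be embedded and indexed downstream. A production splitter would more likely break on sentence or paragraph boundaries than on raw character offsets.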
Detailed changes
- Add contribution segment by @GeorgesLorre in #463
- Update sample pipeline by @mrchtr in #464
- Update project description by @RobbeSneyders in #465
- Disable caching in the image retrieval sample pipeline by @mrchtr in #467
- Improve download images logs by @PhilippeMoussalli in #466
- Add CC-25M announcement to docs by @RobbeSneyders in #468
- Update release announcements by @mrchtr in #471
- Add dataset link to press release by @mrchtr in #472
- Create load from parquet by @PhilippeMoussalli in #474
- Fix caching writes by @PhilippeMoussalli in #469
- Add caching dependency by @PhilippeMoussalli in #479
- Add memory request and limit to components by @PhilippeMoussalli in #482
- Improve hit rate of download images component by @RobbeSneyders in #470
- Cast id to string laion by @PhilippeMoussalli in #485
- Bugfix partitioning by @PhilippeMoussalli in #478
- Generate READMEs for all components using a script by @RobbeSneyders in #484
- Add component hub doc page by @RobbeSneyders in #487
- explorer small fix by @Hakimovich99 in #481
- Optimize GPU components by @PhilippeMoussalli in #489
- Update Pillow to 10.0.1 to fix security issues by @RobbeSneyders in #493
- Update documentation regarding feedback by @mrchtr in #473
- Restructure-cli by @PhilippeMoussalli in #488
- Add empty requirements.txt to load_from_parquet component by @RobbeSneyders in #504
- Use s3 client instead of http to access common crawl by @mrchtr in #501
- Fix run CLI by @RobbeSneyders in #507
- Migrate to KfpV2 by @GeorgesLorre in #477
- Remove abstract component test by @mrchtr in #510
- Only keep columns in produces by @PhilippeMoussalli in #490
- Run black on components in pre-commit by @RobbeSneyders in #511
- Run bandit on components by @RobbeSneyders in #513
- Move container registry to DockerHub by @RobbeSneyders in #514
- Update component docs by @PhilippeMoussalli in #516
- Vertex cli by @PhilippeMoussalli in #519
- Refactor compile method for kfp and vertex by @PhilippeMoussalli in #522
- Modify arg default by @PhilippeMoussalli in #524
- Propagate `client_kwargs` argument and lower extract_images python version by @RobbeSneyders in #525
- Revert fsspec changes by @mrchtr in #523
- Add resource limits for Vertex by @RobbeSneyders in #529
- Update vertex and general docs by @PhilippeMoussalli in #526
- Component/generate embeddings by @tillwenke in #520
- Add fondant build command by @RobbeSneyders in #527
- Fix explorer build script for DockerHub by @RobbeSneyders in #531
- Chunker component by @PhilippeMoussalli in #528
- Update text embedding component by @PhilippeMoussalli in #532
- Add IndexWeaviate component by @tillwenke in #521
- Build command: raise errors when pushing and make tag optional by @RobbeSneyders in #533
- Update component readmes by @RobbeSneyders in #538
- Add network argument to vertex runner by @RobbeSneyders in #537
New Contributors
- @Hakimovich99 made their first contribution in #481
Full Changelog: 0.5.0...0.6.0