Update release announcements #471

Merged: 2 commits, Sep 28, 2023
26 changes: 16 additions & 10 deletions docs/announcements/CC_25M_community.md
@@ -1,18 +1,24 @@
 # 25 million Creative Commons image dataset released

-[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
-large-scale data processing by making containerized components reusable across pipelines &
+[Fondant](https://fondant.ai) is an open-source project that aims to simplify and speed up
+large-scale data processing by making containerized components reusable across pipelines &
 execution environments, shared within the community.

-A current challenge for generative AI is compliance with copyright laws. For this reason,
-Fondant has developed a data-processing pipeline to create a 500-million dataset of Creative
+A current challenge for generative AI is compliance with copyright laws. For this reason,
+Fondant has developed a data-processing pipeline to create a 500-million dataset of Creative
 Commons images to train a latent diffusion image generation model that respects copyright. Today,
-as a first step, we are releasing a 25-million sample dataset and invite the open source
+as a first step, we are releasing a 25-million sample dataset and invite the open source
 community to collaborate on further refinement steps.

-Fondant offers tools to download, explore and process the data. The current example pipeline
-includes a component for downloading the urls, a simple file type filter, one for downloading
-the images and one for deduplicating the urls. Additional processing components which could be
+Fondant offers tools to download, explore and process the data. The current example pipeline
+includes a component for downloading the urls and one for downloading the images.
+
+Creating custom pipelines for specific purposes requires different building blocks. Fondant
+pipelines can mix reusable components and custom components.
+
+![sample_pipeline](https://github.com/ml6team/fondant/blob/main/docs/art/announcements/sample_pipeline_cc25.png?raw=true)
+
+Additional processing components which could be
 contributed include, in order of priority:

 * Image-based deduplication
@@ -25,6 +31,6 @@ contributed include, in order of priority:
 * AI generated image detection
 * Any components that you propose to develop

-The Fondant team also invites contributors to the core framework and is looking for feedback on
-the framework’s usability and for suggestions for improvement. Contact us at
+The Fondant team also invites contributors to the core framework and is looking for feedback on
+the framework’s usability and for suggestions for improvement. Contact us at
 [info@fondant.ai](mailto:info@fondant.ai) and/or join our [discord](https://discord.gg/HnTdWhydGp).
Binary file added docs/art/announcements/sample_pipeline_cc25.png
Binary file modified docs/art/guides/component.png
2 changes: 1 addition & 1 deletion docs/overrides/main.html
@@ -3,7 +3,7 @@
 {% block announce %}
 <p style="text-align: center">
 We released a 25 million Creative Commons image dataset!
-<a href="announcements/CC_25M_community/"
+<a href="https://fondant.ai/en/latest/announcements/CC_25M_community/"
 style="color: white; text-decoration: underline">Read more</a>
 </p>
 {% endblock %}
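
Review note: the PR does not state why the banner link in `docs/overrides/main.html` was switched from a relative to an absolute URL. One plausible reason, assumed here, is that the announce block is rendered on every page, so a relative `href` resolves against whichever page is currently open. The Python sketch below illustrates that resolution with the standard library's `urljoin`; the two page URLs are hypothetical examples, not taken from the PR.

```python
from urllib.parse import urljoin

# The two href values that appear in the main.html change.
RELATIVE_HREF = "announcements/CC_25M_community/"
ABSOLUTE_HREF = "https://fondant.ai/en/latest/announcements/CC_25M_community/"

# Hypothetical pages on which the banner could be rendered (assumptions).
pages = [
    "https://fondant.ai/en/latest/",
    "https://fondant.ai/en/latest/guides/example/",
]


def resolve(page_url: str, href: str) -> str:
    """Resolve href against page_url, as a browser would for a link on that page."""
    return urljoin(page_url, href)


for page in pages:
    # The relative link only matches the intended target from the docs root.
    print(page, "->", resolve(page, RELATIVE_HREF))
    # The absolute link resolves to itself regardless of the current page.
    print(page, "->", resolve(page, ABSOLUTE_HREF))
```

Under these assumptions, the relative link reaches the announcement only from the docs root, while the absolute link points at the same target from any page, at the cost of pinning the link to the `latest` version path.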