Argilla 2.4: Curate Hub Datasets with Human Feedback—No Code Needed #2448
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,55 @@ | ||||||
--- | ||||||
title: "Argilla 2.4: Easily Add Human Feedback to Hub Datasets—No Code Required" | ||||||
thumbnail: /blog/assets/argilla-ui-hub/thumbnail.png | ||||||
authors: | ||||||
- user: nataliaElv | ||||||
- user: burtenshaw | ||||||
- user: dvilasuero | ||||||
--- | ||||||
|
||||||
# Argilla 2.4: Easily Add Human Feedback to Hub Datasets—No Code Required | ||||||
|
||||||
We are incredibly excited to share the most impactful feature since Argilla joined Hugging Face: you can now start your AI dataset projects from any Hub dataset, without writing any code. | ||||||
|
||||||
Using Argilla’s UI, you can easily import a dataset from the Hugging Face Hub, define questions, and start collecting human feedback. | ||||||
|
||||||
> [!NOTE] | ||||||
> Not familiar with Argilla? Argilla is a free, open-source data-centric tool. Using Argilla, AI developers and domain experts can collaborate and build high-quality datasets. Argilla is part of the Hugging Face family and fully integrated with the Hub. Want to know more? Here’s an [intro blog post](https://huggingface.co/blog/dvilasuero/argilla-2-0). | ||||||
|
||||||
Why is this new feature important to you and the community? | ||||||
|
||||||
- The Hugging Face Hub contains 230k datasets you can use as a foundation for your AI project. | ||||||
- It simplifies collecting human feedback from the Hugging Face community or specialized teams. | ||||||
- It democratizes dataset creation for users with extensive knowledge about a specific domain who are unsure about writing code. | ||||||
|
||||||
## Use cases | ||||||
|
||||||
This new feature democratizes building high-quality datasets on the Hub: | ||||||
|
||||||
- If you have just published an open dataset and want the community to contribute, import it into a public Argilla Space and share the URL with the world! | ||||||
Don't you think "dataset publisher" sounds like a job title? I'm not sure how many people would feel represented by that.
||||||
- If you want to start annotating a new dataset from scratch, upload a CSV to the Hub, import it into your Argilla Space, and start labeling! | ||||||
- If you want to curate an existing Hub dataset for fine-tuning or evaluating your model, import the dataset into an Argilla Space and start curating! | ||||||
|
||||||
- If you want to improve an existing Hub dataset to benefit the community, import it into an Argilla Space and start giving feedback! | ||||||
|
||||||
|
||||||
## How it works | ||||||
|
||||||
First, you need to deploy Argilla. The recommended way is to deploy on Spaces [following this guide](https://docs.argilla.io/latest/getting_started/quickstart/). The default deployment comes with Hugging Face OAuth enabled, meaning your Space will be open to any Hub user who wants to annotate your dataset. OAuth is perfect when you want the community to contribute to your dataset. If you want to restrict annotation to yourself and other collaborators, [check this other guide](https://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/). | ||||||
|
||||||
[screen recording] | ||||||
|
||||||
Once Argilla is running, sign in and click the “Import dataset from Hugging Face” button on the Home page. You can start with one of our example datasets or input the repo id of the dataset you want to use. Argilla automatically suggests an initial configuration based on the dataset’s features, so you don’t need to start from scratch. | ||||||
|
||||||
> [!NOTE] | ||||||
> In this first version, the Hub dataset must be public. If you are interested in support for private datasets, we’d love to hear from you on [GitHub](https://github.com/argilla-io/argilla). | ||||||
|
||||||
The dataset’s columns will be mapped to fields and questions in Argilla. Fields contain the data that you want feedback on, like text, chats, or images. Questions define the feedback you want to collect, like labels, ratings, rankings, or text. If needed, you can add and configure questions or remove unnecessary fields. Every change is previewed in real time, so you can see how it affects the dataset. | ||||||
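To make the column mapping more concrete, here is a minimal sketch in plain Python of how a suggested configuration could be derived from column types. It is an illustration under stated assumptions, not Argilla's actual logic: the type names, the `suggest_config` function, and the `SuggestedConfig` class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical type names for illustration only. A real Hub dataset
# exposes richer feature objects, and Argilla's own rules may differ.
FIELD_TYPES = {"string": "text", "chat": "chat", "image": "image"}
QUESTION_TYPES = {"class_label": "label", "int": "rating"}


@dataclass
class SuggestedConfig:
    """Columns split into fields (data to review) and questions (feedback)."""
    fields: dict[str, str] = field(default_factory=dict)
    questions: dict[str, str] = field(default_factory=dict)
    skipped: list[str] = field(default_factory=list)


def suggest_config(features: dict[str, str]) -> SuggestedConfig:
    """Suggest a starting configuration from a {column: type} mapping."""
    config = SuggestedConfig()
    for column, feature_type in features.items():
        if feature_type in FIELD_TYPES:
            config.fields[column] = FIELD_TYPES[feature_type]
        elif feature_type in QUESTION_TYPES:
            config.questions[column] = QUESTION_TYPES[feature_type]
        else:
            # Unrecognized columns are left for the user to map manually.
            config.skipped.append(column)
    return config


example = suggest_config(
    {"review": "string", "sentiment": "class_label", "id": "uuid"}
)
print(example)
```

In this sketch the suggestion is only a starting point: the user can still move a column between fields and questions or drop it entirely, which mirrors the editing step described above.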
|
||||||
|
||||||
Once you’re happy with the result, click “Create dataset” to import the dataset. Now you’re ready to give feedback! | ||||||
|
||||||
You can try this for yourself by following the [quickstart guide](https://docs.argilla.io/latest/getting_started/quickstart/). It takes under 5 minutes! | ||||||
|
||||||
This new workflow streamlines the import of datasets from the Hub, but you can still [import datasets using Argilla’s Python SDK](https://docs.argilla.io/latest/how_to_guides/dataset/) if you need further customization. | ||||||
|
||||||
|
||||||
We’d love to hear your thoughts and first experiences. Let us know on GitHub or the HF Discord! |
Do you think this title matches well with the thumbnail @dvsrepo? I see a big semantic difference between "creating datasets" and "adding human feedback to datasets".