Argilla 2.4: Curate Hub Datasets with Human Feedback—No Code Needed #2448

12 changes: 12 additions & 0 deletions _blog.yml
@@ -4931,3 +4931,15 @@
- hub
- partnerships
- security

- local: argilla-ui-hub
title: "Argilla 2.4: Easily Add Human Feedback to Hub Datasets—No Code Required"
author: nataliaElv
thumbnail: /blog/assets/argilla-ui-hub/thumbnail.png
date: November 4, 2024
tags:
- hub
- spaces
- datasets
- argilla
- human feedback
55 changes: 55 additions & 0 deletions argilla-ui-hub.md
@@ -0,0 +1,55 @@
---
title: "Argilla 2.4: Easily Add Human Feedback to Hub Datasets—No Code Required"
Author

Do you think this title matches well with the thumbnail @dvsrepo? I see a big semantic difference between "creating datasets" and "adding human feedback to datasets".

thumbnail: /blog/assets/argilla-ui-hub/thumbnail.png
authors:
- user: nataliaElv
- user: burtenshaw
- user: dvilasuero
---

# Argilla 2.4: Easily Add Human Feedback to Hub Datasets—No Code Required

We are incredibly excited to share the most impactful feature since Argilla joined Hugging Face: you can start your AI dataset projects without code and from any Hub dataset.

Using Argilla’s UI, you can easily import a dataset from the Hugging Face Hub, define questions, and start collecting human feedback.

> [!NOTE]
> Not familiar with Argilla? Argilla is a free, open-source data-centric tool. Using Argilla, AI developers and domain experts can collaborate and build high-quality datasets. Argilla is part of the Hugging Face family and fully integrated with the Hub. Want to know more? Here’s an [intro blog post](https://huggingface.co/blog/dvilasuero/argilla-2-0).

Why is this new feature important to you and the community?

- The Hugging Face Hub contains 230k datasets you can use as a foundation for your AI project.
- It simplifies collecting human feedback from the Hugging Face community or specialized teams.
- It democratizes dataset creation for users with extensive knowledge about a specific domain who are unsure about writing code.

## Use cases

This new feature democratizes building high-quality datasets on the Hub:

- You have just published an open dataset and want the community to contribute: import it into a public Argilla Space and share the URL with the world!
Member

Suggested change
- You have just published an open dataset and want the community to contribute: import it into a public Argilla Space and share the URL with the world!
- If you are a dataset publisher and want the community to contribute, import it into a public Argilla Space and share the URL with the world!

Author

Don't you think "dataset publisher" sounds like a job title? I'm not sure how many people would feel represented by that.

- If you want to start annotating a new dataset from scratch, upload a CSV to the Hub, import it into your Argilla Space, and start labeling!
- If you want to curate an existing Hub dataset for fine-tuning or evaluating your model, import the dataset into an Argilla Space and start curating!

- If you want to improve an existing Hub dataset to benefit the community, import it into an Argilla Space and start giving feedback!


## How it works

First, you need to deploy Argilla. The recommended way is to deploy on Spaces [following this guide](https://docs.argilla.io/latest/getting_started/quickstart/). The default deployment comes with Hugging Face OAuth enabled, meaning your Space will be open to any Hub user who wants to annotate your dataset. OAuth is perfect when you want the community to contribute to your dataset. If you would rather restrict annotation to yourself and other collaborators, [check this other guide](https://docs.argilla.io/latest/getting_started/how-to-configure-argilla-on-huggingface/).

[screen recording]

Once Argilla is running, sign in and click the “Import dataset from Hugging Face” button on the Home page. You can start with one of our example datasets or input the repo id of the dataset you want to use. Argilla automatically suggests an initial configuration based on the dataset’s features, so you don’t need to start from scratch.

> [!NOTE]
> In this first version, the Hub dataset must be public. If you are interested in support for private datasets, we’d love to hear from you on [GitHub](https://github.com/argilla-io/argilla).

The dataset’s columns will be mapped to fields and questions in Argilla. Fields include the data that you want feedback on, like text, chats, or images. Questions are the feedback you want to collect, like labels, ratings, rankings, or text. If you need, you can add and configure questions or remove unnecessary fields. All of the changes that you make will be previewed in real time, so you can see how your changes affect the dataset.
Member

Suggested change
The dataset’s columns will be mapped to fields and questions in Argilla. Fields include the data that you want feedback on, like text, chats, or images. Questions are the feedback you want to collect, like labels, ratings, rankings, or text. If you need, you can add and configure questions or remove unnecessary fields. All of the changes that you make will be previewed in real time, so you can see how your changes affect the dataset.
The goal is to map dataset columns to fields and questions in Argilla. Fields include the data you want feedback on, like text, chats, or images. Questions are the feedback you wish to collect, like labels, ratings, rankings, or text. You can add and configure questions or remove unnecessary fields if needed. You can preview all changes in real time to get a clear idea of the Argilla dataset you’re configuring.
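
The column mapping described above can be pictured with a small sketch. This is a hypothetical illustration in plain Python, not Argilla's actual suggestion logic: the `SuggestedConfig` class, the `suggest_config` function, and the feature-type strings are all assumptions made for the example. It shows one plausible way to turn a dataset's column types into fields (data to show) and questions (feedback to collect).

```python
from dataclasses import dataclass, field


@dataclass
class SuggestedConfig:
    """Hypothetical container for a suggested dataset configuration."""
    fields: list = field(default_factory=list)      # data shown to annotators
    questions: list = field(default_factory=list)   # feedback to collect


def suggest_config(features: dict) -> SuggestedConfig:
    """Map {column name: feature type} onto fields and questions.

    Illustrative heuristic only; it is not how Argilla decides.
    """
    config = SuggestedConfig()
    for column, feature_type in features.items():
        if feature_type == "class_label":
            # Categorical columns become label questions.
            config.questions.append({"name": column, "type": "label"})
        elif feature_type in ("string", "image", "chat"):
            # Content columns become fields; plain strings are text fields.
            kind = "text" if feature_type == "string" else feature_type
            config.fields.append({"name": column, "type": kind})
        # Any other column type is skipped in this sketch.
    return config


# Example: a text classification dataset with an extra id column.
config = suggest_config({"text": "string", "label": "class_label", "id": "int64"})
print(config.fields)
print(config.questions)
```

In this sketch the `id` column is dropped, which mirrors the idea of removing unnecessary fields before creating the dataset.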


Once you’re happy with the result, click “Create dataset” to import the dataset. Now you’re ready to give feedback!

You can try this for yourself by following the [quickstart guide](https://docs.argilla.io/latest/getting_started/quickstart/). It takes under 5 minutes!

This new workflow streamlines the import of datasets from the Hub, but you can still [import datasets using Argilla’s Python SDK](https://docs.argilla.io/latest/how_to_guides/dataset/) if you need further customization.
nataliaElv marked this conversation as resolved.

We’d love to hear your thoughts and first experiences. Let us know on GitHub or the HF Discord!
Binary file added assets/argilla-ui-hub/thumbnail.png