Add Collection API #1687

Merged
merged 18 commits into from
Sep 25, 2023
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@
title: Inference
- local: guides/community
title: Community Tab
- local: guides/collections
title: Collections
- local: guides/manage-cache
title: Cache
- local: guides/model-cards
Expand Down Expand Up @@ -68,6 +70,8 @@
title: Repo Cards and Repo Card Data
- local: package_reference/space_runtime
title: Space runtime
- local: package_reference/collections
title: Collections
- local: package_reference/tensorboard
title: TensorBoard logger
- local: package_reference/webhooks_server
Expand Down
187 changes: 187 additions & 0 deletions docs/source/en/guides/collections.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Manage your collections

A collection is a group of related items on the Hub (models, datasets, Spaces, papers) that are organized together on the same page. Collections are useful for creating your own portfolio, bookmarking content in categories, or presenting a curated list of items you want to share. Check out this [guide](https://huggingface.co/docs/hub/collections) to understand in more detail what collections are and how they look on the Hub.

Managing collections can be done in the browser directly. In this guide, we will focus on how to do it programmatically using `huggingface_hub`.

## Fetch a collection

To fetch a collection, use [`get_collection`]. You can use it either on your own collections or on any public one. To retrieve a collection, you need its `slug`. A slug is an identifier for a collection based on the title and a unique ID. You can find it in the URL of the collection page.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hfh_collection_slug.png"/>
</div>

Here the slug is `"TheBloke/recent-models-64f9a55bb3115b4f513ec026"`. Let's fetch the collection:

```py
>>> from huggingface_hub import get_collection
>>> collection = get_collection("TheBloke/recent-models-64f9a55bb3115b4f513ec026")
>>> collection
Collection: {
{'description': "Models I've recently quantized.",
'items': [...],
'last_updated': datetime.datetime(2023, 9, 21, 7, 26, 28, 57000, tzinfo=datetime.timezone.utc),
'owner': 'TheBloke',
'position': 1,
'private': False,
'slug': 'TheBloke/recent-models-64f9a55bb3115b4f513ec026',
'theme': 'green',
'title': 'Recent models'}
}
>>> collection.items[0]
CollectionItem: {
{'item_object_id': '6507f6d5423b46492ee1413e',
'author': 'TheBloke',
'item_id': 'TheBloke/TigerBot-70B-Chat-GPTQ',
'item_type': 'model',
'lastModified': '2023-09-19T12:55:21.000Z',
'position': 0,
'private': False,
'repoType': 'model'
(...)
}
}
```
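
If you only have the URL of a collection page, the slug can be recovered from its path. The helper below is a minimal sketch and is not part of `huggingface_hub`: `slug_from_url` is a hypothetical name, and it assumes collection URLs follow the `https://huggingface.co/collections/<owner>/<title>-<id>` pattern.

```py
from urllib.parse import urlparse

# Assumption: collection pages live under this URL path prefix.
COLLECTION_PREFIX = "collections/"


def slug_from_url(url: str) -> str:
    """Return the collection slug embedded in a collection page URL (hypothetical helper)."""
    path = urlparse(url).path.strip("/")
    if not path.startswith(COLLECTION_PREFIX):
        raise ValueError(f"Not a collection URL: {url!r}")
    slug = path[len(COLLECTION_PREFIX):]
    # A slug is expected to look like "<owner>/<title>-<id>", i.e. exactly one slash.
    if slug.count("/") != 1:
        raise ValueError(f"Unexpected slug format: {slug!r}")
    return slug


slug = slug_from_url(
    "https://huggingface.co/collections/TheBloke/recent-models-64f9a55bb3115b4f513ec026"
)
print(slug)
```

The returned string can then be passed to [`get_collection`] as shown above.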

The [`Collection`] object returned by [`get_collection`] contains:
- high-level metadata: `slug`, `owner`, `title`, `description`, etc.
- a list of [`CollectionItem`] objects. Each item represents a model, a dataset, a Space or a paper.

All items of a collection are guaranteed to have:
- a unique `item_object_id`: this is the id of the collection item in the database.
- an `item_id`: this is the id on the Hub of the underlying item (model, dataset, Space, paper). It is not necessarily unique: only the `item_id`/`item_type` pair is unique.
- an `item_type`: model, dataset, Space, paper.
- the `position` of the item in the collection. Position can be updated to re-organize your collection (see [`update_collection_item`] below).

A `note` can also be attached to the item. This is useful to add information about the item (e.g. a comment, a link to a blog post, etc.). If an item doesn't have a note, the attribute still exists with a `None` value.

In addition to these base attributes, returned items can have additional attributes depending on their type: `author`, `private`, `lastModified`, `gated`, `title`, `likes`, `upvotes`, etc. None of these attributes are guaranteed to be returned.
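
Because only the `item_id`/`item_type` pair is unique and the extra attributes are optional, it can help to look items up by that pair and to read optional attributes defensively. The sketch below is illustrative only: `find_item` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the [`CollectionItem`] objects you would get from `get_collection(...).items`.

```py
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_item(items: Iterable[Any], item_id: str, item_type: str) -> Optional[Any]:
    """Return the first item matching the (item_id, item_type) pair, or None."""
    for item in items:
        if item.item_id == item_id and item.item_type == item_type:
            return item
    return None


# Stand-in objects for illustration only (not real Hub data).
items = [
    SimpleNamespace(item_object_id="id-0", item_id="warp-ai/wuerstchen", item_type="model", position=0, note=None, likes=10),
    SimpleNamespace(item_object_id="id-1", item_id="warp-ai/wuerstchen", item_type="space", position=1, note=None),
]

space = find_item(items, "warp-ai/wuerstchen", "space")
print(space.item_object_id)           # id-1
print(getattr(space, "likes", None))  # None, since this stand-in has no `likes` attribute
```

Using `getattr` with a default avoids an `AttributeError` when a type-specific attribute is not returned for a given item.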

## Create a new collection

Now that we know how to get a [`Collection`], let's create our own! Use [`create_collection`] with a title and optionally a description.

```py
>>> from huggingface_hub import create_collection

>>> collection = create_collection(
... title="ICCV 2023",
... description="Portfolio of models, papers and demos I presented at ICCV 2023",
... )
```

It will return a [`Collection`] object with the high-level metadata (title, description, owner, etc.) and an empty list of items. You will now be able to refer to this collection using its `slug`.

```py
>>> collection.slug
'iccv-2023-15e23b46cb98efca45'
>>> collection.title
'ICCV 2023'
>>> collection.owner
'username'
```

To create a collection on an organization page, pass `namespace="my-cool-org"` when creating the collection. Finally, you can also create private collections by passing `private=True`.

## Manage items in a collection

Now that we have a [`Collection`], we want to add items to it and organize them.

### Add items

Items have to be added one by one using [`add_collection_item`]. You only need to know the `collection_slug`, `item_id` and `item_type`. Optionally, you can also add a `note` to the item (500 characters maximum).

```py
>>> from huggingface_hub import create_collection, add_collection_item

>>> collection = create_collection(title="OS Week Highlights - Sept 18 - 24", namespace="osanseviero")
>>> collection.slug
'osanseviero/os-week-highlights-sept-18-24-650bfed7f795a59f491afb80'
>>> add_collection_item(collection.slug, item_id="coqui/xtts", item_type="space")
>>> add_collection_item(
... collection.slug,
... item_id="warp-ai/wuerstchen",
... item_type="model",
... note="Würstchen is a new fast and efficient high resolution text-to-image architecture and model"
... )
>>> add_collection_item(collection.slug, item_id="lmsys/lmsys-chat-1m", item_type="dataset")
>>> add_collection_item(collection.slug, item_id="warp-ai/wuerstchen", item_type="space") # same item_id, different item_type
```

If an item already exists in a collection (i.e. same `item_id`/`item_type` pair), an HTTP 409 error will be raised. You can choose to ignore this error by setting `exists_ok=True`.
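
Since items are added one at a time, a small loop is a natural way to add several of them. The following is a sketch under stated assumptions: `add_items` is a hypothetical helper, the slug in the demo is made up, and the function that performs the actual call is injected as `add_fn` so the example runs offline. In real use you would pass `huggingface_hub.add_collection_item` as `add_fn`.

```py
from typing import Callable, Iterable, Optional, Tuple

MAX_NOTE_LENGTH = 500  # maximum note length stated in this guide

Entry = Tuple[str, str, Optional[str]]  # (item_id, item_type, note)


def add_items(collection_slug: str, entries: Iterable[Entry], add_fn: Callable[..., object]) -> int:
    """Add entries one by one through `add_fn` and return the number of calls made."""
    count = 0
    for item_id, item_type, note in entries:
        # Fail early instead of sending a note that is too long.
        if note is not None and len(note) > MAX_NOTE_LENGTH:
            raise ValueError(f"Note for {item_id!r} exceeds {MAX_NOTE_LENGTH} characters")
        # exists_ok=True makes re-running the loop safe if some items were already added.
        add_fn(collection_slug, item_id=item_id, item_type=item_type, note=note, exists_ok=True)
        count += 1
    return count


# Offline demo: a recorder stands in for `add_collection_item`.
calls = []


def record(collection_slug, **kwargs):
    calls.append((collection_slug, kwargs))


added = add_items(
    "username/my-collection-0123456789abcdef",  # hypothetical slug
    [
        ("coqui/xtts", "space", None),
        ("lmsys/lmsys-chat-1m", "dataset", "One million real-world conversations."),
    ],
    add_fn=record,
)
print(added)
```

Passing the function as a parameter keeps the loop easy to test without network access.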

### Add a note to an existing item

You can modify an existing item to add or modify the note attached to it using [`update_collection_item`]. Let's reuse the example above:

```py
>>> from huggingface_hub import get_collection, update_collection_item

# Fetch collection with newly added items
>>> collection_slug = "osanseviero/os-week-highlights-sept-18-24-650bfed7f795a59f491afb80"
>>> collection = get_collection(collection_slug)

# Add a note to the `lmsys-chat-1m` dataset
>>> update_collection_item(
... collection_slug=collection_slug,
... item_object_id=collection.items[2].item_object_id,
... note="This dataset contains one million real-world conversations with 25 state-of-the-art LLMs.",
... )
```

### Re-order items

Items in a collection are ordered. The order is determined by the `position` attribute of each item. By default, new items are appended at the end of the collection. You can update the ordering using [`update_collection_item`] the same way you would add a note.

Let's reuse our example above:

```py
>>> from huggingface_hub import get_collection, update_collection_item

# Fetch collection
>>> collection_slug = "osanseviero/os-week-highlights-sept-18-24-650bfed7f795a59f491afb80"
>>> collection = get_collection(collection_slug)

# Reorder to place the two `Wuerstchen` items together
>>> update_collection_item(
... collection_slug=collection_slug,
... item_object_id=collection.items[3].item_object_id,
... position=2,
... )
```
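
To reason about the resulting order before calling the API, the move can be previewed locally. This is only a sketch: `preview_move` is a hypothetical helper, and it assumes that moving an item to a position shifts the following items down by one, as a list insert does. The Hub's exact behavior is not specified in this guide.

```py
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def preview_move(order: Sequence[T], from_index: int, to_position: int) -> List[T]:
    """Return a copy of `order` with one element moved to `to_position`."""
    reordered = list(order)
    reordered.insert(to_position, reordered.pop(from_index))
    return reordered


# (item_id, item_type) pairs in the order they were added in the example above.
before = [
    ("coqui/xtts", "space"),
    ("warp-ai/wuerstchen", "model"),
    ("lmsys/lmsys-chat-1m", "dataset"),
    ("warp-ai/wuerstchen", "space"),
]
after = preview_move(before, from_index=3, to_position=2)
for position, (item_id, item_type) in enumerate(after):
    print(position, item_id, item_type)
```

Under that assumption, the two `warp-ai/wuerstchen` items end up at positions 1 and 2, and the dataset moves to position 3.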

### Remove items

Finally, you can also remove an item using [`delete_collection_item`].

```py
>>> from huggingface_hub import get_collection, delete_collection_item

# Fetch collection
>>> collection_slug = "osanseviero/os-week-highlights-sept-18-24-650bfed7f795a59f491afb80"
>>> collection = get_collection(collection_slug)

# Remove `coqui/xtts` Space from the list
>>> delete_collection_item(collection_slug=collection_slug, item_object_id=collection.items[0].item_object_id)
```
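
Selecting items by index depends on the current order of the collection. As an alternative sketch, items can be selected by type and removed one by one. `remove_items_of_type` is a hypothetical helper, the items are stand-in objects, and the deleting function is injected as `delete_fn` so the example runs offline. In real use you would pass `huggingface_hub.delete_collection_item`.

```py
from types import SimpleNamespace
from typing import Any, Callable, Iterable, List


def remove_items_of_type(
    collection_slug: str,
    items: Iterable[Any],
    item_type: str,
    delete_fn: Callable[..., object],
) -> List[str]:
    """Call `delete_fn` for every item of the given type; return the removed object ids."""
    removed = []
    for item in list(items):  # iterate over a snapshot of the items
        if item.item_type == item_type:
            delete_fn(collection_slug=collection_slug, item_object_id=item.item_object_id)
            removed.append(item.item_object_id)
    return removed


# Stand-in items and a recorder instead of a real API call (illustration only).
items = [
    SimpleNamespace(item_object_id="id-0", item_id="coqui/xtts", item_type="space"),
    SimpleNamespace(item_object_id="id-1", item_id="warp-ai/wuerstchen", item_type="model"),
    SimpleNamespace(item_object_id="id-2", item_id="warp-ai/wuerstchen", item_type="space"),
]
deleted_calls = []
removed = remove_items_of_type(
    "username/my-collection-0123456789abcdef",  # hypothetical slug
    items,
    "space",
    delete_fn=lambda **kwargs: deleted_calls.append(kwargs),
)
print(removed)
```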

## Delete collection

A collection can be deleted using [`delete_collection`].

<Tip warning={true}>

This action is irreversible. A deleted collection cannot be restored.

</Tip>

```py
>>> from huggingface_hub import delete_collection
>>> delete_collection("username/useless-collection-64f9a55bb3115b4f513ec026", missing_ok=True)
```
25 changes: 25 additions & 0 deletions docs/source/en/package_reference/collections.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Managing collections

Check out the [`HfApi`] documentation page for the reference of methods to manage your collections on the Hub.

- Get collection content: [`get_collection`]
- Create new collection: [`create_collection`]
- Update a collection: [`update_collection_metadata`]
- Delete a collection: [`delete_collection`]
- Add an item to a collection: [`add_collection_item`]
- Update an item in a collection: [`update_collection_item`]
- Remove an item from a collection: [`delete_collection_item`]

## Data structures

### Collection

[[autodoc]] Collection

### CollectionItem

[[autodoc]] CollectionItem
18 changes: 18 additions & 0 deletions src/huggingface_hub/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,8 @@
"try_to_load_from_cache",
],
"hf_api": [
"Collection",
"CollectionItem",
"CommitInfo",
"CommitOperation",
"CommitOperationAdd",
Expand All @@ -142,11 +144,13 @@
"ModelSearchArguments",
"RepoUrl",
"UserLikes",
"add_collection_item",
"add_space_secret",
"add_space_variable",
"change_discussion_status",
"comment_discussion",
"create_branch",
"create_collection",
"create_commit",
"create_commits_on_pr",
"create_discussion",
Expand All @@ -155,6 +159,8 @@
"create_tag",
"dataset_info",
"delete_branch",
"delete_collection",
"delete_collection_item",
"delete_file",
"delete_folder",
"delete_repo",
Expand All @@ -165,6 +171,7 @@
"duplicate_space",
"edit_discussion_comment",
"file_exists",
"get_collection",
"get_dataset_tags",
"get_discussion_details",
"get_full_repo_name",
Expand Down Expand Up @@ -199,6 +206,8 @@
"space_info",
"super_squash_history",
"unlike",
"update_collection_item",
"update_collection_metadata",
"update_repo_visibility",
"upload_file",
"upload_folder",
Expand Down Expand Up @@ -438,6 +447,8 @@ def __dir__():
try_to_load_from_cache, # noqa: F401
)
from .hf_api import (
Collection, # noqa: F401
CollectionItem, # noqa: F401
CommitInfo, # noqa: F401
CommitOperation, # noqa: F401
CommitOperationAdd, # noqa: F401
Expand All @@ -451,11 +462,13 @@ def __dir__():
ModelSearchArguments, # noqa: F401
RepoUrl, # noqa: F401
UserLikes, # noqa: F401
add_collection_item, # noqa: F401
add_space_secret, # noqa: F401
add_space_variable, # noqa: F401
change_discussion_status, # noqa: F401
comment_discussion, # noqa: F401
create_branch, # noqa: F401
create_collection, # noqa: F401
create_commit, # noqa: F401
create_commits_on_pr, # noqa: F401
create_discussion, # noqa: F401
Expand All @@ -464,6 +477,8 @@ def __dir__():
create_tag, # noqa: F401
dataset_info, # noqa: F401
delete_branch, # noqa: F401
delete_collection, # noqa: F401
delete_collection_item, # noqa: F401
delete_file, # noqa: F401
delete_folder, # noqa: F401
delete_repo, # noqa: F401
Expand All @@ -474,6 +489,7 @@ def __dir__():
duplicate_space, # noqa: F401
edit_discussion_comment, # noqa: F401
file_exists, # noqa: F401
get_collection, # noqa: F401
get_dataset_tags, # noqa: F401
get_discussion_details, # noqa: F401
get_full_repo_name, # noqa: F401
Expand Down Expand Up @@ -508,6 +524,8 @@ def __dir__():
space_info, # noqa: F401
super_squash_history, # noqa: F401
unlike, # noqa: F401
update_collection_item, # noqa: F401
update_collection_metadata, # noqa: F401
update_repo_visibility, # noqa: F401
upload_file, # noqa: F401
upload_folder, # noqa: F401
Expand Down