Clarify upload docs #944

Merged
merged 8 commits into from
Jul 25, 2022
168 changes: 109 additions & 59 deletions docs/source/how-to-upstream.mdx
@@ -1,28 +1,116 @@
# Upload files to the Hub

Sharing your files and work is an important aspect of the Hub. The `huggingface_hub` library offers several options for uploading your files to the Hub. You can use these functions independently or integrate them into your library, making it more convenient for your users to interact with the Hub. This guide will show you how to push files:

* without using Git.
* that are very large with [Git LFS](https://git-lfs.github.com/).
* with the `commit` context manager.
* with the [`~Repository.push_to_hub`] function.

Whenever you want to upload files to the Hub, you need to log in to your Hugging Face account:

- Log in to your Hugging Face account with the following command:

```bash
huggingface-cli login
```

- Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, log in with [`notebook_login`]:

```python
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```

[`notebook_login`] launches a widget in your notebook from which you can enter your Hugging Face credentials.

## Push files without Git

If you don't have Git installed on your system, use [`create_commit`] to push your files to the Hub. [`create_commit`] uses the HTTP protocol to upload files to the Hub and automatically uploads large files and binary files with the Git LFS protocol.

However, [`create_commit`] is a low-level API for working at a commit level. The [`upload_file`] and [`upload_folder`] functions are higher-level APIs that use [`create_commit`] under the hood and are generally more convenient. We recommend trying these functions first if you don't need to work at a lower level.

### Upload a file

Once you've created a repository with the [`create_repo`](./how-to-manage#create-a-repository) function, you can upload a file to your repository with the [`upload_file`] function.

Specify the path of the local file to upload (`path_or_fileobj`), where you want to upload it to in the repository (`path_in_repo`), and the repository you want to add it to (`repo_id`). Depending on your repository type, you can optionally set `repo_type` to `dataset`, `model`, or `space`.

```py
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.upload_file(path_or_fileobj="/path/to/local/folder/README.md",
... path_in_repo="README.md",
... repo_id="username/test-dataset",
... repo_type="dataset",
... )
```
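Because `path_or_fileobj` also accepts a file-like object, you can upload content generated in memory without writing it to disk first. The sketch below is a minimal illustration, assuming `api` behaves like `HfApi`; `upload_readme` and its README text are hypothetical, and passing the client in as an argument lets you swap in a stub when testing offline.

```python
import io


def upload_readme(api, repo_id: str, text: str, repo_type: str = "dataset"):
    """Hypothetical helper: upload an in-memory README without touching the disk.

    `api` is expected to behave like `huggingface_hub.HfApi`.
    """
    # Encode the text and wrap it in a binary file-like object.
    fileobj = io.BytesIO(text.encode("utf-8"))
    return api.upload_file(
        path_or_fileobj=fileobj,
        path_in_repo="README.md",
        repo_id=repo_id,
        repo_type=repo_type,
    )
```

With a real client, this would be called as `upload_readme(HfApi(), "username/test-dataset", "# My dataset\n")`.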

### Upload a folder

Use the [`upload_folder`] function to upload a local folder to an existing repository. Specify the path of the local folder to upload (`folder_path`), where you want to upload the folder to in the repository (`path_in_repo`), and the repository you want to add the folder to (`repo_id`). Depending on your repository type, you can optionally set `repo_type` to `dataset`, `model`, or `space`.

```py
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.upload_folder(folder_path="/path/to/local/folder",
... path_in_repo="my-dataset/train",
... repo_id="username/test-dataset",
... repo_type="dataset",
... )
```
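To preview where each local file would end up before uploading, the sketch below maps every file under `folder_path` to a repository path, assuming files keep their relative layout under `path_in_repo` (as in the example above, where the folder lands in `my-dataset/train`). `planned_repo_paths` is hypothetical; it is an illustration of the mapping, not the library's code, and it does not model any filtering [`upload_folder`] may apply.

```python
from pathlib import Path, PurePosixPath


def planned_repo_paths(folder_path: str, path_in_repo: str = "") -> dict:
    """Hypothetical helper: map each local file to the repository path it would
    land at, assuming the relative layout is preserved under `path_in_repo`."""
    root = Path(folder_path)
    mapping = {}
    for local_file in sorted(root.rglob("*")):
        if local_file.is_file():
            # Repository paths always use forward slashes, even on Windows.
            relative = PurePosixPath(*local_file.relative_to(root).parts)
            mapping[str(local_file)] = str(PurePosixPath(path_in_repo) / relative)
    return mapping
```

For example, `planned_repo_paths("/path/to/local/folder", "my-dataset/train")` would map a local `sub/b.csv` to `my-dataset/train/sub/b.csv`.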

### create_commit

If you want to work at the commit level, use the [`create_commit`] function directly. [`create_commit`] supports two types of operations:

- `CommitOperationAdd` uploads a file to the Hub. If the file already exists, the file contents are overwritten. This operation accepts two arguments:

* `path_in_repo`: the repository path to upload a file to.
* `path_or_fileobj`: either a path to a file on your filesystem or a file-like object. This is the content of the file to upload to the Hub.

- `CommitOperationDelete` removes a file from a repository. This operation accepts `path_in_repo` as an argument.

For example, if you want to upload two files and delete a file in a Hub repository:

1. Use the appropriate `CommitOperation` to add and delete a file:

```py
>>> from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete
>>> api = HfApi()
>>> operations = [
... CommitOperationAdd(path_in_repo="LICENSE.md", path_or_fileobj="~/repo/LICENSE.md"),
... CommitOperationAdd(path_in_repo="weights.h5", path_or_fileobj="~/repo/weights-final.h5"),
... CommitOperationDelete(path_in_repo="old-weights.h5"),
... ]
```

2. Pass your operations to [`create_commit`]:

```py
>>> api.create_commit(
... repo_id="lysandre/test-model",
... operations=operations,
... commit_message="Upload my model weights and license",
... )
```
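Building on the two steps above, the sketch below plans a single commit that makes a Hub repository mirror a local folder: every local file becomes an add operation, and every remote file without a local counterpart becomes a delete operation. `plan_sync`, `to_operations`, and the `keep` default are hypothetical; the sketch assumes `local_dir` is a plain folder (not a Git clone with a `.git` directory) and that the list of remote files comes from something like `HfApi.list_repo_files`.

```python
from pathlib import Path


def plan_sync(local_dir, remote_files, keep=(".gitattributes",)):
    """Hypothetical helper: return (to_add, to_delete) so that one commit makes
    the repository mirror `local_dir`.

    to_add is a sorted list of (path_in_repo, local_path) pairs and to_delete
    is a sorted list of repository paths. Paths in `keep` are never deleted.
    """
    root = Path(local_dir)
    local = {
        # Repository paths use forward slashes regardless of the local OS.
        "/".join(p.relative_to(root).parts): str(p)
        for p in root.rglob("*")
        if p.is_file()
    }
    # Every local file is uploaded; an add operation overwrites an existing file.
    to_add = sorted(local.items())
    # Remote files with no local counterpart are removed in the same commit.
    to_delete = sorted(set(remote_files) - set(local) - set(keep))
    return to_add, to_delete


def to_operations(to_add, to_delete):
    # Imported here so plan_sync can be used and tested without huggingface_hub.
    from huggingface_hub import CommitOperationAdd, CommitOperationDelete

    adds = [
        CommitOperationAdd(path_in_repo=repo_path, path_or_fileobj=local_path)
        for repo_path, local_path in to_add
    ]
    deletes = [CommitOperationDelete(path_in_repo=repo_path) for repo_path in to_delete]
    return adds + deletes
```

The result of `to_operations(*plan_sync(...))` can then be passed as the `operations` argument of [`create_commit`]. The `keep` default protects `.gitattributes`, which Hub repositories typically contain but a local folder usually does not.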

In addition to [`upload_file`] and [`upload_folder`], the following functions also use [`create_commit`] under the hood:

* [`delete_file`] deletes a single file from a repository on the Hub.
* [`metadata_update`] updates a repository's metadata.

For more detailed information, take a look at the [`HfApi`] reference.

## Push files with Git LFS

Git LFS automatically handles files larger than 10MB. For very large files (>5GB), however, you need to install a custom transfer agent for Git LFS:

```bash
huggingface-cli lfs-enable-largefiles .
```

Run this command in each local repository that contains a very large file, passing the path to the repository (`.` for the current directory). Once it's enabled, you'll be able to push files larger than 5GB.
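To check whether a local repository needs the custom transfer agent, you can scan it for files above the 5GB limit. The sketch below is a minimal, hypothetical helper; it assumes the decimal reading of "5GB" (5 * 10**9 bytes), which is the more conservative of the two common definitions, and it skips Git's internal `.git` directory.

```python
import os

# Assumption: "5GB" read as 5 * 10**9 bytes. This is smaller than 5 * 1024**3,
# so it flags every file that either definition would consider too large.
LARGE_FILE_THRESHOLD = 5 * 10**9


def files_needing_large_file_agent(repo_dir: str, threshold: int = LARGE_FILE_THRESHOLD) -> list:
    """Hypothetical helper: return paths (relative to `repo_dir`) of files larger
    than `threshold` bytes, ignoring the `.git` directory."""
    large = []
    for dirpath, dirnames, filenames in os.walk(repo_dir):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [name for name in dirnames if name != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > threshold:
                large.append(os.path.relpath(path, repo_dir))
    return sorted(large)
```

If the list is non-empty, enable large files in that repository before pushing.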

## commit context manager

@@ -40,7 +128,7 @@ The `commit` context manager handles four of the most common Git commands: pull,
... f.write(json.dumps({"hey": 8}))
```

Here is another example of how to use the `commit` context manager to save and upload a file to a repository:

```python
>>> import torch
@@ -49,7 +137,7 @@ Here is another example of how to save and upload a file to a repository:
... torch.save(model.state_dict(), "model.pt")
```
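To make the pattern concrete, here is a minimal sketch of how a context manager can wrap your code with Git commands, assuming it pulls before your block runs and then adds, commits, and pushes afterwards. `commit_around` and `run_git` are hypothetical names; this illustrates the idea with `contextlib` and is not how `Repository.commit` is implemented (which also supports options such as `blocking=False`).

```python
import contextlib


@contextlib.contextmanager
def commit_around(run_git, message: str):
    """Hypothetical sketch: pull before the block, then add, commit, and push
    after it. `run_git` takes a list of git arguments, for example a thin
    wrapper around subprocess.run(["git", *args], check=True)."""
    run_git(["pull"])
    yield  # the body of the `with` block saves files here
    # Only reached if the block did not raise, so a failed save is not committed.
    run_git(["add", "-A"])
    run_git(["commit", "-m", message])
    run_git(["push"])
```

Injecting `run_git` keeps the sketch testable: passing `list.append` records the commands instead of running them.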

Set `blocking=False` if you would like to push your commits asynchronously. Non-blocking behavior is helpful when you want to continue running your script while your commits are being pushed.

```python
>>> with repo.commit(commit_message="My cool model :)", blocking=False):
@@ -85,69 +173,31 @@ When `blocking=False`, commands are tracked, and your script will only exit when

## push_to_hub

The [`Repository`] class has a [`~Repository.push_to_hub`] function to add files, make a commit, and push them to a repository. Unlike the `commit` context manager, you'll need to pull from a repository first before calling [`~Repository.push_to_hub`].

For example, if you've already cloned a repository from the Hub, then you can initialize the `repo` from the local directory:

```python
>>> from huggingface_hub import Repository
>>> repo = Repository(local_dir="path/to/local/repo")
```

Update your local clone with [`~Repository.git_pull`] and then push your file to the Hub:

```py
>>> repo.git_pull()
>>> repo.push_to_hub(commit_message="Commit my-awesome-file to the Hub")
```
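Because the pull has to happen before the push, it can help to keep the steps together in one function. The sketch below is a hypothetical helper, assuming `repo` behaves like [`Repository`] and exposes the path of its local clone as `repo.local_dir`; taking `repo` as an argument also lets you test the call order with a stub.

```python
from pathlib import Path


def save_and_push(repo, filename: str, content: str, message: str):
    """Hypothetical helper: pull, write a file into the local clone, then push.

    `repo` is expected to behave like `huggingface_hub.Repository`.
    """
    # Pull first so the push is not rejected because the clone is behind the Hub.
    repo.git_pull()
    Path(repo.local_dir, filename).write_text(content)
    return repo.push_to_hub(commit_message=message)
```

For example: `save_and_push(repo, "notes.txt", "hello", "Commit my-awesome-file to the Hub")`.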

However, if you aren't ready to push a file yet, you can use [`~Repository.git_add`] and [`~Repository.git_commit`] to only add and commit your file:

```py
>>> repo.git_add("path/to/file")
>>> repo.git_commit(commit_message="add my first model config file :)")
```

When you're ready, push the file to your repository with [`~Repository.git_push`]:

```py
>>> repo.git_push()
```

2 changes: 2 additions & 0 deletions src/huggingface_hub/hf_api.py
@@ -1991,6 +1991,8 @@ def upload_file(
Example usage:
```python
>>> from huggingface_hub import upload_file
>>> with open("./local/filepath", "rb") as fobj:
... upload_file(
... path_or_fileobj=fobj,