diff --git a/docs/source/how-to-upstream.mdx b/docs/source/how-to-upstream.mdx
index 96170f91e7..3361b14224 100644
--- a/docs/source/how-to-upstream.mdx
+++ b/docs/source/how-to-upstream.mdx
@@ -1,28 +1,116 @@
 # Upload files to the Hub
 
-Sharing your files and work is a very important aspect of the Hub. The `huggingface_hub` uses a Git-based workflow to upload files to the Hub. You can use these functions independently or integrate them into your own library, making it more convenient for your users to interact with the Hub. This guide will show you how to:
+Sharing your files and work is an important aspect of the Hub. The `huggingface_hub` library offers several options for uploading your files to the Hub. You can use these functions independently or integrate them into your library, making it more convenient for your users to interact with the Hub. This guide will show you how to push files:
 
-* Push files with a `commit` context manager.
-* Push files with the [`~Repository.push_to_hub`] function.
-* Upload very large files with [Git LFS](https://git-lfs.github.com/).
-* Push files without Git installed with [`HfApi`]
+* without using Git.
+* that are very large, with [Git LFS](https://git-lfs.github.com/).
+* with the `commit` context manager.
+* with the [`~Repository.push_to_hub`] function.
 
 Whenever you want to upload files to the Hub, you need to log in to your Hugging Face account:
 
-1. Log in to your Hugging Face account with the following command:
+- Log in to your Hugging Face account with the following command:
 
     ```bash
     huggingface-cli login
     ```
 
-2. Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, login with [`notebook_login`]:
+- Alternatively, if you prefer working from a Jupyter or Colaboratory notebook, log in with [`notebook_login`]:
 
     ```python
     >>> from huggingface_hub import notebook_login
     >>> notebook_login()
     ```
 
-    [`notebook_login`] will launch a widget in your notebook from which you can enter your Hugging Face credentials.
+    [`notebook_login`] launches a widget in your notebook where you can enter your Hugging Face credentials.
+
+## Push files without Git
+
+If you don't have Git installed on your system, use [`create_commit`] to push your files to the Hub. [`create_commit`] uploads files over HTTP and automatically handles large and binary files with the Git LFS protocol.
+
+However, [`create_commit`] is a low-level API for working at the commit level. The [`upload_file`] and [`upload_folder`] functions are higher-level APIs that use [`create_commit`] under the hood and are generally more convenient. We recommend trying these functions first if you don't need to work at a lower level.
+
+### Upload a file
+
+Once you've created a repository with the [`create_repo`](./how-to-manage#create-a-repository) function, you can upload a file to your repository with the [`upload_file`] function.
+
+Specify the local path of the file to upload (`path_or_fileobj`), its destination path in the repository (`path_in_repo`), and the ID of the repository to add the file to (`repo_id`). Repositories are treated as `model` repositories by default. Set `repo_type` to `dataset` or `space` if you are uploading to another type of repository.
+
+```py
+>>> from huggingface_hub import HfApi
+>>> api = HfApi()
+>>> api.upload_file(path_or_fileobj="/path/to/local/folder/README.md",
+...                 path_in_repo="README.md",
+...                 repo_id="username/test-dataset",
+...                 repo_type="dataset",
+... )
+```
+
+### Upload a folder
+
+Use the [`upload_folder`] function to upload a local folder to an existing repository. Specify the path of the local folder to upload (`folder_path`), its destination path in the repository (`path_in_repo`), and the ID of the repository to add the folder to (`repo_id`). As with [`upload_file`], set `repo_type` to `dataset` or `space` if the repository is not a model repository.
+
+```py
+>>> from huggingface_hub import HfApi
+>>> api = HfApi()
+>>> api.upload_folder(folder_path="/path/to/local/folder",
+...                   path_in_repo="my-dataset/train",
+...                   repo_id="username/test-dataset",
+...                   repo_type="dataset",
+... )
+```
+
+### create_commit
+
+If you want to work at the commit level, use the [`create_commit`] function directly. [`create_commit`] supports two types of operations:
+
+- [`CommitOperationAdd`] uploads a file to the Hub. If the file already exists, its contents are overwritten. This operation accepts two arguments:
+
+  * `path_in_repo`: the path in the repository to upload the file to.
+  * `path_or_fileobj`: either a path to a file on your filesystem or a file-like object. This is the content of the file to upload to the Hub.
+
+- [`CommitOperationDelete`] removes a file from a repository. This operation accepts `path_in_repo` as an argument.
+
+For example, if you want to upload two files and delete a file in a Hub repository:
+
+1. Use the appropriate `CommitOperation` classes to add and delete files:
+
+```py
+>>> from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete
+>>> api = HfApi()
+>>> operations = [
+...     CommitOperationAdd(path_in_repo="LICENSE.md", path_or_fileobj="~/repo/LICENSE.md"),
+...     CommitOperationAdd(path_in_repo="weights.h5", path_or_fileobj="~/repo/weights-final.h5"),
+...     CommitOperationDelete(path_in_repo="old-weights.h5"),
+... ]
+```
+
+2. Pass your operations to [`create_commit`]:
+
+```py
+>>> api.create_commit(
+...     repo_id="lysandre/test-model",
+...     operations=operations,
+...     commit_message="Upload my model weights and license",
+... )
+```
+
+In addition to [`upload_file`] and [`upload_folder`], the following functions also use [`create_commit`] under the hood:
+
+* [`delete_file`] deletes a single file from a repository on the Hub.
+* [`metadata_update`] updates a repository's metadata.
+
+For more detailed information, take a look at the [`HfApi`] reference.
+
+## Push files with Git LFS
+
+Git LFS automatically handles files larger than 10MB. However, for very large files (>5GB), you need to install a custom transfer agent for Git LFS:
+
+```bash
+huggingface-cli lfs-enable-largefiles .
+```
+
+Run this command from the root of each local repository that contains a very large file. Once the transfer agent is configured, you'll be able to push files larger than 5GB.
 
 ## commit context manager
 
@@ -40,7 +128,7 @@ The `commit` context manager handles four of the most common Git commands: pull,
 ...     f.write(json.dumps({"hey": 8}))
 ```
 
-Here is another example of how to save and upload a file to a repository:
+Here is another example of how to use the `commit` context manager to save and upload a file to a repository:
 
 ```python
 >>> import torch
@@ -49,7 +137,7 @@ Here is another example of how to save and upload a file to a repository:
 ...     torch.save(model.state_dict(), "model.pt")
 ```
 
-Set `blocking=False` if you would like to push your commits asynchronously. Non-blocking behavior is helpful when you want to continue running your script while you push your commits.
+Set `blocking=False` if you would like to push your commits asynchronously. Non-blocking behavior is helpful when you want to continue running your script while your commits are being pushed.
 
 ```python
 >>> with repo.commit(commit_message="My cool model :)", blocking=False)
@@ -85,69 +173,31 @@ When `blocking=False`, commands are tracked, and your script will only exit when
 
 ## push_to_hub
 
-The [`Repository`] class also has a [`~Repository.push_to_hub`] function to add files, make a commit, and push them to a repository. Unlike the `commit` context manager, [`~Repository.push_to_hub`] requires you to pull from a repository first, save the files, and then call [`~Repository.push_to_hub`].
+The [`Repository`] class has a [`~Repository.push_to_hub`] function to add files, make a commit, and push them to a repository. Unlike the `commit` context manager, which pulls for you, [`~Repository.push_to_hub`] requires you to pull from the repository yourself before calling it.
+
+For example, if you've already cloned a repository from the Hub, then you can initialize the `repo` from the local directory:
 
 ```python
 >>> from huggingface_hub import Repository
+>>> repo = Repository(local_dir="path/to/local/repo")
+```
+
+Update your local clone with [`~Repository.git_pull`] and then push your file to the Hub:
+
+```py
 >>> repo.git_pull()
 >>> repo.push_to_hub(commit_message="Commit my-awesome-file to the Hub")
 ```
 
-However, if you aren't ready to push a file yet, you can still use [`~Repository.git_add`] and [`~Repository.git_commit`] to add and commit your file:
+However, if you aren't ready to push a file yet, you can use [`~Repository.git_add`] and [`~Repository.git_commit`] to only add and commit your file:
 
 ```py
 >>> repo.git_add("path/to/file")
 >>> repo.git_commit(commit_message="add my first model config file :)")
 ```
 
-Once you're ready, you can push your file to your repository with [`~Repository.git_push`]:
+When you're ready, push the file to your repository with [`~Repository.git_push`]:
 
 ```py
 >>> repo.git_push()
 ```
-
-## Upload with Git LFS
-
-For huge files (>5GB), you need to install a custom transfer agent for Git LFS:
-
-```bash
-huggingface-cli lfs-enable-largefiles
-```
-
-You should install this for each model repository that contains a model file. Once installed, you are now able to push files larger than 5GB.
-
-## Managing files in a repo without Git with the `create_commit` API
-
-`huggingface_hub` also offers a way to upload files to the Hub without Git installed on your system with the [`create_commit`] method of [`HfApi`].
-For example, if you want to upload two files and delete another file in a Hub repo:
-
-```py
->>> from huggingface_hub import HfApi, CommitOperationAdd, CommitOperationDelete
->>> api = HfApi()
->>> operations = [
-...     CommitOperationAdd(path_in_repo="LICENSE.md", path_or_fileobj="~/repo/LICENSE.md"),
-...     CommitOperationAdd(path_in_repo="weights.h5", path_or_fileobj="~/repo/weights-final.h5"),
-...     CommitOperationDelete(path_in_repo="old-weights.h5"),
-... ]
->>> api.create_commit(
-...     repo_id="lysandre/test-model",
-...     operations=operations,
-... )
-```
-
-[`create_commit`] uses the HTTP protocol to upload files to the Hub. It automatically takes care of uploading large files and binary files with the Git LFS protocol.
-There are currently two kind of operations supported by the [`create_commit`] method:
-
-1. [`CommitOperationAdd`] to upload a file to the Hub. If the file already exists, its content will be overwritten. It takes two arguments:
-    * `path_in_repo`: the path in the repository where the file should be uploaded
-    * `path_or_fileobj`: either a path to a file on your filesystem, or a file-like object. The content of the file to upload to the Hub.
-2. [`CommitOperationDelete`] to remove a file from a repository. It takes `path_in_repo` as an argument.
-
-Instead of [`create_commit`], you can also use the following convenience methods:
-* [`upload_file`] to upload a single file to a repo on the Hub
-* [`upload_folder`] to upload a local directory to a repo on the Hub
-* [`delete_file`] to delete a single file from a repo on the Hub
-* [`metadata_update`] to update a repo's metadata
-
-All these methods use the `create_commit` API under the hood.
-For a more detailed description, visit the [`hf_api`] documentation page.
diff --git a/src/huggingface_hub/hf_api.py b/src/huggingface_hub/hf_api.py
index a744cea5f1..025a411572 100644
--- a/src/huggingface_hub/hf_api.py
+++ b/src/huggingface_hub/hf_api.py
@@ -1991,6 +1991,8 @@ def upload_file(
         Example usage:
 
         ```python
+        >>> from huggingface_hub import upload_file
+
         >>> with open("./local/filepath", "rb") as fobj:
         ...     upload_file(
         ...         path_or_fileobj=fileobj,