fix/issue 2090 : Add a repo_files command, with recursive deletion. (#2280)

* feat(Command): Add a new  Command to allow files deletion from repositories

* Switch to pytest assertion style

* Pass style && quality gates

* Rename DeleteSubCommand and reorder commands

* Rename delete_files_r to delete_files

* Next round of post-review changes

* Fix some tests

* Implement more post-review changes

* Push last modifications to the test

* Pass formatter

* Pass formatter

* refacto tests

* handle folders

* concise

* Update src/huggingface_hub/commands/repo_files.py

* style

* Add commit_description commit_message and create_pr parameters to the repo_files delete command

* Start documenting

* Improve the documentation

* Finish the doc guide

* Apply suggestions from code review

* fix tests

---------

Co-authored-by: Lucain <lucainp@gmail.com>
OlivierKessler01 and Wauplin authored Jun 3, 2024
1 parent b6890e1 commit c8dc5f5
Showing 8 changed files with 505 additions and 5 deletions.
34 changes: 34 additions & 0 deletions docs/source/en/guides/cli.md
@@ -429,6 +429,40 @@ By default, the `huggingface-cli upload` command will be verbose. It will print
https://huggingface.co/Wauplin/my-cool-model/tree/main
```

## huggingface-cli repo-files

If you want to delete files from a Hugging Face repository, use the `huggingface-cli repo-files` command.

### Delete files

The `huggingface-cli repo-files <repo_id> delete` sub-command allows you to delete files from a repository. Here are some usage examples.

Delete a folder:
```bash
>>> huggingface-cli repo-files Wauplin/my-cool-model delete folder/
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```

Delete multiple files:
```bash
>>> huggingface-cli repo-files Wauplin/my-cool-model delete file.txt folder/pytorch_model.bin
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```

Use Unix-style wildcards to delete sets of files:
```bash
>>> huggingface-cli repo-files Wauplin/my-cool-model delete *.txt folder/*.bin
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```

### Specify a token

To delete files from a repo you must be authenticated and authorized. By default, the token saved locally (using `huggingface-cli login`) will be used. If you want to authenticate explicitly, use the `--token` option:

```bash
>>> huggingface-cli repo-files --token=hf_**** Wauplin/my-cool-model delete file.txt
```
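
The same deletion can also be driven from Python through `HfApi.delete_files`, the method this sub-command wraps. The following is a minimal sketch rather than an official recipe: the helper name `delete_from_repo` is illustrative and not part of `huggingface_hub`, and the `api` object is assumed to be an already-authenticated `HfApi` instance.

```python
from typing import Any, List


def delete_from_repo(api: Any, repo_id: str, patterns: List[str], create_pr: bool = False) -> Any:
    """Delete every file matching `patterns` from `repo_id` in a single commit.

    `api` is assumed to expose `delete_files(repo_id=..., delete_patterns=..., create_pr=...)`,
    as `huggingface_hub.HfApi` does once this change is installed.
    """
    return api.delete_files(
        repo_id=repo_id,
        delete_patterns=patterns,
        create_pr=create_pr,
    )


# Example call (needs network access and a valid token, so it is left commented out):
# from huggingface_hub import HfApi
# delete_from_repo(HfApi(), "Wauplin/my-cool-model", ["file.txt", "folder/", "*.bin"])
```

Passing the API object in as a parameter keeps the helper easy to exercise with a stub in tests, without touching a real repository.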

## huggingface-cli scan-cache

Scanning your cache directory is useful if you want to know which repos you have downloaded and how much space it takes on your disk. You can do that by running `huggingface-cli scan-cache`:
6 changes: 4 additions & 2 deletions src/huggingface_hub/commands/huggingface_cli.py
@@ -19,6 +19,7 @@
 from huggingface_hub.commands.download import DownloadCommand
 from huggingface_hub.commands.env import EnvironmentCommand
 from huggingface_hub.commands.lfs import LfsCommands
+from huggingface_hub.commands.repo_files import RepoFilesCommand
 from huggingface_hub.commands.scan_cache import ScanCacheCommand
 from huggingface_hub.commands.tag import TagCommands
 from huggingface_hub.commands.upload import UploadCommand
@@ -30,10 +31,11 @@ def main():
     commands_parser = parser.add_subparsers(help="huggingface-cli command helpers")

     # Register commands
+    DownloadCommand.register_subcommand(commands_parser)
+    UploadCommand.register_subcommand(commands_parser)
+    RepoFilesCommand.register_subcommand(commands_parser)
     EnvironmentCommand.register_subcommand(commands_parser)
     UserCommands.register_subcommand(commands_parser)
-    UploadCommand.register_subcommand(commands_parser)
-    DownloadCommand.register_subcommand(commands_parser)
     LfsCommands.register_subcommand(commands_parser)
     ScanCacheCommand.register_subcommand(commands_parser)
     DeleteCacheCommand.register_subcommand(commands_parser)
128 changes: 128 additions & 0 deletions src/huggingface_hub/commands/repo_files.py
@@ -0,0 +1,128 @@
# coding=utf-8
# Copyright 2023-present, the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains command to update or delete files in a repository using the CLI.
Usage:
    # delete all
    huggingface-cli repo-files <repo_id> delete *
    # delete single file
    huggingface-cli repo-files <repo_id> delete file.txt
    # delete single folder
    huggingface-cli repo-files <repo_id> delete folder/
    # delete multiple
    huggingface-cli repo-files <repo_id> delete file.txt folder/ file2.txt
    # delete multiple patterns
    huggingface-cli repo-files <repo_id> delete file.txt *.json folder/*.parquet
    # delete from different revision / repo-type
    huggingface-cli repo-files <repo_id> delete file.txt --revision=refs/pr/1 --repo-type=dataset
"""

from argparse import _SubParsersAction
from typing import List, Optional

from huggingface_hub import logging
from huggingface_hub.commands import BaseHuggingfaceCLICommand
from huggingface_hub.hf_api import HfApi


logger = logging.get_logger(__name__)


class DeleteFilesSubCommand:
    def __init__(self, args) -> None:
        self.args = args
        self.repo_id: str = args.repo_id
        self.repo_type: Optional[str] = args.repo_type
        self.revision: Optional[str] = args.revision
        self.api: HfApi = HfApi(token=args.token, library_name="huggingface-cli")
        self.patterns: List[str] = args.patterns
        self.commit_message: Optional[str] = args.commit_message
        self.commit_description: Optional[str] = args.commit_description
        self.create_pr: bool = args.create_pr
        self.token: Optional[str] = args.token

    def run(self) -> None:
        logging.set_verbosity_info()
        url = self.api.delete_files(
            delete_patterns=self.patterns,
            repo_id=self.repo_id,
            repo_type=self.repo_type,
            revision=self.revision,
            commit_message=self.commit_message,
            commit_description=self.commit_description,
            create_pr=self.create_pr,
        )
        print(f"Files correctly deleted from repo. Commit: {url}.")
        logging.set_verbosity_warning()


class RepoFilesCommand(BaseHuggingfaceCLICommand):
    @staticmethod
    def register_subcommand(parser: _SubParsersAction):
        repo_files_parser = parser.add_parser("repo-files", help="Manage files in a repo on the Hub")
        repo_files_parser.add_argument(
            "repo_id", type=str, help="The ID of the repo to manage (e.g. `username/repo-name`)."
        )
        repo_files_subparsers = repo_files_parser.add_subparsers(
            help="Action to execute against the files.",
            required=True,
        )
        delete_subparser = repo_files_subparsers.add_parser(
            "delete",
            help="Delete files from a repo on the Hub",
        )
        delete_subparser.set_defaults(func=lambda args: DeleteFilesSubCommand(args))
        delete_subparser.add_argument(
            "patterns",
            nargs="+",
            type=str,
            help="Glob patterns to match files to delete.",
        )
        delete_subparser.add_argument(
            "--repo-type",
            choices=["model", "dataset", "space"],
            default="model",
            help="Type of the repo to delete files from (e.g. `dataset`).",
        )
        delete_subparser.add_argument(
            "--revision",
            type=str,
            help=(
                "An optional Git revision to push to. It can be a branch name "
                "or a PR reference. If revision does not"
                " exist and `--create-pr` is not set, a branch will be automatically created."
            ),
        )
        delete_subparser.add_argument(
            "--commit-message", type=str, help="The summary / title / first line of the generated commit."
        )
        delete_subparser.add_argument(
            "--commit-description", type=str, help="The description of the generated commit."
        )
        delete_subparser.add_argument(
            "--create-pr", action="store_true", help="Whether to create a new Pull Request for these changes."
        )
        repo_files_parser.add_argument(
            "--token",
            type=str,
            help="A User Access Token generated from https://huggingface.co/settings/tokens",
        )

        repo_files_parser.set_defaults(func=RepoFilesCommand)
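
The nested sub-parser layout above (a `repo-files` parser holding `repo_id` and `--token`, with a `delete` action underneath it) can be tried in isolation. The sketch below is a standalone approximation, not the real entry point: the program name `demo-cli` and the `dest` names `command` and `action` are assumptions added for illustration, and only a subset of the options is reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser mirroring the `repo-files <repo_id> delete <patterns...>` layout."""
    parser = argparse.ArgumentParser(prog="demo-cli")
    commands = parser.add_subparsers(dest="command", required=True)

    # Options declared here belong to `repo-files`, so they go before the action name.
    repo_files = commands.add_parser("repo-files")
    repo_files.add_argument("repo_id", type=str)
    repo_files.add_argument("--token", type=str)

    actions = repo_files.add_subparsers(dest="action", required=True)
    delete = actions.add_parser("delete")
    delete.add_argument("patterns", nargs="+", type=str)
    delete.add_argument("--repo-type", choices=["model", "dataset", "space"], default="model")
    delete.add_argument("--revision", type=str)
    delete.add_argument("--create-pr", action="store_true")
    return parser


args = build_parser().parse_args(
    ["repo-files", "--token", "hf_xxx", "user/repo", "delete", "file.txt", "folder/", "--revision", "refs/pr/1"]
)
print(args.repo_id, args.patterns, args.revision, args.repo_type)
```

In this layout `--token` is attached to the `repo-files` parser while `--revision`, `--repo-type` and `--create-pr` are attached to `delete`, which matches the documented invocation where `--token` precedes the repository ID.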
81 changes: 79 additions & 2 deletions src/huggingface_hub/hf_api.py
@@ -4611,7 +4611,7 @@ def upload_folder(
             ignore_patterns = [ignore_patterns]
         ignore_patterns += DEFAULT_IGNORE_PATTERNS

-        delete_operations = self._prepare_upload_folder_deletions(
+        delete_operations = self._prepare_folder_deletions(
             repo_id=repo_id,
             repo_type=repo_type,
             revision=DEFAULT_REVISION if create_pr else revision,
@@ -4772,6 +4772,82 @@ def delete_file(
             parent_commit=parent_commit,
         )

+    @validate_hf_hub_args
+    def delete_files(
+        self,
+        repo_id: str,
+        delete_patterns: List[str],
+        *,
+        token: Union[bool, str, None] = None,
+        repo_type: Optional[str] = None,
+        revision: Optional[str] = None,
+        commit_message: Optional[str] = None,
+        commit_description: Optional[str] = None,
+        create_pr: Optional[bool] = None,
+        parent_commit: Optional[str] = None,
+    ) -> CommitInfo:
+        """
+        Delete files from a repository on the Hub.
+
+        If a folder path is provided, the entire folder is deleted as well as
+        all files it contained.
+
+        Args:
+            repo_id (`str`):
+                The repository from which the files will be deleted, for example:
+                `"username/custom_transformers"`
+            delete_patterns (`List[str]`):
+                List of files or folders to delete. Each string can either be
+                a file path, a folder path or a Unix shell-style wildcard.
+                E.g. `["file.txt", "folder/", "data/*.parquet"]`
+            token (Union[bool, str, None], optional):
+                A valid user access token (string). Defaults to the locally saved
+                token, which is the recommended method for authentication (see
+                https://huggingface.co/docs/huggingface_hub/quick-start#authentication).
+                To disable authentication, pass `False`.
+            repo_type (`str`, *optional*):
+                Type of the repo to delete files from. Can be `"model"`,
+                `"dataset"` or `"space"`. Defaults to `"model"`.
+            revision (`str`, *optional*):
+                The git revision to commit from. Defaults to the head of the `"main"` branch.
+            commit_message (`str`, *optional*):
+                The summary (first line) of the generated commit. Defaults to
+                `f"Delete files {' '.join(delete_patterns)} with huggingface_hub"`.
+            commit_description (`str`, *optional*):
+                The description of the generated commit.
+            create_pr (`boolean`, *optional*):
+                Whether or not to create a Pull Request with that commit. Defaults to `False`.
+                If `revision` is not set, PR is opened against the `"main"` branch. If
+                `revision` is set and is a branch, PR is opened against this branch. If
+                `revision` is set and is not a branch name (example: a commit oid), a
+                `RevisionNotFoundError` is returned by the server.
+            parent_commit (`str`, *optional*):
+                The OID / SHA of the parent commit, as a hexadecimal string. Shorthands (first 7 characters) are also supported.
+                If specified and `create_pr` is `False`, the commit will fail if `revision` does not point to `parent_commit`.
+                If specified and `create_pr` is `True`, the pull request will be created from `parent_commit`.
+                Specifying `parent_commit` ensures the repo has not changed before committing the changes, and can be
+                especially useful if the repo is updated / committed to concurrently.
+        """
+        operations = self._prepare_folder_deletions(
+            repo_id=repo_id, repo_type=repo_type, delete_patterns=delete_patterns, path_in_repo="", revision=revision
+        )
+
+        if commit_message is None:
+            commit_message = f"Delete files {' '.join(delete_patterns)} with huggingface_hub"
+
+        return self.create_commit(
+            repo_id=repo_id,
+            repo_type=repo_type,
+            token=token,
+            operations=operations,
+            revision=revision,
+            commit_message=commit_message,
+            commit_description=commit_description,
+            create_pr=create_pr,
+            parent_commit=parent_commit,
+        )
+
     @validate_hf_hub_args
     def delete_folder(
         self,
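
The flow of `delete_files` (turn patterns into per-file delete operations, then pick a default commit message) can be sketched offline. The body of `_prepare_folder_deletions` is not shown in this diff, so everything below is an assumption made for illustration: `DeleteOp` and `plan_deletions` are hypothetical names, the repository listing is passed in as a plain list, and matching is assumed to follow `fnmatch` semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DeleteOp:
    """Hypothetical stand-in for a per-file delete operation."""

    path_in_repo: str


def plan_deletions(
    repo_files: List[str],
    delete_patterns: List[str],
    commit_message: Optional[str] = None,
) -> Tuple[List[DeleteOp], str]:
    """Return one delete operation per matching path, plus the commit message to use."""
    # A trailing "/" is treated as "everything under this folder".
    patterns = [p + "*" if p.endswith("/") else p for p in delete_patterns]
    operations = [
        DeleteOp(path)
        for path in repo_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
    if commit_message is None:
        # Same default string as in the method above.
        commit_message = f"Delete files {' '.join(delete_patterns)} with huggingface_hub"
    return operations, commit_message


ops, message = plan_deletions(
    ["README.md", "file.txt", "folder/a.bin", "folder/sub/b.bin", "data/x.parquet"],
    ["file.txt", "folder/"],
)
print([op.path_in_repo for op in ops])
print(message)
```

Because all operations are collected first and committed once, a multi-pattern deletion ends up as a single commit rather than one commit per file.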
@@ -8771,7 +8847,7 @@ def _build_hf_headers(
             headers=self.headers,
         )

-    def _prepare_upload_folder_deletions(
+    def _prepare_folder_deletions(
         self,
         repo_id: str,
         repo_type: Optional[str],
@@ -8994,6 +9070,7 @@ def _parse_revision_from_pr_url(pr_url: str) -> str:
 upload_folder = api.upload_folder
 delete_file = api.delete_file
 delete_folder = api.delete_folder
+delete_files = api.delete_files
 create_commits_on_pr = api.create_commits_on_pr
 preupload_lfs_files = api.preupload_lfs_files
 create_branch = api.create_branch
11 changes: 11 additions & 0 deletions src/huggingface_hub/utils/_paths.py
@@ -105,6 +105,11 @@ def filter_repo_objects(
     if isinstance(ignore_patterns, str):
         ignore_patterns = [ignore_patterns]

+    if allow_patterns is not None:
+        allow_patterns = [_add_wildcard_to_directories(p) for p in allow_patterns]
+    if ignore_patterns is not None:
+        ignore_patterns = [_add_wildcard_to_directories(p) for p in ignore_patterns]
+
     if key is None:

         def _identity(item: T) -> str:
@@ -128,3 +133,9 @@ def _identity(item: T) -> str:
             continue

         yield item
+
+
+def _add_wildcard_to_directories(pattern: str) -> str:
+    if pattern[-1] == "/":
+        return pattern + "*"
+    return pattern
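
Why the trailing-slash rewrite matters can be seen with `fnmatch` alone. This is a small sketch assuming `fnmatch`-style matching, with a hypothetical helper `add_wildcard_to_directory` that follows the same idea as the function above but uses `str.endswith`, so an empty pattern does not raise `IndexError` the way `pattern[-1]` would.

```python
from fnmatch import fnmatchcase


def add_wildcard_to_directory(pattern: str) -> str:
    """Append "*" to patterns ending in "/" so they match the folder's contents."""
    return pattern + "*" if pattern.endswith("/") else pattern


paths = ["folder/a.bin", "folder/sub/b.bin", "other/c.bin"]

# A bare folder pattern matches no file path on its own.
raw_matches = [p for p in paths if fnmatchcase(p, "folder/")]

# After the rewrite, "folder/*" matches everything below the folder,
# nested paths included, because fnmatch's "*" also matches "/".
normalized = add_wildcard_to_directory("folder/")
normalized_matches = [p for p in paths if fnmatchcase(p, normalized)]

print(raw_matches)
print(normalized_matches)
```

Patterns that do not end in `/` pass through unchanged, so plain file paths and explicit wildcards keep their existing behaviour.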