Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented CLI upload functionality #1618

Merged
merged 19 commits into from
Sep 6, 2023

Conversation

martinbrose
Copy link
Contributor

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@Wauplin
Copy link
Contributor

Wauplin commented Sep 5, 2023

Hey @martinbrose, thanks again for the great job on the PR! Together with #1617, it will be a great improvement for the users 🔥

FYI, I just pushed a new commit (acc570b) in which I:

  • tweaked a few args description (minor changes)
  • added support for --every => it was not really clear in the initial ticket. The goal in this case is to use CommitScheduler to trigger a background job that will sync with the Hub every X minutes.
  • tweaked how to compute implicit path_in_repo, especially on Windows. I saw you've used .replace("\\", "/") which should be fine but I chose to use pathlib.Path.as_posix which feels safer.
  • added support to create the repo if it doesn't exists yet
  • added some mocked tests
  • added a small section in the docs

I also extensively tested the command locally. I hope I have covered all possible use cases. Here is the current UX:

# Upload file (implicit path in repo)
huggingface-cli upload my-cool-model ./my-cool-model.safetensors

# Upload file (explicit path in repo)
huggingface-cli upload my-cool-model ./my-cool-model.safetensors  model.safetensors

# Upload directory (implicit paths)
huggingface-cli upload my-cool-model

# Upload directory (explicit local path, explicit path in repo)
huggingface-cli upload my-cool-model ./models/my-cool-model .

# Upload filtered directory (example: tensorboard logs except for the last run)
huggingface-cli upload my-cool-model ./model/training /logs --include "*.tfevents.*" --exclude "*20230905*"

# Upload private dataset
huggingface-cli upload Wauplin/my-cool-dataset ./data . --repo-type=dataset --private

# Upload with token
huggingface-cli upload Wauplin/my-cool-model --token=hf_****

# Sync local Space with Hub (upload new files, delete removed files)
huggingface-cli upload Wauplin/space-example --repo-type=space --exclude="/logs/*" --delete="*" --commit-message="Sync local Space with Hub"

# Schedule commits every 30 minutes
huggingface-cli upload Wauplin/my-cool-model --every=30

cc @osanseviero @julien-c @LysandreJik for another review round, especially on the CLI signature. It's more or less what was described in #1543 (comment).
cc @stevhliu for the guide section as well 🙏

Copy link
Member

@stevhliu stevhliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool to be able to schedule your upload with this as well! 👍

docs/source/en/guides/upload.md Outdated Show resolved Hide resolved
docs/source/en/guides/upload.md Outdated Show resolved Hide resolved
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks great to me!

I've added some comments regarding the API coming from linux expectations, but feel free to disregard if you feel strongly. The implementation is clean and it'll greatly simplify model uploading outside of python runtimes.

Thanks both for your work!

huggingface-cli upload my-cool-model ./models

# Upload directory (implicit local_path, implicit path_in_repo)
huggingface-cli upload my-cool-model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this supposed to do exactly? Reading it I understand it to be the following: upload the my-cool-model folder and its content to the <user>/my-cool-model repository, the contents of the folder being at the root of the repository.

However, it doesn't seem to be the case: it seems to be uploading all folders and files in the current working directory to the root of the <user>/my-cool-model repository. Is this expected?

Reproducer:

mkdir hfh-test && cd hfh-test
mkdir folder-1
touch folder-1/file.txt
mkdir folder-2
touch folder-2/file.txt

# Current repo structure
# ./
# ../
# folder-1/
#     file.txt
# folder-2/
#     file.txt

huggingface-cli upload folder-1

I would expect folder-1 to contain the contents of folder-1, but it contains everything that was in the working directory: https://huggingface.co/lysandre/folder-1/tree/main

I would expect the command to do what it just did to be huggingface-cli upload folder-1 .

Copy link
Contributor

@Wauplin Wauplin Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @LysandreJik. My initial thought was that huggingface-cli upload my-cool-model would upload the current directory to the Hub (at root level). So huggingface-cli upload my-cool-model being equivalent to huggingface-cli upload my-cool-model . . . A bit as when you have a local repo and you do git add . && git commit -m "something" && git push.


That being said, the use case you've just described (e.g. repo_id == name of a local folder) is perfectly valid as well. I've updated the CLI in that sense. Now the behavior looks like this:

# Current repo structure
# ./
# ../
# folder-1/
#     file.txt
# folder-2/
#     file.txt

# Upload "./folder-1" content to "Wauplin/folder-1"
# On the hub: file.txt
>>> huggingface-cli upload folder-1

# Upload "./folder-1" content to "huggingface/folder-1"
# On the hub: file.txt
>>> huggingface-cli upload huggingface/folder-1

# Upload "./folder-1" and "./folder-2" to "Wauplin/my-cool-model"
# On the hub: folder-1/file.txt and folder-2/file.txt
>>> huggingface-cli upload my-cool-model .

# Upload "./folder-1" and "./folder-2" to "Wauplin/my-cool-model" under "./data"
# On the hub: data/folder-1/file.txt and data/folder-2/file.txt
>>> huggingface-cli upload my-cool-model . data/

# Raise exception => user must set local path explicitly 
>>> huggingface-cli upload folder-3

Comment on lines 133 to 134
# Upload directory (implicit path_in_repo)
huggingface-cli upload my-cool-model ./models
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also kinda expect this to upload the contents of models to my-cool-model at the root of my-cool-model, but it uploads them to the subfolder models

Copy link
Contributor

@Wauplin Wauplin Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the CLI and now that's the case. Also added a unit test for this use case. By default all content is uploaded at root level unless specified differently.

Comment on lines 130 to 131
# Upload file (implicit path_in_repo)
huggingface-cli upload my-cool-model model.safetensors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works as I would expect it to work 👌

docs/source/en/guides/upload.md Show resolved Hide resolved
@Wauplin
Copy link
Contributor

Wauplin commented Sep 6, 2023

Thanks for the great feedback @LysandreJik! I have revisited a bit the API given your comments:

  • No more relative path to compute the path_in_repo. Files are always uploaded at root level except if explicitly set differently by the user. It makes things easier to explain/understand as well.
  • If the local_path is not provided (e.g. huggingface-cli upload my-cool-model), we check if ./my-cool-model is a local file or folder. If it's the case, we upload it. If not, we raise an exception and ask the user to be explicit.

TL;DR: no implicit stuff except when it's obvious

I'll merge this as soon as the CI is green :)

@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Patch coverage: 74.13% and project coverage change: -0.43% ⚠️

Comparison is base (9e71350) 82.28% compared to head (49b43c5) 81.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1618      +/-   ##
==========================================
- Coverage   82.28%   81.86%   -0.43%     
==========================================
  Files          61       62       +1     
  Lines        6849     6964     +115     
==========================================
+ Hits         5636     5701      +65     
- Misses       1213     1263      +50     
Files Changed Coverage Δ
src/huggingface_hub/commands/huggingface_cli.py 0.00% <0.00%> (ø)
src/huggingface_hub/commands/upload.py 75.45% <75.45%> (ø)
src/huggingface_hub/commands/download.py 97.26% <100.00%> (+0.11%) ⬆️

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wauplin Wauplin merged commit b94f891 into huggingface:main Sep 6, 2023
@martinbrose martinbrose deleted the 1543-CLI-upload branch September 6, 2023 16:34
@LysandreJik
Copy link
Member

Thank you both for your work! 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants