Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements non git mixin #618

Conversation

muellerzr
Copy link
Contributor

@muellerzr muellerzr commented Jan 21, 2022

This is the start of getting the improvements-non-git-mixin up to speed and worked towards its completion.

Basic changes found here:

  • upload_file now takes in a commit_message. This was part of the intended implementation, so trickled it up. This also allows for the back-ends to write custom messages for if something is a weight, tokenizer, etc, mimicking how Repository can do so already.
  • Added a test in test_hubmixin that makes sure the timings for uploading a new file to a blank repo, and uploading a new file to a repo with fairly large files is about the same. Currently the fake repo uploads 5 .5mb files (locally this took 12 seconds to accomplish. If we want to lower that to one .5mb file that's okay)

@muellerzr muellerzr requested a review from LysandreJik January 21, 2022 19:19
@muellerzr muellerzr changed the base branch from non-git-mixins to non-git-mixins-up-to-date January 21, 2022 19:39
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a nice improvement to the existing PR. Feel free to merge that PR in mine, and then let's discuss what is needed on it before we can merge.

What needs to be taken care of before we can merge in main, in my opinion:

  • Taking care of the 5GB files and chunking them before uploading them. Ideally the size would be given manually to the method, similarly to how it's done in datasets' push_to_hub method which already respects that paradigm (see Push to hub capabilities for Dataset and DatasetDict datasets#3098)
  • I may be mistaken but I don't think progress bars are implemented yet for upload_file
  • Very explicit documentation mentioning that for complex workflows, Repository should be used as it respects the git workflow

Anything else? Exciting to see progress on this!

Comment on lines +1342 to +1343
if commit_message is not None:
headers["Commit-Summary"] = commit_message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/huggingface_hub/hub_mixin.py Outdated Show resolved Hide resolved
@muellerzr
Copy link
Contributor Author

Nope! That should be about it for this PR. Will make the adjustment + merge, then will work towards the list

@muellerzr muellerzr merged commit cfe58df into huggingface:non-git-mixins-up-to-date Jan 24, 2022
LysandreJik pushed a commit to LysandreJik/huggingface_hub that referenced this pull request Mar 29, 2022
Add in commit_message + tests
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
LysandreJik pushed a commit to LysandreJik/huggingface_hub that referenced this pull request Apr 11, 2022
Add in commit_message + tests
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
LysandreJik pushed a commit to LysandreJik/huggingface_hub that referenced this pull request Apr 20, 2022
Add in commit_message + tests
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Wauplin added a commit that referenced this pull request Aug 17, 2022
* Propose a non-git mixin

* Fix merge

* Improvements non git mixin (#618)

Add in commit_message + tests
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Big refactor

* FIX typos in contributing.md

* Remove redefined logger in HfApi.py

* Use upload_folder in both mixins + some docstring

* moved back logger to top of hf_api ><

* space in documentation can be ambiguous

* WIP started deprecation

* deprecate skip lfs file and use Path

* added decorator to deprecate specific arguments + unittests for it

* simplified tests

* proper decorators

* fix docstring

* hubmixin: fixed existing tests + add http one

* unique repo names across tests

* make push_to_hub_keras work + tests

* logs are not overwritten in push_to_hub_keras

* flake8

* refacto push_to_hub from mixin.save_pretrained

* deprecate positional argument in version 0.12

* remove docstring for deprecated skip_lfs_files

* delete old logs when uploading keras model to hub

* remove TODO in tests

* remove useless todo

* Update src/huggingface_hub/hub_mixin.py

* flake8

* remove un-explicit _generate_url helper

* exclude some folders from flake8

* clean pr

* move out tests fixing to other issue

* introduce decorator to deprecate tests

* docstring

* fix pattern in expect_deprecation decorator

* docstring

* remove extras['ml'] integration

* Apply suggestions from code review

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* use expect_deprecation decorator in test

* remove caret in comment

* explicit mocked model

* delete unused and failing tett

* optional path_in_repo for root

* revert back token docstring

* Update src/huggingface_hub/hf_api.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Zachary Mueller <muellerzr@gmail.com>
Co-authored-by: Wauplin <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants