add `push_to_hub` to pipeline #29172

not-lain · 2024-02-21T14:05:14Z

What does this PR do?

This adds a `push_to_hub` method to pipelines, so that people can easily push their custom pipelines to the Hugging Face Hub.

Fixes #28857 #28983

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@Narsil @Rocketknight1

additional resources :

https://huggingface.co/docs/transformers/custom_models (there is a push_to_hub method)
https://huggingface.co/docs/transformers/en/add_new_pipeline (doesn't include a push_to_hub method)
notebook: https://colab.research.google.com/drive/1yCET-FdkI4kHK1B-VUrrM7sKHMO3LkOg?usp=sharing

TODO:

fix documentation for push_to_hub
~~fix automap configuration for initial push (keeps adding a user/repo--file.module)~~
update save_pretrained method (check out the PreTrainedModel class for more info) (enhancement)
add tests


not-lain · 2024-02-22T11:27:46Z

@ArthurZucker @Rocketknight1
Yes, finally fixed the docs.

As for the tests, I will leave that part to you.
As for the configuration, I just wanted to highlight what needs to be fixed. It's also consistent enough, and according to Sylvian, people don't always push the pipeline to the same repo that contains the model, which is tricky. My suggestion is to leave the configuration for now and open a separate issue for that.

TL;DR:
Assuming the model is in another remote repo is better than assuming it's in the same one we're pushing to and messing up the configuration.

Any reviews, comments, or ideas are much appreciated.

not-lain · 2024-02-22T12:07:45Z

Almost forgot: #29004 will fix the remote pipeline configuration problems in most cases (it adds the remote repo flags user/repo--file.module to the custom-pipeline field). That leaves this as the final configuration inconsistency, since it is related to the auto_config instead, which adds extra unnecessary remote flags. Maybe we should add an if/else there that checks whether the pipeline is being pushed to the same repo as the original model.
A rough estimate of the code at L938 of the same file could look like this:

if self.model.config._name_or_path != repo_id:
    custom_object_save(self, save_directory)

let me know if you approve of this
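The guard proposed above can be tried out in isolation. The following is a minimal sketch, not the code from the PR: `fake_custom_object_save` is a hypothetical stand-in that only records calls, `maybe_save_custom_code` is an illustrative helper name, and the nested `SimpleNamespace` objects merely imitate the `pipe.model.config._name_or_path` attribute chain.

```python
from types import SimpleNamespace

saved = []  # directories that received a copy of the custom code


def fake_custom_object_save(obj, save_directory):
    # Hypothetical stand-in: only records where the custom code would be copied.
    saved.append(save_directory)


def maybe_save_custom_code(pipe, save_directory, repo_id):
    """Copy custom code only when the target repo differs from the model's repo."""
    if pipe.model.config._name_or_path != repo_id:
        fake_custom_object_save(pipe, save_directory)
        return True
    return False


# Toy pipeline whose model was loaded from "user/model".
pipe = SimpleNamespace(
    model=SimpleNamespace(config=SimpleNamespace(_name_or_path="user/model"))
)

same_repo = maybe_save_custom_code(pipe, "dir_a", "user/model")   # skipped
other_repo = maybe_save_custom_code(pipe, "dir_b", "user/other")  # saved
```

With this guard, pushing back to the model's own repo skips the copy, while pushing to any other repo performs it.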

not-lain · 2024-02-23T14:34:26Z

@ArthurZucker @Rocketknight1 After careful investigation, it turns out that the extra flag is added by AutoModelForxxx and NOT by the new push_to_hub method, so I'm removing it from the TODO list since it's irrelevant to this pull request.
Reproduction:
https://colab.research.google.com/drive/1unFh3i5FyRRHcUO8Al7cLKkYXPhtr0lC?usp=sharing

update save_pretrained

not-lain · 2024-02-28T16:52:39Z

@Rocketknight1 I have updated save_pretrained a little, because it's coupled with the push_to_hub method. I have done my part to cover as much ground as possible, and this pull request is about the push_to_hub method, so I will stop here.

Since I don't know which repo you want the tests to push to, I will leave adding them to you. This notebook will help you out when creating the tests: https://colab.research.google.com/drive/130IpVrScW8cNomEDY2Fa6-mA4_VrRmgT?usp=sharing

awaiting review

Rocketknight1

Conditionally approving this PR too, except for one TODO, and the comment that I think we probably need to refactor / reconsider our model of custom pipelines and how they should be saved/loaded. See this comment for more.

Rocketknight1 · 2024-02-29T17:05:39Z

src/transformers/pipelines/base.py

+        # TODO:
+        # depricate the safe_serialization parameter and use kwargs instead
+        # or update the save_pretrained to get all the parameters such as max_shard_size, ...


TODO not finished here!

Since we are passing everything to

+        kwargs["safe_serialization"] = safe_serialization
+        self.model.save_pretrained(save_directory, **kwargs)

as kwargs, I felt we should switch to a kwargs annotation.
Also, yeah, you are right: there is no need for deprecation or any other changes, it already works perfectly as is. Should I remove that comment?

Yep - we don't want to leave unnecessary TODOs in the codebase!

removed the extra comment ✅
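The forwarding discussed in this thread can be illustrated with toy classes. This is a sketch under stated assumptions: `ToyModel` and `ToyPipeline` are hypothetical stand-ins, and only the two forwarding lines inside `ToyPipeline.save_pretrained` mirror the quoted diff. It shows why no extra parameters are needed on the pipeline side: anything passed as a keyword argument, such as `max_shard_size`, already reaches the model's `save_pretrained`.

```python
class ToyModel:
    """Hypothetical stand-in for a model exposing save_pretrained."""

    def save_pretrained(self, save_directory, safe_serialization=True, max_shard_size="5GB"):
        # Return the arguments instead of writing files, so the flow is visible.
        return {
            "save_directory": save_directory,
            "safe_serialization": safe_serialization,
            "max_shard_size": max_shard_size,
        }


class ToyPipeline:
    """Hypothetical stand-in for a pipeline wrapping a model."""

    def __init__(self, model):
        self.model = model

    def save_pretrained(self, save_directory, safe_serialization=True, **kwargs):
        # Same forwarding as in the quoted diff: fold the explicit parameter
        # into kwargs, then hand everything to the model.
        kwargs["safe_serialization"] = safe_serialization
        return self.model.save_pretrained(save_directory, **kwargs)


received = ToyPipeline(ToyModel()).save_pretrained(
    "out", safe_serialization=False, max_shard_size="1GB"
)
```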

Rocketknight1 · 2024-02-29T17:08:07Z

src/transformers/pipelines/base.py

+Pipeline.push_to_hub = copy_func(Pipeline.push_to_hub)
+if Pipeline.push_to_hub.__doc__ is not None:
+    Pipeline.push_to_hub.__doc__ = Pipeline.push_to_hub.__doc__.format(
+        object="pipe", object_class="pipeline", object_files="pipeline file"
+    ).replace(".from_pretrained", "")


This bit confuses me - since you're inheriting PushToHubMixin.push_to_hub, __doc__ should always be defined, right? I can see it's a copy of the same code for the other classes that inherit from PushToHubMixin, though, I'm just not sure why it's coded this way. Not a blocker, just a comment!

Since other people used this method to copy the docs, I chose to use the same one to stay on the same page as them and avoid straying too far from the norm.

Yep, that's totally fine! I was just pointing out my own confusion, I guess

True that. I think the reason they use this pattern is that they only need to change one method (the original one) to change the docs for all of the other classes using it.
One change updates them all, which is a really nice approach.
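The copy-then-format pattern from the diff can be reproduced with the standard library alone. In this sketch, `HubMixin` is a hypothetical stand-in for PushToHubMixin with a made-up docstring template, and `copy_func` is re-implemented here for illustration rather than imported from the library. Copying matters because a function's docstring lives on the function object: formatting the inherited method directly would change the docs for every class that shares it. As for the `is not None` guard, Python strips docstrings when run with `-OO`, which may be one reason for it, although the thread does not confirm this.

```python
import functools
import types


def copy_func(f):
    """Return a copy of function `f` that has its own `__doc__` attribute."""
    g = types.FunctionType(
        f.__code__,
        f.__globals__,
        name=f.__name__,
        argdefs=f.__defaults__,
        closure=f.__closure__,
    )
    g = functools.update_wrapper(g, f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


class HubMixin:  # hypothetical stand-in for PushToHubMixin
    def push_to_hub(self, repo_id):
        """Upload the {object_files} to the Hub.

        Example: {object}.push_to_hub("my-repo")
        """
        return f"pushed {type(self).__name__} to {repo_id}"


class Pipeline(HubMixin):
    pass


# Give Pipeline its own copy, then fill in the docstring template on the copy only.
Pipeline.push_to_hub = copy_func(Pipeline.push_to_hub)
if Pipeline.push_to_hub.__doc__ is not None:  # None when docstrings are stripped (-OO)
    Pipeline.push_to_hub.__doc__ = Pipeline.push_to_hub.__doc__.format(
        object="pipe", object_files="pipeline file"
    )
```

The mixin keeps its template untouched, so each subclass can fill in its own wording from the single shared docstring.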

HuggingFaceDocBuilderDev · 2024-02-29T17:34:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

not-lain · 2024-03-07T00:21:48Z

cc @Rocketknight1 @ArthurZucker
I added the new feature to the dynamic pipeline test (DynamicPipelineTester).
now it's finally ready for a review 🚀🤗

not-lain · 2024-03-11T22:11:51Z

cc @Rocketknight1 @ArthurZucker
any reviews on this one ?

not-lain · 2024-03-18T16:37:26Z

@Rocketknight1 friendly pinging you here.
Just wanted to say that the test that I added is working perfectly ✅

Rocketknight1 · 2024-03-20T16:56:46Z

Sorry for the delay! I still feel like we might need to refactor our model for what custom pipelines actually do, but in the meantime this seems okay to add.

cc @amyeroberts for core maintainer review - this is basically a PR that adds push_to_hub() to custom pipelines. They already have a save_pretrained() method, so this just pushes the result of that.

We had some internal discussions about this, and at some point we might need to tackle the question of custom pipelines properly, including properly separating them from models (right now they're kind of attached at the hip to the model in their repo). Still, I think this fix is useful in the short-term!
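The idea that push_to_hub "just pushes the result of" save_pretrained can be sketched without any network access. Everything here is an assumption made for illustration: `SaveThenUploadMixin`, `ToyPipeline`, the file names, and the `upload` callback are hypothetical, and the real upload step is replaced by a function that only records what it was given.

```python
import os
import tempfile


class SaveThenUploadMixin:  # hypothetical mixin, not the library's PushToHubMixin
    def push_to_hub(self, repo_id, upload):
        # Serialize into a scratch directory, then hand that directory to the uploader.
        with tempfile.TemporaryDirectory() as tmp:
            self.save_pretrained(tmp)
            files = sorted(os.listdir(tmp))
            upload(repo_id, tmp, files)
        return files


class ToyPipeline(SaveThenUploadMixin):
    def save_pretrained(self, save_directory):
        # Made-up file names standing in for a config and a custom pipeline module.
        for name in ("config.json", "my_pipeline.py"):
            with open(os.path.join(save_directory, name), "w") as fh:
                fh.write("{}")


uploads = []  # what the fake uploader received
files = ToyPipeline().push_to_hub(
    "user/demo", lambda repo, folder, names: uploads.append((repo, names))
)
```

Because the method only depends on save_pretrained, any object that can serialize itself to a directory gets the push behavior for free.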

not-lain · 2024-03-29T03:54:02Z

@Rocketknight1

including properly separating them from models

I do agree with you on this point and I understand where you're coming from, but I don't think it's very relevant to this PR: even if we do separate the model from the pipeline, we still need the push_to_hub method.

imo we should create a separate issue for that

amyeroberts

Thanks for adding this!

Just one small comment

amyeroberts · 2024-04-04T14:09:02Z

src/transformers/pipelines/base.py


        if self.modelcard is not None:
            self.modelcard.save_pretrained(save_directory)

+    @staticmethod


We should have a # Copied from comment here as it's the same as the implementation in configuration_utils.py

Now that I am reading this, I realize that this is an extra method. I will remove it now, since _set_token_in_kwargs is already defined in src/transformers/configuration_utils.py.

Well, in src/transformers/modeling_utils.py, save_pretrained neither imports nor creates a function to add the token to the kwargs.
IMO _set_token_in_kwargs should be moved to PushToHubMixin instead, which would help a lot by avoiding changing every single class manually. For now I will settle for simply adding a comment, since that enhancement is out of scope for this PR. Do let me know if you approve of this; if so, can you open another issue and tag me? I'll try to contribute to that.

EDIT:
The same goes for lots of other classes. I think we should definitely apply the DRY principle here and add _set_token_in_kwargs to PushToHubMixin instead, especially since this is repetitive and we have a parameter that will be deprecated.

_set_token_in_kwargs is only defined in the config class. In fact, looking at it - we shouldn't need it here at all. This is a workaround to account for the fact that some models' config classes have their own from_pretrained method - but this isn't the case for pipelines.

I see, thanks for the clarification
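For context, token handling of this kind usually means mapping a deprecated keyword onto its replacement. The sketch below is an illustration only: the thread does not name the deprecated parameter, so `use_auth_token` is an assumption, and `resolve_token` is a hypothetical helper rather than the library's _set_token_in_kwargs.

```python
import warnings


def resolve_token(token=None, **kwargs):
    """Map the old keyword (assumed to be `use_auth_token`) onto `token`."""
    use_auth_token = kwargs.pop("use_auth_token", None)
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated, please use `token` instead.",
            FutureWarning,
        )
        if token is not None:
            raise ValueError("Pass only one of `token` and `use_auth_token`.")
        token = use_auth_token
    # Return the resolved token together with the remaining keyword arguments.
    return token, kwargs


new_style = resolve_token(token="abc", revision="main")
```

Centralizing this in one helper is the DRY argument made above; whether pipelines need it at all is what the review questions.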


not-lain · 2024-04-14T21:54:41Z

Hi @amyeroberts
Any reviews on this PR?

amyeroberts · 2024-04-15T08:40:31Z

@not-lain Per this conversation, the changes removing _set_token_in_kwargs should be done


amyeroberts

Thanks for iterating!

Make sure that the input arguments are consistent with the logic and docstrings

src/transformers/pipelines/base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

amyeroberts

Thanks for adding this feature!

not-lain · 2024-04-16T14:39:44Z

@amyeroberts @Rocketknight1 Thanks a lot guys ✨✨

* add `push_to_hub` to pipeline
* fix docs
* format with ruff
* update save_pretrained
* update save_pretrained
* remove unnecessary comment
* switch to push_to_hub method in DynamicPipelineTester
* remove unused imports
* update docs for add_new_pipeline
* fix docs for add_new_pipeline
* add comment
* fix italien docs
* changes to token retrieval for pipelines
* Update src/transformers/pipelines/base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

add push_to_hub to pipeline

61d8277

not-lain mentioned this pull request Feb 21, 2024

Fix custom architectures #28983

Closed

Merge branch 'huggingface:main' into add-push-to-hub

8396070

not-lain mentioned this pull request Feb 22, 2024

Add push_to_hub( ) to pipeline #28870

Closed

5 tasks

not-lain added 3 commits February 22, 2024 11:51

fix docs

910581d

Merge branch 'add-push-to-hub' of https://github.com/not-lain/transfo…

0329b39

…rmers into add-push-to-hub

format with ruff

4be0c35

Rocketknight1 mentioned this pull request Feb 22, 2024

wrongly annotated configuration when saving a model that has a custom pipeline #28907

Closed

4 tasks

not-lain and others added 4 commits February 27, 2024 15:36

update save_pretrained

8fc4ac5

update save_pretrained

25bdabe

Merge branch 'huggingface:main' into test-parameter-update

c025667

Merge pull request #2 from not-lain/test-parameter-update

d513918

update save_pretrained

ArthurZucker requested a review from Rocketknight1 February 29, 2024 10:42

Merge branch 'huggingface:main' into add-push-to-hub

ede0d55

Rocketknight1 approved these changes Feb 29, 2024


not-lain added 3 commits March 1, 2024 15:38

remove unnecessary comment

547edb6

switch to push_to_hub method in DynamicPipelineTester

e69e241

remove unused imports

6b108db

not-lain mentioned this pull request Mar 7, 2024

fix for custom pipeline configuration #29004

Merged

5 tasks

not-lain added 2 commits March 29, 2024 04:18

update docs for add_new_pipeline

e777027

fix docs for add_new_pipeline

b465ac5

Merge remote-tracking branch 'origin/main' into add-push-to-hub

e51ee44

Merge branch 'huggingface:main' into add-push-to-hub

226e1a5

amyeroberts approved these changes Apr 4, 2024


not-lain added 2 commits April 4, 2024 15:51

add comment

8b524fc

Merge branch 'add-push-to-hub' of https://github.com/not-lain/transfo…

1210fdd

…rmers into add-push-to-hub

amyeroberts self-requested a review April 5, 2024 11:48

not-lain and others added 2 commits April 5, 2024 13:06

fix italien docs

4872b57

Merge branch 'huggingface:main' into add-push-to-hub

a3db605

not-lain added 2 commits April 15, 2024 15:31

changes to token retrieval for pipelines

5912999

Merge branch 'add-push-to-hub' of https://github.com/not-lain/transfo…

a87d6fd

…rmers into add-push-to-hub

amyeroberts reviewed Apr 15, 2024


src/transformers/pipelines/base.py

Update src/transformers/pipelines/base.py

33cc7f5

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

amyeroberts approved these changes Apr 16, 2024


amyeroberts merged commit 0eaef0c into huggingface:main Apr 16, 2024
21 checks passed
