
[Pipelines] Add revision tag to all default pipelines #17667

Conversation

patrickvonplaten (Contributor) commented on Jun 10, 2022

What does this PR do?

Fixes #17666

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev commented on Jun 10, 2022

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten changed the title from "trigger test failure" to "[Pipelines] Add revision tag to all default pipelines" on Jun 10, 2022
patrickvonplaten (Contributor Author) commented

@Narsil would this be ok for you? see: #17666

I can add a simple slow test that iterates over the default pipelines to make sure each pipeline is loaded correctly.

Narsil (Contributor) left a comment

LGTM

src/transformers/pipelines/__init__.py (review comment, outdated and resolved)
LysandreJik (Member) left a comment

Perfect, LGTM!

"type": "text",
},
"zero-shot-classification": {
"impl": ZeroShotClassificationPipeline,
"tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
"pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
"default": {
"model": {"pt": "facebook/bart-large-mnli", "tf": "roberta-large-mnli"},
"config": {"pt": "facebook/bart-large-mnli", "tf": "roberta-large-mnli"},
"tokenizer": {"pt": "facebook/bart-large-mnli", "tf": "roberta-large-mnli"},
patrickvonplaten (Contributor Author) left a comment

@Narsil, before merging this PR I'd love to have your opinion here. I've checked the pipeline function and I don't think a "default" tokenizer is ever used. If no repo_id is provided, the tokenizer or feature extractor always falls back to the model id, never to the tokenizer id => to me this looks like dead code. Can you confirm?
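A simplified illustration of the fallback being described (a sketch only, not the actual pipeline() source; the helper name is made up):

# Sketch only: when no tokenizer repo is passed, resolution falls back to the
# model id, so a separate "tokenizer" entry in the task defaults is never read.
def resolve_tokenizer_id(model_id, tokenizer_id=None, task_defaults=None):
    if tokenizer_id is not None:
        return tokenizer_id
    # task_defaults["tokenizer"] is never consulted here,
    # which is why the default tokenizer entry looks like dead code.
    return model_id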


@slow
@require_torch
def test_load_default_pipelines_pt(self):
patrickvonplaten (Contributor Author) left a comment

I test here that calling pipeline(<task_name>) indeed loads the corresponding default model. The weights are compared to make sure it is exactly the same model.

patrickvonplaten (Contributor Author) left a comment

This test is run for all pipelines and should serve as a good general check that all default pipelines work as expected.
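A minimal sketch of that idea, using the zero-shot-classification default shown earlier (facebook/bart-large-mnli); the helper and structure are illustrative and may differ from the test added in the PR:

import torch
from transformers import AutoModelForSequenceClassification, pipeline

def check_models_equal_pt(model1, model2):
    # Compare every parameter tensor element-wise.
    for p1, p2 in zip(model1.parameters(), model2.parameters()):
        if not torch.equal(p1, p2):
            return False
    return True

# Load the task's default pipeline, then load the expected checkpoint directly
# and verify the weights match exactly.
pipe = pipeline("zero-shot-classification", framework="pt")
reference = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
assert check_models_equal_pt(pipe.model, reference)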

@slow
@require_tf
@require_tensorflow_probability
def test_load_default_pipelines_tf_table_qa(self):
patrickvonplaten (Contributor Author) left a comment

I split the test here into a table_qa variant and a non-table_qa variant because the scatter and tensorflow_probability dependencies are quite annoying; if one of them is not installed, I still want to run all the other tasks.
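A sketch of that split, using the decorators shown in the diff (the class name is just a placeholder and the test bodies are elided):

from transformers.testing_utils import require_tensorflow_probability, require_tf, slow

class DefaultPipelineTests:
    @slow
    @require_tf
    def test_load_default_pipelines_tf(self):
        # Covers every default TF task except table question answering,
        # so a missing optional dependency does not skip the whole suite.
        ...

    @slow
    @require_tf
    @require_tensorflow_probability
    def test_load_default_pipelines_tf_table_qa(self):
        # Only the table-question-answering default, which needs
        # tensorflow_probability on top of TensorFlow.
        ...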

@@ -187,9 +190,8 @@
"tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
"default": {
"model": {
"pt": "google/tapas-base-finetuned-wtq",
"tokenizer": "google/tapas-base-finetuned-wtq",

Comment on lines -204 to -205
"tokenizer": "dandelin/vilt-b32-finetuned-vqa",
"feature_extractor": "dandelin/vilt-b32-finetuned-vqa",

patrickvonplaten (Contributor Author) commented

PR is good to go for me.

@Narsil could you please take a look at this comment before merging: #17667 (comment) - I think there is some dead code; the default tokenizer never seems to be used.

Also cc @sgugger - otherwise the PR should be ready.

sgugger (Collaborator) left a comment

Thanks for working on this!

src/transformers/pipelines/__init__.py (review comment, outdated and resolved)
@@ -607,3 +617,125 @@ def add(number, extra=0):

outputs = [item for item in dataset]
self.assertEqual(outputs, [[{"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]])

def check_models_equal_pt(self, model1, model2):
sgugger (Collaborator) left a comment

You are adding all of this in a Tester that is under require_pt. I think you need to make a new Tester class since there are TensorFlow tests too.
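A sketch of what that reorganization could look like (class names here are placeholders, not the ones in the PR):

import unittest

from transformers.testing_utils import require_tf, require_torch, slow

@require_torch
class DefaultPipelinePTTests(unittest.TestCase):
    @slow
    def test_load_default_pipelines_pt(self):
        ...  # PyTorch-only default-pipeline checks stay behind require_torch

@require_tf
class DefaultPipelineTFTests(unittest.TestCase):
    @slow
    def test_load_default_pipelines_tf(self):
        ...  # TF checks live in their own class so require_torch does not skip them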

patrickvonplaten (Contributor Author) commented on Jun 30, 2022

Good catch - thanks!

patrickvonplaten and others added 2 commits June 30, 2022 15:08
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
patrickvonplaten (Contributor Author) commented

@Narsil approved offline (tokenizer_default code is dead indeed) => merging!

patrickvonplaten merged commit e4d2588 into huggingface:main on Jun 30, 2022
patrickvonplaten deleted the revision_tags_for_default_pipeline branch on June 30, 2022 at 14:37
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022

* trigger test failure

* upload revision poc

* Update src/transformers/pipelines/base.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* up

* add test

* correct some stuff

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct require flag

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

[Default pipelines] Add a revision tag to all pipelines
6 participants