Set pad_token in run_glue_no_trainer.py #28534 #30157

Closed
wants to merge 5 commits into from

JINO-ROHIT
Contributor

What does this PR do?

This PR sets the tokenizer's pad token in the run_glue_no_trainer.py script under examples.

Fixes #28534
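
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of one common way to handle a tokenizer that ships without a pad token: fall back to the EOS token and mirror the resulting id into the model config. The helper name `ensure_pad_token` and the `StubTokenizer` class are hypothetical stand-ins so the snippet runs without downloading a model; a real Hugging Face tokenizer exposes `pad_token`, `eos_token`, and `pad_token_id` attributes with the same names.

```python
from types import SimpleNamespace


class StubTokenizer:
    """Hypothetical stand-in that mimics only the attributes used below."""

    def __init__(self, vocab, eos_token, pad_token=None):
        self.vocab = vocab
        self.eos_token = eos_token
        self.pad_token = pad_token

    @property
    def pad_token_id(self):
        # Derive the id from the current pad token, as real tokenizers do.
        return None if self.pad_token is None else self.vocab[self.pad_token]


def ensure_pad_token(tokenizer, config):
    """Reuse the EOS token for padding when no pad token is defined.

    Assumption: this fallback is an illustration, not necessarily the
    exact change made in this PR.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if getattr(config, "pad_token_id", None) is None:
        config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, config


# Usage with stand-in objects (no pad token defined at the start).
tokenizer = StubTokenizer(vocab={"<s>": 1, "</s>": 2}, eos_token="</s>")
config = SimpleNamespace(pad_token_id=None)
ensure_pad_token(tokenizer, config)
```

Setting the id on the config as well matters for sequence classification models, which may need to know which positions are padding when batching inputs of different lengths.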

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@amyeroberts @ArthurZucker

Collaborator

@amyeroberts amyeroberts left a comment

Thanks for adding - just a small suggestion

@JINO-ROHIT
Contributor Author

done! @amyeroberts

@amyeroberts changed the title from "adding pad token to fix #28534" to "Set pad_token in run_glue_no_trainer.py #28534" on Apr 10, 2024
Collaborator

@amyeroberts amyeroberts left a comment

Thanks for iterating!

For the failing tests, running make fixup and pushing the changes should resolve them.

@JINO-ROHIT
Contributor Author

JINO-ROHIT commented Apr 10, 2024

Exception: Found the following copy inconsistencies:

  • tests/models\roc_bert\test_tokenization_roc_bert.py: copy does not match models.bert.test_tokenization_bert.BertTokenizationTest.test_is_whitespace at line 167
    Run make fix-copies or python utils/check_copies.py --fix_and_overwrite to fix them.
    make: *** [Makefile:38: repo-consistency] Error 1

I get this error. I tried running both commands, but it still gives the same error. How can I resolve this? @amyeroberts

@amyeroberts
Collaborator

There's something funny going on with the formatting here - files like src/transformers/models/bigbird_pegasus/modeling_bigbird_pegasus.py shouldn't be modified when calling make fixup, and you shouldn't need to run make fix-copies for this PR.

In your environment, make sure you have all of the necessary formatting libraries by running pip install -e .[testing]. Once that's done, I'd undo the formatting changes to the unrelated files and then try running make fixup again.
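
As a rough sketch of the "undo the formatting changes to the unrelated files" step, the snippet below lists every file that differs from a base ref and restores all of them except the ones the PR is meant to touch. The helper names, the base ref `upstream/main`, and the script path are assumptions for illustration, not something stated in this thread. It defaults to a dry run, and `git checkout <ref> -- <paths>` will fail for files that do not exist on the base ref, so newly added files would need to be handled separately.

```python
import subprocess


def files_to_restore(changed, keep):
    """Return the changed paths that are not part of the intended edit."""
    keep = set(keep)
    return [path for path in changed if path not in keep]


def restore_unrelated(base_ref, keep, dry_run=True):
    """Restore unrelated files from base_ref; only report them on a dry run."""
    # List files in the working tree that differ from base_ref.
    changed = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.splitlines()
    unrelated = files_to_restore(changed, keep)
    if unrelated and not dry_run:
        # Overwrite the unrelated files with their base_ref versions.
        subprocess.run(["git", "checkout", base_ref, "--", *unrelated], check=True)
    return unrelated


if __name__ == "__main__":
    # Assumed values: adjust the ref and the path to match your checkout.
    intended = ["examples/pytorch/text-classification/run_glue_no_trainer.py"]
    print(restore_unrelated("upstream/main", intended, dry_run=True))
```

Running it first with `dry_run=True` shows which files would be reverted before anything is overwritten.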

@JINO-ROHIT
Contributor Author

@amyeroberts I've recreated my env. Can you help me undo the formatting changes to the unrelated files?

@JINO-ROHIT JINO-ROHIT closed this Apr 13, 2024
@JINO-ROHIT JINO-ROHIT deleted the fix-28534 branch April 13, 2024 09:18
Development

Successfully merging this pull request may close these issues.

run_glue_no_trainer.py script crashes on Mistral model due to tokenizer issue
2 participants