Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing users to use the latest tokenizers release ! #19139

Merged
merged 2 commits into from
Sep 21, 2022

Conversation

Narsil
Copy link
Contributor

@Narsil Narsil commented Sep 21, 2022

What does this PR do?

  • Allow users to use the most recent tokenizers version.
  • Should be 100% backward compatible, but there were quite large changes to the actual codebase.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger
Copy link
Collaborator

sgugger commented Sep 21, 2022

You need to run make style when changing the setup.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 21, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@SaulLu SaulLu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look good to me! Thank you 🎊

I guess that the first user of this feature will be MarkupLM 😄

@sgugger
Copy link
Collaborator

sgugger commented Sep 21, 2022

Failure seem unrelated. Yet rebasing on main to remove the failure linked to the Datasets release and re-laucnhing spurious tests would be helpful to ease everyone's mind :-)

@Narsil
Copy link
Contributor Author

Narsil commented Sep 21, 2022

@sgugger Should I wait for a second core maintainer's opinion on this ?

@sgugger
Copy link
Collaborator

sgugger commented Sep 21, 2022

Nope, you can go ahead and merge :-)

@Narsil Narsil merged commit d5848a5 into huggingface:main Sep 21, 2022
@Narsil Narsil deleted the upgrade_tokenizers2 branch September 21, 2022 15:46
@Smaug123
Copy link

I couldn't immediately find the release process for this repository - when will this make it into a release? tokenizers (for versions earlier than 0.13.0) had no wheel available for Apple silicon, so I believe until this PR is released we're stuck with source builds for that dependency.

@sgugger
Copy link
Collaborator

sgugger commented Sep 26, 2022

The next release of Transformers will be in a month roughly. In the meantime, you can install it from source.

oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request Sep 26, 2022
…19139)

* Allowing users to use the latest `tokenizers` release !

* Upgrading the versions table too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants