HerbertTokenizer doesn't work on version 3.5.1 #10790

Closed
1 of 4 tasks
Zhylkaaa opened this issue Mar 18, 2021 · 3 comments

@Zhylkaaa
Contributor

Zhylkaaa commented Mar 18, 2021

Environment info

  • transformers version: 3.5.1
  • Platform: MacOS X, Linux
  • Python version: 3.7
  • PyTorch version (GPU?): 1.6.0

Who can help

Information

Model I am using (Bert, XLNet ...): allegro/herbert-base-cased

I tried to use the official script from the model hub page with transformers version 3.5.1. A week ago it worked just fine, but now I am getting the error listed below.
@rmroczkowski maybe you have some information on this topic? I saw some new commits on the model hub, but they shouldn't change anything.

With the latest version it works fine with AutoTokenizer (EDIT: only version 4.4 works; I tested versions 3.5.1, 4.0.0 and 4.3 and got the same error).

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)
    I tried importing AutoTokenizer and HerbertTokenizer, but got the same error:
    `OSError: Can't load tokenizer for 'allegro/herbert-base-cased'. Make sure that:
    - 'allegro/herbert-base-cased' is a correct model identifier listed on 'https://huggingface.co/models'
    - or 'allegro/herbert-base-cased' is the correct path to a directory containing relevant tokenizer files`

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Install transformers 3.5.1.
  2. Try to use the official script from https://huggingface.co/allegro/herbert-base-case
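
A minimal sketch of a reproduction helper, assuming the failing call is `AutoTokenizer.from_pretrained` (the official script itself is not quoted in this issue). The helper name `try_load_tokenizer` and its `loader` parameter are hypothetical and exist only to make the check easy to run against different versions:

```python
def try_load_tokenizer(model_id, loader=None):
    """Return (True, tokenizer) on success, or (False, error message) on OSError.

    `loader` is a hypothetical hook: any callable taking a model id.
    When omitted, AutoTokenizer.from_pretrained is used.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without transformers installed.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    try:
        return True, loader(model_id)
    except OSError as err:
        # The failure reported above surfaces as an OSError.
        return False, str(err)


# Example call (needs transformers installed and network access):
# ok, result = try_load_tokenizer("allegro/herbert-base-cased")
# print("loaded" if ok else result)
```

Running the commented call under each installed transformers version should show which versions fail to resolve the tokenizer files.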

Expected behavior

The tokenizer loads and works.

@Zhylkaaa
Contributor Author

I guess this is related to the URL issue #10744, and one should change the model URL?

@Zhylkaaa
Contributor Author

I resolved this by updating the URLs to the model files. This is my current code:

```
{
    "vocab_file": {"allegro/herbert-base-cased": "https://huggingface.co/allegro/herbert-base-cased/resolve/main/vocab.json"},
    "merges_file": {"allegro/herbert-base-cased": "https://huggingface.co/allegro/herbert-base-cased/resolve/main/merges.txt"},
}
```
Is there a way to fix this while maintaining backward compatibility? @LysandreJik
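
For older versions, one possible workaround that avoids editing the library source is to download the two files and load the tokenizer from a local directory, which the error message lists as a valid input. The following is a sketch under assumptions, not a tested fix for 3.5.1: the helper names `resolve_url`, `fetch_tokenizer_files` and `load_local_tokenizer` and the target directory name are hypothetical, and the URL pattern is copied from the code above.

```python
import os
import urllib.request

MODEL_ID = "allegro/herbert-base-cased"
# File names taken from the URLs in the comment above.
TOKENIZER_FILES = ("vocab.json", "merges.txt")


def resolve_url(model_id, filename, revision="main"):
    """Build a hub URL in the same form as the URLs in the comment above."""
    return f"https://huggingface.co/{model_id}/resolve/{revision}/{filename}"


def fetch_tokenizer_files(model_id, target_dir):
    """Download the tokenizer files into target_dir (needs network access)."""
    os.makedirs(target_dir, exist_ok=True)
    for filename in TOKENIZER_FILES:
        destination = os.path.join(target_dir, filename)
        urllib.request.urlretrieve(resolve_url(model_id, filename), destination)
    return target_dir


def load_local_tokenizer(target_dir):
    """Load HerbertTokenizer from a directory holding vocab.json and merges.txt."""
    # Imported lazily so the URL helpers work without transformers installed.
    from transformers import HerbertTokenizer

    return HerbertTokenizer.from_pretrained(target_dir)


# Example usage (needs network access and transformers installed):
# local_dir = fetch_tokenizer_files(MODEL_ID, "herbert-base-cased-tokenizer")
# tokenizer = load_local_tokenizer(local_dir)
```

Because the files are fetched once and then read from disk, this approach does not depend on the download URLs hard-coded in a given transformers release.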

@julien-c
Member

Cross-posting the Forum thread: https://discuss.huggingface.co/t/delete-organizations-models-from-the-hub/954/40
