
DeBERTa v2 throws "TypeError: stat: path should be string...", v1 not #10097

Closed
1 of 4 tasks
205g0 opened this issue Feb 9, 2021 · 8 comments


Comments


205g0 commented Feb 9, 2021

Environment info

  • transformers version: 4.3.1
  • Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.7.1+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: false
  • Using distributed or parallel set-up in script?: false

Who can help

@BigBird01 @patil-suraj

Information

Model I am using (DeBERTa v2):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Create this file as index.py:
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-xlarge-v2')
model = AutoModel.from_pretrained('microsoft/deberta-xlarge-v2')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
print(outputs)
  2. Run the file

  3. You'll get:

(venv) root@16gb:~/deberta# python3 index.py
Traceback (most recent call last):
  File "index.py", line 4, in <module>
    tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-xlarge-v2')
  File "/root/deberta/venv/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 398, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/deberta/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/root/deberta/venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1860, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/deberta/venv/lib/python3.8/site-packages/transformers/models/deberta/tokenization_deberta.py", line 542, in __init__
    if not os.path.isfile(vocab_file):
  File "/usr/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
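
The bottom frame of this traceback can be reproduced in isolation, independent of transformers: `os.path.isfile` catches `OSError` and `ValueError` but not `TypeError`, so when the tokenizer's `vocab_file` resolves to `None`, `os.stat` raises exactly this error instead of the check returning `False`. A minimal sketch:

```python
import os

# Reproduces the bottom frame of the traceback: genericpath.isfile passes
# its argument straight to os.stat, which rejects a None path with a
# TypeError rather than returning False.
try:
    os.path.isfile(None)
except TypeError as err:
    print(err)
```

Any tokenizer whose vocab file cannot be resolved from the model repo ends up on this same code path.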

I tried this with the DeBERTa v1 models and there was no error. I see the same behavior when using DebertaTokenizer and DebertaModel directly.

Expected behavior

No error.

@patil-suraj (Contributor)

Hi @205g0

Thank you for reporting this!

microsoft/deberta-xlarge-v2 uses a sentencepiece vocabulary, and sentencepiece support is not implemented for the DeBERTa tokenizer; that is the reason for this error.


205g0 commented Feb 9, 2021

Hey Suraj, thanks for the quick response and good to know!

@patil-suraj (Contributor)

@BigBird01, do you think you could add the missing tokenizer? Otherwise, I could add it. Thanks!

@LysandreJik (Member)

DeBERTa-v2 is not available in the library yet. We're working towards it with @BigBird01.

@BigBird01 (Contributor)

Thanks @205g0 for the interest in DeBERTa-v2. We are working on it with @LysandreJik; hopefully it will be available soon. You can check our PR for the progress.

@patil-suraj (Contributor)

Oh sorry, @BigBird01, I did not realize that this was a work in progress.


BigBird01 commented Feb 9, 2021

> Oh sorry, @BigBird01, I did not realize that this was a work in progress.

No worries, @patil-suraj. Thanks for your quick response. We are glad to integrate these SOTA NLU models with HF to benefit the community. :)

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
