I think I have a compatibility issue with either transformers or PyTorch.
I installed nmtscore with pip and ran the following:
from nmtscore import NMTScorer
scorer = NMTScorer()
scorer.score("This is a sentence.", "This is another sentence.")
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'M2M100Tokenizer'.
The class this function is called from is 'SMALL100Tokenizer'.
Traceback (most recent call last):
File "/home/retd/ia/traductions/nmtscore_test.py", line 3, in <module>
scorer = NMTScorer()
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/scorer.py", line 21, in __init__
self.model = load_translation_model(model, **model_kwargs)
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/__init__.py", line 231, in load_translation_model
translation_model = SMALL100Model(**kwargs)
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/small100.py", line 15, in __init__
super().__init__(model_name_or_path, device)
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/m2m100.py", line 30, in __init__
self.tokenizer = self._load_tokenizer()
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/small100.py", line 23, in _load_tokenizer
return SMALL100Tokenizer.from_pretrained(self.model_name_or_path)
File "/home/retd/scoring/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
return cls._from_pretrained(
File "/home/retd/scoring/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/tokenization_small100.py", line 148, in __init__
super().__init__(
File "/home/retd/scoring/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
self._add_tokens(
File "/home/retd/scoring/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/tokenization_small100.py", line 270, in get_vocab
vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
File "/home/retd/scoring/lib/python3.10/site-packages/nmtscore/models/tokenization_small100.py", line 183, in vocab_size
return len(self.encoder) + len(self.lang_token_to_id) + self.num_madeup_words
AttributeError: 'SMALL100Tokenizer' object has no attribute 'encoder'. Did you mean: 'encode'?
A workaround is to install an older version of transformers (below 4.34).
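To check whether an environment falls in the affected range, a small version guard along these lines may help. This is only a sketch: the helper name `is_affected` is made up for illustration, and the 4.34 threshold is taken from the reports in this thread rather than from any official compatibility statement.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def is_affected(transformers_version: str) -> bool:
    """Return True for transformers 4.34 or newer (hypothetical helper).

    The 4.34 cutoff is an assumption based on this thread.
    """
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {transformers_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 34)


# Look up the installed version without importing transformers itself.
try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is not None and is_affected(installed):
    print(f"transformers {installed} detected; a release below 4.34 may be needed")
```

Running this before creating `NMTScorer()` gives a clearer message than the `AttributeError` above.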
The issue seems to come from the transformers 4.34 release, which substantially changed the tokenizer workflow. As a result, tokenization_small100.py currently only works with transformers < 4.34.
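The traceback suggests an initialization-order problem: the base class constructor calls `get_vocab()`, which reads `self.encoder` before the subclass has set it. The toy classes below reproduce that pattern in isolation. They are hypothetical stand-ins, not the actual transformers or nmtscore code, and setting attributes before calling the base constructor is only one plausible way to address it.

```python
class Base:
    """Stand-in for a base tokenizer that reads the vocab during __init__."""

    def __init__(self):
        # Mirrors the traceback: the base constructor ends up calling get_vocab().
        self.initial_vocab = self.get_vocab().copy()

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(Base):
    def __init__(self, vocab):
        super().__init__()      # get_vocab() runs here, before self.encoder exists
        self.encoder = vocab

    def get_vocab(self):
        return dict(self.encoder)


class FixedTokenizer(Base):
    def __init__(self, vocab):
        self.encoder = vocab    # set the state get_vocab() needs first
        super().__init__()

    def get_vocab(self):
        return dict(self.encoder)


try:
    BrokenTokenizer({"hello": 0})
    error = None
except AttributeError as exc:
    error = str(exc)

fixed = FixedTokenizer({"hello": 0})
```

Here `BrokenTokenizer` fails with an `AttributeError` about the missing `encoder` attribute, while `FixedTokenizer` constructs normally.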