
[ALBERT] : ValueError: Layer #1 (named "predictions") expects 11 weight(s), but the saved weights have 10 element(s). #2024

Closed
gradient-school opened this issue Dec 2, 2019 · 9 comments

@gradient-school

🐛 Bug

Model I am using (Bert, XLNet....): ALBERT

Language I am using the model on (English, Chinese....): English

The problem arises when using:

  • the official example scripts: (give details)
  • my own modified scripts: (give details)

import tensorflow as tf
from transformers import *
# Download the ALBERT masked-LM model
model = TFAlbertForMaskedLM.from_pretrained('albert-large-v2')

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details) Initial validation

To Reproduce

Steps to reproduce the behavior:
import tensorflow as tf
from transformers import *
# Download the ALBERT masked-LM model
model = TFAlbertForMaskedLM.from_pretrained('albert-large-v2')

The code throws the following error:

100%|██████████| 484/484 [00:00<00:00, 271069.99B/s]
100%|██████████| 87059544/87059544 [00:03<00:00, 28448930.07B/s]

ValueError Traceback (most recent call last)
in ()
----> 1 model = TFAlbertForMaskedLM.from_pretrained('albert-large-v2')

3 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
287 # 'by_name' allow us to do transfer learning by skipping/adding layers
288 # see https://github.com/tensorflow/tensorflow/blob/00fad90125b18b80fe054de1055770cfb8fe4ba3/tensorflow/python/keras/engine/network.py#L1339-L1357
--> 289 model.load_weights(resolved_archive_file, by_name=True)
290
291 ret = model(model.dummy_inputs, training=False) # Make sure restore ops are run

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in load_weights(self, filepath, by_name)
179 raise ValueError('Load weights is not yet supported with TPUStrategy '
180 'with steps_per_run greater than 1.')
--> 181 return super(Model, self).load_weights(filepath, by_name)
182
183 @trackable.no_automatic_dependency_tracking

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py in load_weights(self, filepath, by_name)
1173 f = f['model_weights']
1174 if by_name:
-> 1175 saving.load_weights_from_hdf5_group_by_name(f, self.layers)
1176 else:
1177 saving.load_weights_from_hdf5_group(f, self.layers)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py in load_weights_from_hdf5_group_by_name(f, layers)
749 '") expects ' + str(len(symbolic_weights)) +
750 ' weight(s), but the saved weights' + ' have ' +
--> 751 str(len(weight_values)) + ' element(s).')
752 # Set values.
753 for i in range(len(weight_values)):

ValueError: Layer #1 (named "predictions") expects 11 weight(s), but the saved weights have 10 element(s).

Expected behavior

TFAlbertForMaskedLM should load from the pre-trained 'albert-large-v2' weights without error.

Environment

  • OS: Linux (Colab)
  • Python version: 3.6
  • TensorFlow version: 2.0
  • Transformers version (or branch):
  • Using GPU? Yes
  • Distributed or parallel setup?
  • Any other relevant information:

Additional context

@thomwolf
Member

thomwolf commented Dec 5, 2019

cc @LysandreJik

@LysandreJik
Member

It should be fixed now, thanks for raising an issue.

@gradient-school
Author

Thanks @LysandreJik for your prompt response. The issue mentioned above is resolved, but I am now getting an error when converting predicted IDs back to tokens using AlbertTokenizer. Here is the error I am seeing (the pred_index value below is 29324). Please advise, or let me know if I should open another issue since the original one has been resolved.

TypeError Traceback (most recent call last)
in ()
----> 1 pred_token = tokenizer.convert_ids_to_tokens([pred_index])[0]
2 print('Predicted token:', pred_token)

2 frames
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in convert_ids_to_tokens(self, ids, skip_special_tokens)
1034 tokens.append(self.added_tokens_decoder[index])
1035 else:
-> 1036 tokens.append(self._convert_id_to_token(index))
1037 return tokens
1038

/usr/local/lib/python3.6/dist-packages/transformers/tokenization_albert.py in _convert_id_to_token(self, index, return_unicode)
172 def _convert_id_to_token(self, index, return_unicode=True):
173 """Converts an index (integer) in a token (string/unicode) using the vocab."""
--> 174 token = self.sp_model.IdToPiece(index)
175 if six.PY2 and return_unicode and isinstance(token, str):
176 token = token.decode('utf-8')

/usr/local/lib/python3.6/dist-packages/sentencepiece.py in IdToPiece(self, id)
185
186 def IdToPiece(self, id):
--> 187 return _sentencepiece.SentencePieceProcessor_IdToPiece(self, id)
188
189 def GetScore(self, id):

TypeError: in method 'SentencePieceProcessor_IdToPiece', argument 2 of type 'int'

@LysandreJik
Member

Hmm, I have no issues running this code snippet:

from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-large-v2")

print(tokenizer.convert_ids_to_tokens(29324))
# or
print(tokenizer.convert_ids_to_tokens([29324]))

Is there a way you could give us a short code sample that reproduces the problem, so that we may debug what's happening? Thank you.

@gradient-school
Author

@LysandreJik thanks for your response. I figured out the issue. Below is code that reproduces it. In it, 'pred_index' comes out as numpy.int64, and when passed to the 'convert_ids_to_tokens' method it throws the error mentioned above. If I first convert it to an int, it works fine.

Here is the example code to reproduce the issue:

import tensorflow as tf
from transformers import AlbertTokenizer, TFAlbertForMaskedLM

# Get tokenizer
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')

# Encode a text input
text = "What is the fastest car in the world."
tokenized_text = tokenizer.tokenize(text)

# Let's mask 'world' and check if the model can predict it
tokenized_text[7] = '[MASK]'

# Convert tokenized text to indexes
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Download the ALBERT masked-LM model
model = TFAlbertForMaskedLM.from_pretrained('albert-base-v2')

# Prediction
inputs = tf.constant(indexed_tokens)[None, :]
outputs = model(inputs)

# Check the prediction at index 7 (in place of [MASK])
pred_index = tf.argmax(outputs[0][0, 7]).numpy()  # numpy.int64 — triggers the TypeError
pred_token = tokenizer.convert_ids_to_tokens([pred_index])[0]
print('Predicted token:', pred_token)
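
For reference, the workaround described above is a one-line change: cast the numpy scalar to a built-in int before handing it to the tokenizer.

# Workaround: SentencePiece's IdToPiece rejects numpy integer types,
# so cast the argmax result to a plain Python int first.
pred_index = int(tf.argmax(outputs[0][0, 7]).numpy())
pred_token = tokenizer.convert_ids_to_tokens([pred_index])[0]
print('Predicted token:', pred_token)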

@gradient-school
Author

Please note that the above code works as-is for BERT (but throws an error for ALBERT).

@LysandreJik
Member

This is probably the exact same problem as #945.

If I understand correctly, SentencePiece doesn't like numpy integers and crashes. Should we cast it to an int, @thomwolf?

@thomwolf
Member

Yes, I think so. We can probably just add an int(idx) in the base tokenizer class PreTrainedTokenizer before the call to _convert_id_to_token, so we could even accept tensors in addition to numpy arrays.
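
A simplified sketch of where that cast could go, based on the convert_ids_to_tokens logic visible in the traceback above (skip_special_tokens handling omitted); this is an illustration of the proposal, not the actual patch:

# Sketch (not the actual patch): coerce each index to a built-in int
# inside PreTrainedTokenizer.convert_ids_to_tokens before dispatching
# to _convert_id_to_token, so numpy scalars and 0-d tensors that
# support __int__ are accepted.
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
    if isinstance(ids, int):
        return self._convert_id_to_token(ids)
    tokens = []
    for index in ids:
        index = int(index)  # accept numpy.int64 and friends
        if index in self.added_tokens_decoder:
            tokens.append(self.added_tokens_decoder[index])
        else:
            tokens.append(self._convert_id_to_token(index))
    return tokens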

@stale

stale bot commented Feb 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 12, 2020
@stale stale bot closed this as completed Feb 19, 2020