
Using FP16 on BartModel #3249

Closed · 2 of 4 tasks
AOZMH opened this issue Mar 12, 2020 · 3 comments · Fixed by #3266

AOZMH commented Mar 12, 2020

🐛 Bug

Information

Model I am using (Bert, XLNet ...): BART

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: CNN/DM
  • my own task or dataset: (give details below)

To reproduce

I've installed the master branch of transformers, but I still encounter the same issue as #3117 when using BartModel in FP16. I initialized the model without loading the pretrained weights, but the model should still be able to correctly forward a LongTensor input of shape (batch, seq_length). The code below simply initializes a model and forwards an input:

import torch
from transformers import BartModel, BartConfig

model = BartModel(BartConfig())  # random weights, no pretrained checkpoint
model = model.cuda().half()
cur_inputs = torch.zeros(4, 16, dtype=torch.long).cuda()
cur_res = model(cur_inputs)

The error is:

~\Anaconda3\envs\pytorch\lib\site-packages\transformers\modeling_bart.py in forward(self, query, key, value, key_padding_mask, layer_state, need_weights, static_kv, attn_mask)
assert v is not None
--> attn_output = torch.bmm(attn_probs, v)
assert attn_output.size() == (bsz * self.num_heads, tgt_len, self.head_dim)
attn_output = attn_output.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim)
RuntimeError: Expected object of scalar type Float but got scalar type Half for argument #2 'mat2' in call to _th_bmm
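
From the trace, torch.bmm receives mismatched dtypes: attn_probs comes out of the softmax as float32, while v has already been cast to float16. A minimal sketch of the same mismatch in plain PyTorch, independent of BART, with the cast that would align the two (only torch is assumed):

import torch

# Simulated attention tensors on GPU: the softmax output stays float32,
# while the value tensor has been cast to half, as in the trace above.
attn_probs = torch.softmax(torch.randn(8, 16, 16, device="cuda"), dim=-1)  # float32
v = torch.randn(8, 16, 64, device="cuda").half()                           # float16

# torch.bmm(attn_probs, v) fails with the same dtype mismatch;
# casting the probabilities to v's dtype makes the product well-defined.
attn_output = torch.bmm(attn_probs.to(v.dtype), v)
print(attn_output.dtype)  # torch.float16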

@sshleifer The model is still quite new to me, so am I using it incorrectly, or is there still a bug in the BartModel class? Thanks in advance for the help!

Environment info

  • transformers version: master branch
  • Platform: Windows
  • Python version: 3.7.0
  • PyTorch version (GPU?): 1.4.0
  • Tensorflow version (GPU?): /
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No
sshleifer self-assigned this Mar 12, 2020

AOZMH commented Mar 13, 2020

@sshleifer May I ask whether you could reproduce the error on your machine? I ran the same code on a Linux machine with the master branch of transformers, but still got the same error. I'm planning to use BartModel these days, so please notify me at your earliest convenience if there are any updates. Many thanks!

sshleifer (Contributor) commented Mar 13, 2020

Yes, will try to fix it today! Thanks for reporting!

sshleifer linked pull request #3266 on Mar 13, 2020 that will close this issue

AOZMH commented Mar 14, 2020

> Yes, will try to fix it today! Thanks for reporting!

Thanks Sam,

The code works well this time! Thanks again for the contribution.
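
For anyone landing here later, a quick sanity check on an updated master install, reusing the repro from above (the expected dtype is an assumption, not taken from the thread):

# Re-run the original repro on a build that includes the fix.
cur_res = model(cur_inputs)
print(cur_res[0].dtype)  # expected: torch.float16, with no bmm dtype error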
