XTransformer Bug: cuda failed #42

Closed
wwangwitsel opened this issue Jul 30, 2021 · 2 comments
@wwangwitsel

Description

There seem to be some bugs when I run the XTransformer model. I run the training command as instructed in https://github.com/amzn/pecos/blob/mainline/pecos/xmc/xtransformer/README.md . However, I hit this error: "RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select". With the disable-gpu option the code runs fine, so I wonder whether there is a bug in the GPU-handling code of XTransformer. Thanks!
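
For what it's worth, the error itself looks like a standard PyTorch device mismatch: an index tensor still on the CPU is passed to an embedding whose weights are on the GPU. A minimal sketch (plain PyTorch, not PECOS code) that triggers the same kind of RuntimeError:

    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    embedding = nn.Embedding(30522, 768).to(device)   # embedding weights on the GPU
    input_ids = torch.tensor([[101, 2023, 102]])      # index tensor left on the CPU
    embedding(input_ids)                              # raises a device-mismatch RuntimeError

Moving input_ids onto the same device with input_ids.to(device) makes the call succeed, which is why I suspect some tensor in the XTransformer GPU path is not being moved onto CUDA.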

How to Reproduce?

Steps to reproduce

python3 -m pecos.xmc.xtransformer.train --trn-text-path ${X_txt_path} \
                                            --trn-feat-path ${X_path}  \
                                            --trn-label-path ${Y_path} \
                                            --model-dir ${model_dir}


What have you tried to solve it?

Training with the GPU disabled (the disable-gpu option) works, so the issue seems specific to the GPU code path.

Error message or code output

 File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/model.py", line 375, in train
    return_dict=True,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/matcher.py", line 1333, in train
    matcher.fine_tune_encoder(prob, val_prob=val_prob, val_csr_codes=val_csr_codes)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/matcher.py", line 1079, in fine_tune_encoder
    label_embedding=(text_model_W_seq, text_model_b_seq),
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/network.py", line 234, in forward
    inputs_embeds=inputs_embeds,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 989, in forward
    past_key_values_length=past_key_values_length,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 215, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select
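
In case it helps narrow things down: the generic remedy for this kind of mismatch is to move every batch tensor onto the encoder's device right before the forward pass, e.g. (hypothetical model/batch names, plain PyTorch rather than the actual PECOS code path):

    device = next(model.parameters()).device               # device the encoder weights live on
    batch = {k: v.to(device) for k, v in batch.items()}    # input_ids, attention_mask, ... moved over
    outputs = model(**batch)                               # forward pass no longer mixes CPU and CUDA tensors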

Environment

  • Operating system: Ubuntu
  • Python version: 3.6
  • PyTorch version: 1.5.1
@wwangwitsel wwangwitsel added the bug Something isn't working label Jul 30, 2021
@jiong-zhang jiong-zhang self-assigned this Jul 30, 2021
@jiong-zhang
Contributor

Hi @wwangwitsel, this bug exists in the libpecos-0.1.0 version for single-GPU training and is fixed in this PR. The fix will be included in the next release. In the meantime, you can check out the latest code and use Installation from Source to avoid it.
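
For reference, installing from source is roughly the usual editable-install pattern (see the Installation from Source section of the README for the exact prerequisites and commands):

    git clone https://github.com/amzn/pecos.git
    cd pecos
    python3 -m pip install --editable ./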

@jiong-zhang
Contributor

Closing this issue after 5 days. You can reopen it if there are further questions. Thanks.
