XTransformer Bug: cuda failed #42

Closed
wwangwitsel opened this issue Jul 30, 2021 · 2 comments
@wwangwitsel

Description

There seem to be some bugs when I run the XTransformer model. I run the training command as instructed in https://github.com/amzn/pecos/blob/mainline/pecos/xmc/xtransformer/README.md . However, I hit this error: "RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select". With the disable-gpu option the code runs fine, so I wonder whether there is a bug in the GPU-handling code of XTransformer. Thanks!
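
For what it's worth, the error itself looks like a standard PyTorch device mismatch: an index tensor still on the CPU is passed to an embedding whose weights are on the GPU. A minimal sketch (plain PyTorch, not PECOS code) that triggers the same kind of RuntimeError:

    import torch
    import torch.nn as nn

    device = torch.device("cuda")
    embedding = nn.Embedding(30522, 768).to(device)   # embedding weights on the GPU
    input_ids = torch.tensor([[101, 2023, 102]])      # index tensor left on the CPU
    embedding(input_ids)                              # raises a device-mismatch RuntimeError

Moving input_ids onto the same device with input_ids.to(device) makes the call succeed, which is why I suspect some tensor in the XTransformer GPU path is not being moved onto CUDA.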

How to Reproduce?

Steps to reproduce

python3 -m pecos.xmc.xtransformer.train --trn-text-path ${X_txt_path} \
                                            --trn-feat-path ${X_path}  \
                                            --trn-label-path ${Y_path} \
                                            --model-dir ${model_dir}


What have you tried to solve it?

Training with the GPU disabled (the disable-gpu option) works, so the issue seems specific to the GPU code path.

Error message or code output

 File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/model.py", line 375, in train
    return_dict=True,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/matcher.py", line 1333, in train
    matcher.fine_tune_encoder(prob, val_prob=val_prob, val_csr_codes=val_csr_codes)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/matcher.py", line 1079, in fine_tune_encoder
    label_embedding=(text_model_W_seq, text_model_b_seq),
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/pecos/xmc/xtransformer/network.py", line 234, in forward
    inputs_embeds=inputs_embeds,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 989, in forward
    past_key_values_length=past_key_values_length,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 215, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 1724, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select
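
In case it helps narrow things down: the generic remedy for this kind of mismatch is to move every batch tensor onto the encoder's device right before the forward pass, e.g. (hypothetical model/batch names, plain PyTorch rather than the actual PECOS code path):

    device = next(model.parameters()).device               # device the encoder weights live on
    batch = {k: v.to(device) for k, v in batch.items()}    # input_ids, attention_mask, ... moved over
    outputs = model(**batch)                               # forward pass no longer mixes CPU and CUDA tensors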

Environment

  • Operating system: Ubuntu
  • Python version: 3.6
  • PyTorch version: 1.5.1
@wwangwitsel wwangwitsel added the bug Something isn't working label Jul 30, 2021
@jiong-zhang jiong-zhang self-assigned this Jul 30, 2021
@jiong-zhang
Contributor

Hi @wwangwitsel, this bug exists in the libpecos-0.1.0 version for single-GPU training and is fixed in this PR. The fix will be included in the next release. In the meantime, you can check out the latest code and use Installation from Source to avoid it.
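
For reference, installing from source is roughly the usual editable-install pattern (see the Installation from Source section of the README for the exact prerequisites and commands):

    git clone https://github.com/amzn/pecos.git
    cd pecos
    python3 -m pip install --editable ./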

@jiong-zhang
Contributor

Closing this issue after 5 days. You can reopen it if there are further questions. Thanks.
