
CUDA Error during CharacterEmbeddings #421

Closed
lisette-garciamoya opened this issue Jan 24, 2019 · 9 comments · Fixed by #434
Labels
bug Something isn't working

Comments

@lisette-garciamoya

My code

from flair.data import Sentence
from flair.embeddings import CharacterEmbeddings

sentence = Sentence('La casa es muy bonita.', use_tokenizer=True)

embedding = CharacterEmbeddings()
embedding.embed(sentence)

for token in sentence:
    print(token)
    print(token.embedding)

Errors:
(screenshot of the error attached)

Environment:

  • OS: Ubuntu 18.04.1
  • Version: code from master branch
  • Nvidia: (nvidia-smi screenshot attached)
@lisette-garciamoya added the bug label on Jan 24, 2019
@stefan-it
Member

stefan-it commented Jan 24, 2019

Thanks for reporting :) I think this could be fixed by:

Current code:

# chars for rnn processing
chars = torch.LongTensor(tokens_mask)
chars = chars.to(flair.device)

character_embeddings = self.char_embedding(chars).transpose(0, 1)

Fix:

# chars for rnn processing
chars = torch.LongTensor(tokens_mask)
chars = chars.to(flair.device)
chars = chars.detach().cpu()   # <-- added

character_embeddings = self.char_embedding(chars).transpose(0, 1)

@lisette-garciamoya
Author

Thank you. It solves the problem.

@alanakbik
Collaborator

Could you also try this fix?

# chars for rnn processing
chars = torch.LongTensor(tokens_mask)
chars = chars.cpu()   # <-- added (don't detach!)

character_embeddings = self.char_embedding(chars).transpose(0, 1)

I think that if you detach the tensor, gradients cannot flow into the character model during training, so the character features are never trained, i.e. they would stay random.

If you do not detach, as in the code above, training will be slower, but the character features are always trained on the downstream task, as proposed by Lample et al., 2016.
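The difference can be seen in a tiny torch-only sketch (the `Embedding` layer here just stands in for the character embedding; this is not flair code):

```python
import torch

# A tiny embedding layer standing in for the character embedding.
emb = torch.nn.Embedding(10, 4)
idx = torch.LongTensor([1, 2, 3])

# Without detach: backward() propagates gradients into the embedding
# weights, so the character features get trained on the downstream task.
emb(idx).sum().backward()
print(emb.weight.grad is not None)  # True

# With detach: the autograd graph is cut, so the weights would never
# receive gradients and the character features would stay random.
detached = emb(idx).detach()
print(detached.requires_grad)  # False
```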

@lisette-garciamoya
Author

It seemed to be fixed but now I get the same error while executing the real program.

(I tested it with .detach() and without .detach() -> same error)

@JieyuZhao

I think the same error occurred when I tried the BERT tutorial example.
It is also about "RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'".

@alanakbik
Collaborator

Hello @lisette-garciamoya, what do you mean by "executing the real program"?

@alanakbik
Collaborator

Hi @lisette-garciamoya - I was able to understand where the error is coming from. In fact, the original code of the CharacterEmbeddings class is correct.

However, when you instantiate the CharacterEmbeddings, by default it is only instantiated on CPU. The ModelTrainer then puts it on GPU which is why the training works. But if you instantiate the CharacterEmbeddings yourself, it is only on CPU even if you are on a GPU machine, which causes the error.
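In plain torch terms (a minimal sketch, not flair code), the module's parameters live on whatever device the module was moved to, and the lookup indices must be on the same device:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

emb = torch.nn.Embedding(10, 4)  # parameters start on CPU by default
emb = emb.to(device)             # what ModelTrainer effectively does

# The index tensor must live on the same device as the weights; mixing
# a CPU-side embedding with CUDA indices (or vice versa) produces the
# "Expected object of backend CPU but got backend CUDA" RuntimeError.
idx = torch.LongTensor([1, 2, 3]).to(device)
out = emb(idx)
print(out.device == emb.weight.device)  # True
```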

For now, the simplest fix is to do this:

from flair.data import Sentence
from flair.embeddings import CharacterEmbeddings

sentence = Sentence('La casa es muy bonita.', use_tokenizer=True)

embedding = CharacterEmbeddings()
embedding = embedding.cuda()  # add this line to put the embeddings on CUDA
embedding.embed(sentence)

for token in sentence:
    print(token)
    print(token.embedding)

Could you test if this works for you?

We will also set up a PR that fixes this behavior. The default should be that embeddings are instantiated on CUDA if it is available.
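One way to get that default (a hypothetical sketch of the pattern, not the actual PR code; the class and dimensions are illustrative) is to pick the device in the constructor:

```python
import torch

class MyCharEmbeddings(torch.nn.Module):  # hypothetical class, illustrative only
    def __init__(self):
        super().__init__()
        self.char_embedding = torch.nn.Embedding(100, 25)
        # instantiate on CUDA when available, CPU otherwise
        self.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

m = MyCharEmbeddings()
expected = "cuda" if torch.cuda.is_available() else "cpu"
print(m.char_embedding.weight.device.type == expected)  # True
```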

Thanks for finding this error and reporting it!

@alanakbik
Collaborator

Should be fixed by the latest PR. Feel free to reopen if there are still issues!

Thanks again for reporting the error!

@PaulZhangIsing

> [quotes @alanakbik's comment above in full]

For me, I used:

from flair.data import Sentence
from flair.embeddings import BertEmbeddings, FlairEmbeddings

# init embedding
flair_embedding_forward = FlairEmbeddings('news-forward')
bert_embedding = BertEmbeddings('bert-large-cased').cuda()

# create a sentence
sentence = Sentence('The grass is green .')

# embed words in sentence
x = bert_embedding.embed(sentence)

for token in sentence:
    print(token)
    print(token.embedding)

It works.
