How to load a word embedding dictionary using torchtext #722

Open
nawshad opened this issue Apr 4, 2020 · 2 comments

nawshad commented Apr 4, 2020

Hi,

I have a custom pre-trained embedding (not created through gensim) stored as a Python dictionary. I tried writing it out in gensim's word2vec format and then loading it, but that throws an error about string-to-float conversion. Is there a standard way to load such a dictionary using torchtext?

Thanks,

zhangguanheng66 (Contributor) commented

@bentrevett @mttk Any ideas for this issue? I think we support pretrained word vectors in torchtext.

bentrevett (Contributor) commented Apr 6, 2020

There is a way to load custom embeddings from a file, so you can write your dictionary to a file and then read it with TorchText.

import torchtext.vocab as vocab

# Read the embeddings from a plain-text file, one token per line
custom_embeddings = vocab.Vectors(name='custom_embeddings.txt')

The format of your custom_embeddings.txt file needs to be the token followed by the values of each of the embedding's dimensions, all separated by a single space. For example, here are three tokens with 20-dimensional embeddings (all ones, just as an example):

good 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
great 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
awesome 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
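
If your embeddings live in a Python dictionary, here's a minimal sketch of dumping it into that format, assuming the dictionary maps token strings to lists (or arrays) of floats; the names here are just illustrative:

# Hypothetical embeddings dictionary: token -> list of floats
embeddings_dict = {
    'good': [1.0] * 20,
    'great': [1.0] * 20,
    'awesome': [1.0] * 20,
}

# Write one token per line: the token followed by its values,
# everything separated by single spaces
with open('custom_embeddings.txt', 'w') as f:
    for token, vector in embeddings_dict.items():
        f.write(token + ' ' + ' '.join(str(v) for v in vector) + '\n')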

You then align these with your vocabulary when you call build_vocab on the desired Field:

TEXT.build_vocab(train_data, vectors=custom_embeddings)
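
One caveat: vocabulary tokens that are missing from custom_embeddings.txt get zero vectors by default. If you'd rather initialize them randomly, Vectors accepts an unk_init callable, along these lines:

import torch
import torchtext.vocab as vocab

# unk_init is applied to vectors for tokens not found in the file;
# the default leaves them as zeros
custom_embeddings = vocab.Vectors(name='custom_embeddings.txt',
                                  unk_init=torch.Tensor.normal_)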

Then you actually load these pre-trained embeddings into your model with:

model.embedding.weight.data.copy_(TEXT.vocab.vectors)
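
For context, a minimal sketch of the model side, assuming an nn.Embedding layer whose shape matches your vocab size and embedding dimensionality (20 in the example above):

import torch.nn as nn

embedding_dim = 20  # must match the width of the vectors in the file

# Embedding layer sized to the vocabulary built above
embedding = nn.Embedding(len(TEXT.vocab), embedding_dim)

# Copy the aligned pre-trained vectors into the layer's weights
embedding.weight.data.copy_(TEXT.vocab.vectors)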
