Loading pre-trained word vectors is broken #98

matt-peters · 2017-08-22T23:44:40Z

The logic to load the pre-trained word vectors seems to be broken. I looked in test/vocab.py but didn't see any tests that covered correctness of loaded vectors.

Here's a snippet to reproduce the issue:

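from collections import Counter

from torchtext.vocab import Vocab

# Build a one-word vocabulary and attach the pre-trained GloVe 50d vectors.
the_vocab = Vocab(Counter(['the']), vectors='glove.6B.50d')

# Look up the index of 'the' and print the vector stored in that row.
the_index = the_vocab.stoi['the']
print(the_index)
print(the_vocab.vectors.numpy()[the_index, :])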

With the master branch this displays:

2
[  3.02532905e-17   1.40129846e-45   2.10194770e-44   0.00000000e+00
   1.00185045e-16   1.40129846e-45   1.40129846e-44   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   4.84570079e-17   1.40129846e-45   1.54142831e-44   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00  -2.00000000e+00   8.72027897e-07  -4.65774808e-10
   4.82658813e-26   1.40129846e-45   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00  -8.29709032e-27   3.23318939e+02
   9.88949980e-32   1.40129846e-45   1.58111748e-32   1.40129846e-45
  -8.15662628e+28  -9.19432299e+27   9.88943397e-32   1.40129846e-45
   7.55058708e-26   1.40129846e-45]

However, if I manually download and unpack the GloVe 50d file, this is the line with the token:

the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862 -0.00066023 -0.6566 0.27843 -0.14767 -0.55677 0.14658 -0.0095095 0.011658 0.10204 -0.12792 -0.8443 -0.12181 -0.016801 -0.33279 -0.1552 -0.23131 -0.19181 -1.8823 -0.76746 0.099051 -0.42125 -0.19526 4.0071 -0.18594 -0.52287 -0.31681 0.00059213 0.0074449 0.17778 -0.15897 0.012041 -0.054223 -0.29871 -0.15749 -0.34758 -0.045637 -0.44251 0.18785 0.0027849 -0.18411 -0.11514 -0.78581
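
A correctness test along these lines might catch the mismatch. This is only a minimal sketch: `parse_glove_line` and `vectors_match` are hypothetical helper names, not part of torchtext, and only the first five values of each vector are compared for brevity. In a real test, `observed` would come from `the_vocab.vectors.numpy()[the_index, :].tolist()` rather than being hard-coded.

```python
import math

# First five values of the 'the' row, copied from the GloVe line quoted above.
GLOVE_LINE_PREFIX = "the 0.418 0.24968 -0.41242 0.1217 0.34527"


def parse_glove_line(line):
    """Split one GloVe text line into (token, list of floats)."""
    token, *values = line.rstrip().split(" ")
    return token, [float(v) for v in values]


def vectors_match(loaded, expected, tol=1e-5):
    """Return True if `loaded` starts with the values in `expected`, within `tol`."""
    if len(loaded) < len(expected):
        return False
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(loaded, expected))


token, expected = parse_glove_line(GLOVE_LINE_PREFIX)

# First five values printed by the master branch in the output above.
observed = [3.02532905e-17, 1.40129846e-45, 2.10194770e-44, 0.0, 1.00185045e-16]

print(token, vectors_match(observed, expected))
```

The absolute tolerance is an assumption chosen to absorb float32 rounding of the values in the text file. It is far smaller than the discrepancy shown above.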

matt-peters mentioned this issue Aug 23, 2017

Fix loading pre-trained word vectors #99

Merged

jekbradbury closed this as completed in #99 Aug 23, 2017
