First, thank you so much for this package. It's very useful.
I know you've already more or less answered this here, but I wasn't able to reproduce the example; I think it used an older version of fasttext.
import fasttext as ft
import numpy as np

model2 = ft.load_model('fasttext/wiki.en.bin')

class FastTextEmbeddings(object):
    def __getitem__(self, item):
        item = np.array(item, copy=True)
        # for testing, substitute a hard-coded vocabulary size here
        item[item > len(fastText_wv)] = -1
        return fastText_wv.get_input_vector(item)
There are a few issues:
- fastText_wv does not return a length, i.e. len() fails on the model object. That said, we can hard-code one for testing by reading the vocabulary size off the loaded model (sketched below).
- get_input_vector cannot be called:
getInputVector(): incompatible function arguments. The following argument types are supported:
1. (self: fasttext_pybind.fasttext, arg0: fasttext_pybind.Vector, arg1: int) -> None
Invoked with: <fasttext_pybind.fasttext object at 0x2ab5724b0>, <fasttext_pybind.Vector object at 0x2a29c5370>, array(18446744073709551615, dtype=uint64)
This is with fasttext 0.9.1.
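A minimal sketch of the two calls in question, assuming the same wiki.en.bin file (the variable names here are illustrative, not from the thread):

import fasttext as ft

model = ft.load_model('fasttext/wiki.en.bin')
vocab_size = len(model.words)    # the model has no len(), but its word list does
vec = model.get_input_vector(0)  # expects a plain Python int, not a numpy uint64 array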
I'm sure you have this working with gensim or fasttext. I'm wondering if you could share your code example, as you did with Numpy. I'm not sure where to start debugging: most of the methods in gensim require supplying the token rather than the word index.
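For reference, a sketch of index-based lookup in gensim, assuming gensim 3.x (load_facebook_vectors and the vocab attribute are the 3.8 names; gensim 4.x renamed them):

from gensim.models.fasttext import load_facebook_vectors

kv = load_facebook_vectors('fasttext/wiki.en.bin')
idx = kv.vocab['hello'].index  # token -> word index ('hello' is a placeholder)
vec = kv.vectors[idx]          # index -> vector, no token lookup needed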
I'm seeing a few things that were missing in that post. This should work:
import fasttext as ft
model2 = ft.load_model('fasttext/wiki.en.bin')
The numpy array was not needed (the copy is what turned -1 into the uint64 value 18446744073709551615 seen in the error above; get_input_vector expects a plain int):
class FastTextEmbeddings(object):
    def __getitem__(self, item):
        # dim has to be set first (see note below)
        if item > dim:
            return [1e-8] * 300
        return model2.get_input_vector(item)
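The thread doesn't say what dim should be; one plausible choice (an assumption, not confirmed here) is the highest valid word id. Note that get_word_id returns -1 for out-of-vocabulary tokens, so a negative check may be needed as well:

dim = len(model2.words) - 1                  # highest valid word id (assumed)
embeddings = FastTextEmbeddings()
vec = embeddings[model2.get_word_id('the')]  # get_word_id returns -1 for OOV tokens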
Need the fasttext ids:
from collections import Counter
import numpy as np

def buildVectorsFT(row):
    # spacy is overkill here, but it tokenizes; nlpSpacy, alltext, title and
    # documents come from the surrounding code
    nlpAll = nlpSpacy(alltext)
    tokens = [t.text.lower() for t in nlpAll if t.is_alpha and not t.is_stop]
    words = Counter(tokens)
    orths = {t: model2.get_word_id(t) for t in tokens}
    sorted_words = sorted(words)
    documents[title] = (title, [orths[t] for t in sorted_words],
                        np.array([words[t] for t in sorted_words], dtype=np.float32))
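To show how the pieces fit together, a hypothetical usage that averages a document's word vectors weighted by their counts (the 'some title' key is a placeholder):

import numpy as np

embeddings = FastTextEmbeddings()
title, ids, weights = documents['some title']
doc_vec = np.average([embeddings[i] for i in ids], axis=0, weights=weights)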