trouble with reading word2vec file #2

Open
burette opened this issue Dec 3, 2019 · 0 comments

    if embed_type == "google":
        with open(word2vec, "rb") as f:
            header = f.readline()
            vocab_size, layer1_size = map(int, header.split())
            binary_len = np.dtype('float32').itemsize * layer1_size
            for line in xrange(vocab_size):
                word = []
                while True:
                    ch = f.read(1)
                    if ch == ' ':
                        word = ''.join(word)
                        break
                    if ch != '\n':
                        word.append(ch)
                idx = 0
                emb_string = f.read(binary_len)
                if word in vocabulary_user:
                    u = u + 1
                    idx = vocabulary_user[word]
                    initW_user[idx] = np.fromstring(emb_string, dtype='float32')

                if word in vocabulary_item:
                    item = item + 1
                    idx = vocabulary_item[word]
                    initW_item[idx] = np.fromstring(emb_string, dtype='float32')

Hi, when I run this code, both my computer and my workstation run out of memory. Some questions:
1. Is your word2vec file GoogleNews-vectors-negative300.bin? If not, what is it, and can you provide a download URL? Thank you.
2. As I understand it, this code builds vector representations of the user reviews. Can I use GoogleNews-vectors-negative300.bin directly to represent the words in the reviews? Are the two approaches equivalent?
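
One possible cause of the memory blow-up, assuming the code is run under Python 3 (with `xrange` swapped for `range`): a file opened in `"rb"` mode returns `bytes`, so `ch == ' '` is never true, and once the file is exhausted `f.read(1)` keeps returning `b''`, which the loop appends to `word` forever. The sketch below is not the repository's code; it is a minimal Python 3 reader for the same binary layout that compares against bytes literals and stops at end of file. The function name `iter_word2vec_binary`, the `wanted` parameter, and the assumption that vectors are stored as little-endian float32 are all illustrative.

```python
import io
import struct


def iter_word2vec_binary(f, wanted=None):
    """Yield (word, vector) pairs from an open binary word2vec stream.

    `wanted` (hypothetical parameter) is an optional set of words to keep;
    vectors for all other words are read past and discarded.
    """
    header = f.readline()
    vocab_size, layer1_size = map(int, header.split())
    binary_len = 4 * layer1_size  # float32 takes 4 bytes per component
    for _ in range(vocab_size):
        chars = bytearray()
        while True:
            ch = f.read(1)
            if ch == b"":  # end of file: stop instead of looping forever
                raise EOFError("file ended inside a word")
            if ch == b" ":  # compare against bytes, not str
                break
            if ch != b"\n":
                chars += ch
        word = chars.decode("utf-8", errors="ignore")
        raw = f.read(binary_len)
        if len(raw) != binary_len:
            raise EOFError("file ended inside a vector")
        if wanted is None or word in wanted:
            # Assumes little-endian float32 components.
            yield word, struct.unpack("<%df" % layer1_size, raw)


# Tiny in-memory file with two 3-dimensional vectors, for illustration only.
demo = io.BytesIO(
    b"2 3\n"
    + b"good " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + b"bad " + struct.pack("<3f", 4.0, 5.0, 6.0) + b"\n"
)
vectors = dict(iter_word2vec_binary(demo, wanted={"bad"}))
```

Because only the words in `wanted` are kept, memory use scales with the review vocabulary rather than with the whole embedding file. With NumPy, `np.frombuffer(raw, dtype='float32')` could replace the `struct.unpack` call; it is also the current replacement for `np.fromstring` on binary data, which is deprecated.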
