trouble using wordVectors in Rscript. #37

Open
bmschmidt opened this issue Jun 21, 2017 · 2 comments

Comments

@bmschmidt
Owner

bmschmidt commented Jun 21, 2017

Pasted from an e-mail I received for tracking.

I’m having trouble using wordVectors in Rscript.
A minimal case:

library(magrittr)
library(wordVectors)

model <- read.vectors('foo.bin')
model %>% closest_to('man') %>% print()

This code works fine in an interactive R session, but it fails when run via Rscript:


Error in context[[formula]] : subscript out of bounds
Calls: %>% ... <Anonymous> -> closest_to -> cosineSimilarity -> sub_out_formula
Execution halted

(I get the same failure if I rewrite it to use traditional(function(nesting(syntax))) instead of magrittr, BTW.)

Code like

cosineSimilarity(model[['man']],model[['woman']]) %>% print()

also fails similarly in Rscript but works when stepped through or source()’d in an interactive R session.

Back in 2015 I used wordVectors extensively in Rscript with no problems, so whatever’s going on here seems to be connected to changes since then.

@bmschmidt
Owner Author

OK: whatever the issue is, it seems to be solved on the dev branch but not master.

Hmm, looks like I need to read up on Rscript.

My testing code is this:

library(wordVectors)
test = read.vectors("~/vector_models/GoogleNewsPartial.bin",nrow=1000)
test %>% closest_to("that")

And I get the error:

Filename ends with .bin, so reading in binary format
Reading the first 1000 rows of a word2vec binary file of 3000000 rows and 300 columns
  |======================================================================| 100%
Error in new("VectorSpaceModel", .) : could not find function "new"
Calls: %>% ... eval -> _fseq -> freduce -> withVisible -> <Anonymous>
Execution halted

This seems to be because Rscript doesn't attach all of the usual default packages (notably methods, which provides new()): if I run

Rscript --default-packages=methods,utils test.R

It works for me.
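To check whether a given Rscript run is affected, a small diagnostic along these lines may help. It is only a sketch: the variable names are illustrative, and it assumes nothing about wordVectors itself. It reports whether the methods package is attached, attaches it if not, and confirms that new() can then be found.

```r
# Diagnostic sketch: run with Rscript to see what the session starts with.
# R_DEFAULT_PACKAGES may be unset, in which case the placeholder is printed.
cat("R_DEFAULT_PACKAGES:",
    Sys.getenv("R_DEFAULT_PACKAGES", unset = "(not set)"), "\n")

# search() lists attached packages; methods shows up as "package:methods".
methods_attached <- "package:methods" %in% search()
cat("methods attached at startup:", methods_attached, "\n")

# Attach methods explicitly if the session started without it.
if (!methods_attached) {
  library(methods)
}

# After attaching, the S4 constructor new() should be visible.
new_available <- exists("new", mode = "function")
cat("new() available:", new_available, "\n")
```

If the second line prints FALSE under plain Rscript but TRUE with the --default-packages flag, the missing methods package is the likely cause of the "could not find function" error.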

That's a different problem from the one you're seeing! But maybe try it with those default package flags?

@tkinias

tkinias commented Jul 12, 2017

Interestingly, when I run your minimal file (substituting a vector file I have on hand), I get this:

Filename ends with .bin, so reading in binary format
Reading the first 1000 rows of a word2vec binary file of 42013 rows and 100 columns
  |======================================================================| 100%
Error in tcrossprod(x, y)/(sqrt(tcrossprod(square_magnitudes(x), square_magnitudes(y)))) : 
  non-conformable arrays
Calls: %>% ... withVisible -> <Anonymous> -> closest_to -> cosineSimilarity
Execution halted

Yet another weird error.

I see this with the dev branch, too, however:

> devtools::install_github('bmschmidt/wordVectors',ref='dev')
Skipping install of 'wordVectors' from a github remote, the SHA1 (ee00f79e) has not changed since last install.

I can make it work, though, by adding:

library(methods)
library(utils)

to the top of the Rscript file. That accomplishes the same thing as adding --default-packages=methods,utils as command-line switches, but it’s more convenient for running an executable script straight from the shell.
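For reference, here is a minimal sketch of that layout as a standalone executable script. It does not use wordVectors: ToyModel is a hypothetical S4 class standing in for the package's VectorSpaceModel, so the script runs without any vector file and only illustrates that new() is found once methods is attached.

```r
#!/usr/bin/env Rscript
# Attach methods and utils explicitly, instead of passing
# --default-packages=methods,utils on the command line.
library(methods)
library(utils)

# Hypothetical S4 class (not part of wordVectors), used only to
# exercise new() the way an S4-based package would.
setClass("ToyModel", representation(values = "numeric"))

# This call depends on methods being attached.
toy <- new("ToyModel", values = c(1, 2, 3))

# Confirm the object has the expected class.
print(is(toy, "ToyModel"))
```

In a real script, library(wordVectors) and the read.vectors() call would go after the two library() lines at the top.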
