Version 2.0 beta
Upgrade focusing on ease of use (with new, simpler syntax) and CRAN-ability. Bumping major version because of a breaking change in the behavior of nearest_to, which now returns a data.frame.
Changes
Change in nearest_to behavior.
There's a change in nearest_to that will break some existing code: it now returns a data.frame instead of a list. The data.frame columns have elaborate names so they can easily be manipulated with dplyr and/or plotted with ggplot. There are flags to return to the old behavior (as_df=FALSE).
New syntax for vector addition.
This package now allows formula scoping for the most common operations, with string inputs looked up as row names in the context of a particular matrix. This makes the bread-and-butter word2vec operations much nicer to write.
For instance, instead of writing (in normal matrix format, not the existing enhancements)
vectors %>% nearest_to(vectors[rownames(vectors)=="king",] - vectors[rownames(vectors)=="man",] + vectors[rownames(vectors)=="woman",])
(whew!), you can now write
vectors %>% nearest_to(~"king" - "man" + "woman")
Most basic math is supported in this interface; to overweight some words, say, you could just multiply out the vectors:
vectors %>% nearest_to(~"king" - "man"*2 + "woman" + "lady")
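As a rough illustration of what the formula is shorthand for, here is a base-R sketch on a made-up toy matrix. It assumes cosine similarity as the ranking measure; the helper `cosine_to` and the column names `word` and `similarity` are hypothetical and are not the package's actual output format.

```r
# Toy 2-dimensional "embeddings"; the values are invented for illustration.
# Column 1 loosely encodes royalty, column 2 loosely encodes gender.
vectors <- rbind(
  king  = c(1,  1),
  queen = c(1, -1),
  man   = c(0,  1),
  woman = c(0, -1)
)

# Hypothetical helper: cosine similarity of every row of m to a target vector.
cosine_to <- function(m, target) {
  as.vector(m %*% target) / (sqrt(rowSums(m^2)) * sqrt(sum(target^2)))
}

# The arithmetic that ~"king" - "man" + "woman" stands for, written by hand.
target <- vectors["king", ] - vectors["man", ] + vectors["woman", ]

sims <- cosine_to(vectors, target)
result <- data.frame(
  word = rownames(vectors),
  similarity = unname(sims),
  stringsAsFactors = FALSE
)

# Most similar rows first.
result <- result[order(-result$similarity), ]
print(result)
```

On this toy data the top row is "queen", since the target vector equals the queen vector exactly.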
Reading tweaks.
In keeping with the goal of allowing manipulation of models in low-memory environments, it's now possible to read only rows with words matching certain criteria by passing an argument to read.binary.vectors(): either rowname_list for a fixed list, or rowname_regexp for a regular expression. (You could, say, read only the words ending in "ing", such as gerunds, from a file by entering rowname_regexp = ".*ing$".)
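To check what a pattern would keep before reading a large file, you can try it on a handful of words in base R. This sketch assumes the reader applies the pattern with ordinary R regular-expression matching (as `grepl` does), which is a guess about the implementation; the word list is made up.

```r
# Candidate row names, invented for illustration.
words <- c("running", "king", "swimming", "apple", "ingest")

# An "ends in ing" pattern, anchored at the end of the word.
keep <- grepl(".*ing$", words)

# Note that non-gerunds such as "king" also match, while "ingest" does not.
kept_words <- words[keep]
print(kept_words)
```

A pattern like this is a filter on spelling only, so a stricter list of words is better passed as a fixed list.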
Test Suite
The package now includes a test suite.
Other materials for rOpenSci and JOSS.
This package has enough users that it might be nice to get it on CRAN. I'm trying to do so through rOpenSci. That requires a lot of small files scattered throughout.