Here is a dummied up script to test what seems to be a bug with inference in dfrtopics
options(java.parameters="-Xmx6g")
library(dfrtopics)
library(dplyr)
#First create some dummy data for repeatability. Read in Moby Dick from Project Gutenberg. Since readLines splits at each newline character, we'll treat each line as a new "text"
texts <- readLines("http://www.gutenberg.org/files/2701/2701-0.txt")
#Now remove those pesky blanks (logical indexing is safe even when there are no blank lines)
texts <- texts[texts != ""]
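#Seed the random number generator so the samples below are repeatable (the seed value is arbitrary)
set.seed(1966)
#Grab 2000 random items for training and put them into a data frame with proper column names and some dummy id labels
training_docs <- data_frame(id = paste("Train", 1:2000, sep="_"), text = sample(texts, 2000))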
#Now grab another 100 that we'll pretend are new documents for inference later on
inference_docs <- data_frame(id = paste("Test", 1:100, sep="_"), text = sample(texts, 100))
#Make an instance list for the training docs (for the sake of this demo, no stoplist)
training_ilist <- make_instances(training_docs)
#Train a topic model
m <- train_model(training_ilist, n_topics=10, n_iters=100, seed=1966)
#Now write the model to disk so we can load it later. Also write out the instance list, we're going to need it.
write_mallet_model(m, "DEMO_MODEL", save_instances = TRUE)
#Before we can infer the topical makeup of new files, we need a compatible instance list (the equivalent of --use-pipe-from in MALLET)
#For some reason, load_mallet_model_directory does not load the instance file that we saved above with write_mallet_model . . . I'm not sure why.
#Interestingly, we can build an inferencer from the model before reloading it with load_mallet_model_directory, but not after. In other words, this works correctly:
inf <- inferencer(m)
inf
#But once we reload the model from file, like this
m <- load_mallet_model_directory("DEMO_MODEL") #DEMO_MODEL = local path
#We can't create an inferencer
inf <- inferencer(m)
inf # returns NULL
#Hmm, that's weird. Imagine that we quit R and want to come back another day and load the model and do some inference on some new files. It looks like we cannot do that.
#But maybe there is another route. I saved the instance list, so perhaps I can read it in and then use it in conjunction with the compatible_instances(docs, instances) function
ilist <- read_instances("DEMO_MODEL/instances.mallet")
inference_ilist <- compatible_instances(inference_docs, ilist)
#OK, so now we've got a model loaded from disk and a compatible instance list. I should be able to infer topics on new docs. . .
inferred_m <- infer_topics(m, inference_ilist) # Tada!
#But no. . . .
#Error in rJava::.jcall(m, "[D", "getSampledDistribution", inst, n_iterations, :
#RcallMethod: invalid object parameter
#According to the help file, m can be either a topic inferencer object from read_inferencer or inferencer, or a mallet_model object. m is of the latter type:
class(m)
#[1] "mallet_model"
#So why the error?
#Let's try another route. Rebuild the same model and create an inferencer from it
m <- train_model(training_ilist, n_topics=10, n_iters=100, seed=1966)
m_inferencer <- inferencer(m)
#Save it to disk
write_inferencer(m_inferencer, "DEMO_MODEL/m_inferencer.mallet")
#Read the inferencer back from the file
inf <- read_inferencer("DEMO_MODEL/m_inferencer.mallet")
test <- infer_topics(inf, inference_ilist)
#Ugh. Same error again . . .
#Error in rJava::.jcall(m, "[D", "getSampledDistribution", inst, n_iterations, :
#RcallMethod: invalid object parameter
#What now?