As a beginner in R and in topic modelling with R, I am following what others have suggested: first fit an LDA model on a corpus of corporate annual reports, then visualize the results with LDAvis. Everything works fine until the very last step. When I open the output directory in the browser, I get the following error:
"SyntaxError: JSON.parse: bad control character in string literal at line 10 column 16177 of the JSON data"
Here is my code:
```r
# load text mining library
library(tm)
# read the letters from CSV (one letter per row)
ceoletters <- read.csv("ceoletters.csv")
# convert to ASCII, dropping any non-ASCII bytes
corpus <- iconv(ceoletters$ceoletter, to = "ASCII", sub = "")
# create corpus from vector
letters <- Corpus(VectorSource(corpus))
# start preprocessing
letters <- tm_map(letters, content_transformer(tolower))
letters <- tm_map(letters, removePunctuation)
letters <- tm_map(letters, removeNumbers)
letters <- tm_map(letters, removeWords, stopwords("english"))
letters <- tm_map(letters, stripWhitespace)
# stem documents
letters <- tm_map(letters, stemDocument)
# create document-term matrix
dtm <- DocumentTermMatrix(letters)
# use the letter IDs as row names
rownames(dtm) <- ceoletters$letter_id
# collapse matrix by summing over columns
freq <- colSums(as.matrix(dtm))
# length should be total number of terms
length(freq)
# create sort order (descending)
ord <- order(freq, decreasing = TRUE)
# list all terms in decreasing order of freq and write to disk
freq[ord]
write.csv(freq[ord], "word_freq.csv")

## fitting LDA
# load topic models library
library(topicmodels)
library(doParallel)
# set parameters for Gibbs sampling
burnin <- 2000
iter <- 2000
thin <- 500
seed <- list(2003, 5, 63, 100001, 765)
nstart <- 5
best <- TRUE
registerDoParallel(4)
# number of topics
k <- 100
# run LDA using Gibbs sampling
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))
# write out results
# docs to topics
ldaOut.topics <- as.matrix(topics(ldaOut))
write.csv(ldaOut.topics, file = paste("LDAGibbs", k, "DocsToTopics.csv"))
# top 10 terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut, 10))
write.csv(ldaOut.terms, file = paste("LDAGibbs", k, "TopicsToTerms.csv"))
# probabilities associated with each topic assignment
topicProbabilities <- as.data.frame(ldaOut@gamma)
write.csv(topicProbabilities, file = paste("LDAGibbs", k, "TopicProbabilities.csv"))
```
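One plausible source of stray characters, offered as an assumption rather than a confirmed diagnosis: `iconv(..., to = "ASCII", sub = "")` drops non-ASCII bytes but keeps ASCII control characters, and neither `removePunctuation` (which targets punctuation) nor `stripWhitespace` (which only collapses whitespace) removes control bytes such as `\x01` or `\x1a`. A minimal sketch of an extra cleaning step on the raw character vector, applied before `Corpus(VectorSource(...))`; `clean_control_chars()` is a hypothetical helper, not part of tm:

```r
# Hypothetical helper (not part of tm): convert to ASCII as in the original
# code (dropping non-ASCII bytes), then replace every ASCII control
# character with a space so it cannot end up inside a vocabulary term.
clean_control_chars <- function(x) {
  x <- iconv(x, to = "ASCII", sub = "")  # original step: drop non-ASCII bytes
  gsub("[[:cntrl:]]", " ", x)            # control chars (incl. \t, \n, \f) -> space
}

# Toy input with a start-of-heading byte (\x01) and a form feed (\f)
clean_control_chars(c("revenue\x01 grew", "net\fincome", "plain text"))
```

In the script above this would replace the `iconv()` line, e.g. `corpus <- clean_control_chars(ceoletters$ceoletter)`; the later `stripWhitespace` step then collapses the extra spaces.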
And here is my code to visualize the results:
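```r
library(LDAvis)
library(servr)
# Convert a fitted topicmodels LDA model into the JSON that LDAvis expects.
# x: the fitted model; dtm: the document-term matrix the model was fitted on.
topicmodels2LDAvis <- function(x, dtm, ...) {
  post <- topicmodels::posterior(x)
  if (ncol(post[["topics"]]) < 3) stop("The model must contain > 2 topics")
  LDAvis::createJSON(
    phi = post[["terms"]],     # topic-term probabilities
    theta = post[["topics"]],  # document-topic probabilities
    vocab = colnames(post[["terms"]]),
    # x@wordassignments stores topic indices rather than word counts,
    # so document lengths and term frequencies come from the dtm instead
    doc.length = slam::row_sums(dtm),
    term.frequency = slam::col_sums(dtm)
  )
}
serVis(topicmodels2LDAvis(ldaOut, dtm))
```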
Any ideas on how to solve this problem?
Thanks.
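`JSON.parse` reports "bad control character in string literal" when a JSON string contains a raw, unescaped control character (U+0000 to U+001F), which the JSON specification (RFC 8259) only permits in escaped form. Assuming such a character ended up in one of the vocabulary terms that `createJSON()` serializes (a guess, not confirmed in this thread), a quick base-R diagnostic sketch; `find_control_terms()` is a hypothetical helper name:

```r
# Hypothetical helper (not part of tm or LDAvis): return the positions of
# vocabulary terms that contain ASCII control characters, named with an
# escaped version of each term so the invisible characters become visible.
find_control_terms <- function(vocab) {
  idx <- which(grepl("[[:cntrl:]]", vocab))
  setNames(idx, encodeString(vocab[idx]))
}

# Toy vocabulary: the second term contains a \x01 byte
find_control_terms(c("profit", "gro\x01wth", "sales"))
```

With the objects from the script above, `find_control_terms(Terms(dtm))` would list any such terms; a non-empty result would point back to the text-cleaning step rather than to LDAvis itself.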