Mr.Blei, pls tell me how to use it. #4

Open
9crk opened this issue Mar 2, 2017 · 2 comments

Comments

@9crk
9crk commented Mar 2, 2017

I'm a beginner with LDA. Please tell me how to use this lda command.
usage : lda est [initial alpha] [k] [settings] [data] [random/seeded/*] [directory]
lda inf [settings] [model] [data] [name]

As I understand it, this is a tool for getting the topic words of an article.
So if I have hundreds of articles on hand (like 0000.txt-1000.txt), how can I use lda to get the topic words of an article?

@chanansh
chanansh commented Aug 3, 2017

The data format is explained in the readme.txt file. Each line should start with the number of unique terms in the document, followed by term-index:count pairs. The term indices should correspond to a vocabulary file.

See https://github.com/blei-lab/lda-c/blob/master/readme.txt for more details:

  1. Data format

Under LDA, the words of each document are assumed exchangeable. Thus,
each document is succinctly represented as a sparse vector of word
counts. The data is a file where each line is of the form:

 [M] [term_1]:[count] [term_2]:[count] ...  [term_N]:[count]

where [M] is the number of unique terms in the document, and the
[count] associated with each term is how many times that term appeared
in the document. Note that [term_1] is an integer which indexes the
term; it is not a string.
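
To make the format above concrete, here is a minimal Python sketch that turns raw text into lines of that form. It is not part of lda-c; the function names (`tokenize`, `build_vocab`, `to_ldac_line`, `write_corpus`), the naive letters-only tokenizer, and the choice of 0-based term indices are all assumptions made for illustration, so check them against the readme and the sample data before relying on them.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(token_lists):
    """Map each distinct term to an integer index (assumption: 0-based, sorted order)."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {term: i for i, term in enumerate(terms)}


def to_ldac_line(tokens, vocab):
    """Format one document as '[M] [term_1]:[count] ... [term_N]:[count]'."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    pairs = " ".join(f"{idx}:{n}" for idx, n in sorted(counts.items()))
    # M is the number of unique terms, not the total number of tokens.
    return f"{len(counts)} {pairs}".rstrip()


def write_corpus(texts, data_path, vocab_path):
    """Write one data line per document, plus a vocabulary file with one term per line."""
    token_lists = [tokenize(t) for t in texts]
    vocab = build_vocab(token_lists)
    with open(data_path, "w") as f:
        for tokens in token_lists:
            f.write(to_ldac_line(tokens, vocab) + "\n")
    with open(vocab_path, "w") as f:
        # Line i of the vocabulary file holds the term with index i.
        for term in sorted(vocab, key=vocab.get):
            f.write(term + "\n")
    return vocab
```

For a folder of text files, `texts` could be built by reading each file in sorted order and passing the resulting list to `write_corpus`. A document with no recognized tokens comes out as a bare `0` line, which you may prefer to filter out beforehand.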

@kitescat

So how can I transform my data into this format? Is there a script that would help?
Please let me know if anyone sees this.
