add_doc() method very slow #199

Hello,
Since writing the original message below, I found the cause: my data file had two columns, which slowed things down. Imports are still slow, though less so: in 5 minutes I can import around 100,000 documents out of several million (I'm currently testing with a subsample). My data file is simply one text per line. Can this be made faster?
(I'm on an i7 Mac with 32 GB RAM and a 1 TB SSD.)
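
To make the numbers above easier to reproduce, here is a minimal sketch of a loading loop that reports throughput. It assumes whitespace tokenization and a one-text-per-line file. The helper names `iter_token_lists` and `add_all`, the file name, and the value of `k` are illustrative and not part of tomotopy.

```python
import time
from typing import Callable, Iterable, Iterator, List


def iter_token_lists(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one whitespace-tokenized document per non-empty line."""
    for line in lines:
        tokens = line.split()
        if tokens:
            yield tokens


def add_all(docs: Iterable[List[str]],
            add_doc: Callable[[List[str]], object],
            report_every: int = 10_000) -> int:
    """Pass every document to add_doc and print elapsed time periodically."""
    start = time.perf_counter()
    count = 0
    for count, tokens in enumerate(docs, start=1):
        add_doc(tokens)
        if count % report_every == 0:
            elapsed = time.perf_counter() - start
            print(f"{count} docs added in {elapsed:.1f}s")
    return count


# Hypothetical usage with tomotopy (file name and k are placeholders):
#   import tomotopy as tp
#   mdl = tp.LDAModel(k=20)
#   with open("texts.txt", encoding="utf-8") as f:
#       add_all(iter_token_lists(f), mdl.add_doc)
```

Streaming the file line by line keeps memory flat, and the periodic report shows whether the rate stays constant or degrades as the corpus grows.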

I also have a question about model evaluation with the Pachinko topic model: can different k1 and k2 values be compared, e.g. by optimizing coherence as one would for LDA?
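
What I have in mind is roughly the grid search sketched below. The `grid_search` helper and the `train_and_score` callback are hypothetical. Whether a coherence score is meaningful for a Pachinko model is exactly what I am unsure about, so the scoring step is left to the caller.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def grid_search(k1_values: Iterable[int],
                k2_values: Iterable[int],
                train_and_score: Callable[[int, int], float]
                ) -> List[Tuple[int, int, float]]:
    """Score every (k1, k2) pair and return results sorted best-first."""
    results = [(k1, k2, train_and_score(k1, k2))
               for k1, k2 in product(k1_values, k2_values)]
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Hypothetical callback (assumes higher scores are better; the scoring
# function is a placeholder to be supplied):
#   import tomotopy as tp
#   def train_and_score(k1, k2):
#       mdl = tp.PAModel(k1=k1, k2=k2)
#       ...  # add documents, then mdl.train(...)
#       return my_score(mdl)
```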

Thank you in advance for your prompt answer!

Cheers,
Eric
 
Original message:
Hello,
I tried tomotopy with a few thousand tweets and it ran quickly on Google Colab. Now I am trying it with several million tweets on a local machine, and the add_doc() method seems to be a bottleneck: in five minutes, it added around 50 documents. Is this a known issue? Do I need to configure something to make it run faster?
Best wishes,
Eric
