
add_doc() method very slow #199

@erima2020


Hello,
Since I wrote the original message, I found the cause (my data file had two columns, which slowed things down). Adding documents is still slow, though less so: in 5 minutes I can import about 100,000 of several million documents (I'm currently testing with a subsample). My data file contains one text per line. Can this be made faster?
(I'm on an i7 Mac with 32 GB RAM and a 1 TB SSD.)
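One way to see where the remaining time goes is to time Python-side preprocessing separately from the `add_doc()` calls. The helper below is a hypothetical sketch, not part of tomotopy: it streams a one-text-per-line source, tokenizes each line with a simple regex (an assumption; substitute your own tokenizer), hands the tokens to any `add_doc`-like callable, and reports the time spent in each stage.

```python
import re
import time
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical, deliberately simple tokenizer: runs of word characters.
TOKEN_RE = re.compile(r"\w+")


def iter_tokenized(lines: Iterable[str], stopwords: frozenset = frozenset()) -> Iterator[List[str]]:
    """Lazily yield one lowercased token list per non-empty line."""
    for line in lines:
        tokens = [t for t in TOKEN_RE.findall(line.lower()) if t not in stopwords]
        if tokens:  # skip blank lines instead of adding empty documents
            yield tokens


def load_docs(
    lines: Iterable[str],
    add_doc: Callable[[List[str]], object],
    stopwords: frozenset = frozenset(),
) -> Tuple[int, float, float]:
    """Feed tokenized lines to add_doc.

    Returns (documents added, seconds spent reading/tokenizing,
    seconds spent inside add_doc).
    """
    prep_time = add_time = 0.0
    n_docs = 0
    docs = iter_tokenized(lines, stopwords)
    while True:
        t0 = time.perf_counter()
        try:
            tokens = next(docs)  # reading + tokenizing happen here
        except StopIteration:
            break
        t1 = time.perf_counter()
        add_doc(tokens)  # e.g. a topic model's add_doc method
        t2 = time.perf_counter()
        prep_time += t1 - t0
        add_time += t2 - t1
        n_docs += 1
    return n_docs, prep_time, add_time
```

For example, with `mdl = tp.LDAModel(k=20)` you could call `load_docs(f, mdl.add_doc)` inside a `with open("tweets.txt", encoding="utf-8") as f:` block (the file name is a placeholder). If the preprocessing time dominates, speeding up or caching the tokenization is likely to help more than changing model settings.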

I also have a question about evaluating Pachinko topic models with different k1 and k2 values: can this be done the same way as optimizing coherence with LDA?
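Choosing k1 and k2 can be framed as a plain grid search: train one model per (k1, k2) pair, score it, and keep the best. The sketch below is generic and hypothetical; `score_fn` is a placeholder you would supply, and higher scores are assumed to be better.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[int, int]


def grid_search_k1_k2(
    k1_values: Iterable[int],
    k2_values: Iterable[int],
    score_fn: Callable[[int, int], float],
) -> Tuple[Pair, float, Dict[Pair, float]]:
    """Evaluate score_fn on every (k1, k2) pair and return the best one.

    Returns (best pair, best score, all scores) so the full grid can be
    inspected or plotted afterwards.
    """
    scores: Dict[Pair, float] = {}
    for k1, k2 in product(k1_values, k2_values):
        # Placeholder: score_fn is expected to train and evaluate one model.
        scores[(k1, k2)] = score_fn(k1, k2)
    best = max(scores, key=scores.get)
    return best, scores[best], scores
```

In practice, `score_fn` might train a `tp.PAModel(k1=k1, k2=k2, seed=...)` on the same documents and return a coherence value, for instance from the `tomotopy.coherence.Coherence` class in recent tomotopy versions. Whether that class scores PA's sub-topics or super-topics is worth confirming in the documentation for your version. Fixing the seed keeps the runs comparable.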

Thank you in advance for your prompt answer!

Cheers,
Eric

Original message:
Hello,
I tried tomotopy with a few thousand tweets and it ran quickly on Google Colab. Now I am trying several million tweets on a local machine, and the add_doc() method seems to be a bottleneck: in five minutes, it added around 50 documents. Is this a known issue? Do I need to configure something to make it run faster?
Best wishes,
Eric
