Removed multiple iterations of corpus in p_boolean_document. #1325
Conversation
In the previous version of p_boolean_document, the entire corpus was iterated through for each top_id. In this new version, the corpus is iterated through a single time and the matching docs for each top_id are extracted simultaneously.
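As an illustration of the change, here is a minimal sketch of the two strategies. It assumes a bag-of-words corpus of (word_id, count) pairs; the function names postings_multi_pass and postings_single_pass are made up for this sketch, while corpus, top_ids, and per_topic_postings follow the diff hunks quoted below. This is not the exact patch.

from collections import defaultdict

# Old strategy (sketch): one full pass over the corpus per top word id.
def postings_multi_pass(corpus, top_ids):
    per_topic_postings = {}
    for word_id in top_ids:
        per_topic_postings[word_id] = set(
            n for n, document in enumerate(corpus)
            if word_id in frozenset(x[0] for x in document))
    return per_topic_postings

# New strategy (sketch): a single pass over the corpus, collecting the
# matching documents for every top word id simultaneously.
def postings_single_pass(corpus, top_ids):
    per_topic_postings = defaultdict(set)
    for n, document in enumerate(corpus):
        doc_words = frozenset(x[0] for x in document)
        for word_id in top_ids.intersection(doc_words):
            per_topic_postings[word_id].add(n)
    return per_topic_postings

The single pass replaces len(top_ids) scans of the corpus with one.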
The build failed because of trailing whitespace after a colon. Should I submit a new pull request?
@danielchamberlain No, you can fix it in the current PR.
Thanks. All set now.
Thanks for the improvement. Confirmed by the regression test.
for n, document in enumerate(corpus):
    doc_words = frozenset(x[0] for x in document)
    top_ids_in_doc = top_ids.intersection(doc_words)
    if len(top_ids_in_doc) > 0:
The emptiness check reads better as "if top_ids_in_doc:", which is more idiomatic in Python.
In fact, better to remove the condition completely, it doesn't seem to be needed: for word_id in top_ids.intersection(doc_words): ...
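Applied to the hunk above, the suggestion collapses the loop body to something like this sketch (assuming per_topic_postings maps each word id to a set, e.g. a defaultdict(set)):

for n, document in enumerate(corpus):
    doc_words = frozenset(x[0] for x in document)
    # Iterating the intersection directly: an empty set simply yields
    # nothing, so no explicit emptiness check is needed.
    for word_id in top_ids.intersection(doc_words):
        per_topic_postings[word_id].add(n)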
top_ids_in_doc = top_ids.intersection(doc_words)
if len(top_ids_in_doc) > 0:
    for id in top_ids_in_doc:
        per_topic_postings[id].add(n)
id shadows a Python built-in function, best use something else (like word_id).
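A quick, hypothetical illustration (not from the patch) of why shadowing the built-in is risky:

for id in [1, 2, 3]:
    pass
# The name id is now bound to the int 3, hiding the built-in function:
id(object())  # TypeError: 'int' object is not callable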