Mismatch get_coherence_per_topic and get_coherence for single topic #3181

Closed
silviatti opened this issue Jun 22, 2021 · 0 comments

Problem description

Hi! I am using Gensim to compute the NPMI coherence of each of my topics. I tried both get_coherence_per_topic() and get_coherence() (in the latter case passing a list containing a single topic), and noticed that the per-topic values do not match what get_coherence() returns for the corresponding single topic. As I understand it, the NPMI of a topic should not depend on how many topics are passed in or on what the other topics are.
The same mismatch occurs with the other c_* coherence measures, but not with u_mass.
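
To make that expectation concrete, here is a minimal sketch of how I understand NPMI for a single topic. It is not Gensim's implementation: it uses plain document-level co-occurrence instead of a sliding window, and the function name `npmi_topic`, the corpus `toy_texts`, and the conventions for pairs that never or always co-occur are my own assumptions for illustration.

```python
import math
from itertools import combinations


def npmi_topic(topic, texts):
    """Mean NPMI over all word pairs of one topic.

    Assumption: probabilities are estimated from boolean document
    co-occurrence (a word either appears in a document or it does not).
    """
    docs = [set(text) for text in texts]
    n_docs = len(docs)
    scores = []
    for w1, w2 in combinations(topic, 2):
        p1 = sum(w1 in d for d in docs) / n_docs
        p2 = sum(w2 in d for d in docs) / n_docs
        p12 = sum(w1 in d and w2 in d for d in docs) / n_docs
        if p12 == 0:
            scores.append(-1.0)  # assumed convention: the pair never co-occurs
        elif p12 == 1:
            scores.append(1.0)   # assumed convention: the pair always co-occurs
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Hypothetical toy corpus, not gensim's common_texts.
toy_texts = [
    ["human", "computer", "interface"],
    ["computer", "system", "human"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
    ["system", "graph"],
]
toy_topics = [
    ["human", "computer", "system", "interface"],
    ["graph", "minors", "trees", "eps"],
]

# Scoring every topic in one pass, or one topic on its own, gives the same value.
per_topic = [npmi_topic(topic, toy_texts) for topic in toy_topics]
only_topic0 = npmi_topic(toy_topics[0], toy_texts)
print(per_topic, only_topic0)
```

In this sketch the only inputs to a topic's score are that topic's own word counts and the total number of documents, so the other topics cannot influence it. That is the behaviour I expected from get_coherence_per_topic() and get_coherence().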

Thank you!

Steps/code/corpus to reproduce

from gensim.test.utils import common_texts, common_dictionary
from gensim.models.coherencemodel import CoherenceModel

topics = [
    ['human', 'computer', 'system', 'interface'],
    ['graph', 'minors', 'trees', 'eps']
]

cm = CoherenceModel(topics=topics, texts=common_texts, coherence='c_npmi', 
                    dictionary=common_dictionary)
coherence = cm.get_coherence_per_topic()  
print(coherence) # got [0.23583958321789514, -0.24456941091456053]

cm_topic0 = CoherenceModel(topics=[topics[0]], texts=common_texts, 
                           coherence='c_npmi', dictionary=common_dictionary)
coherence_topic0 = cm_topic0.get_coherence()  
print(coherence_topic0) # expect this to be == coherence[0] but got -0.14624062517782566

cm_topic1 = CoherenceModel(topics=[topics[1]], texts=common_texts, 
                           coherence='c_npmi', dictionary=common_dictionary)
coherence_topic1 = cm_topic1.get_coherence()  
print(coherence_topic1) # expect this to be == coherence[1] but got -0.31633310918174923

Versions

Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python 3.7.10 (default, May 3 2021, 02:48:31)
[GCC 7.5.0]
Bits 64
NumPy 1.19.5
SciPy 1.4.1
gensim 3.8.3
FAST_VERSION 1
