Dominant topic in a document #954
Comments
When you say dominant topic, do you mean the one with the most members or something else?
After training the model, you can access the assigned topics for each document with However, if you want to model the distribution of topics in the documents, it might be worthwhile to use
The document is represented by a set of topics. The topic that is most talked about in the document is the "dominant topic".
According to the documentation, topic_model.topics_ tracks the most recent topics. Is that the same as the dominant topic? I ask because when I checked the topic probabilities for the documents (by setting the parameter calculate_probabilities=True), the topic with the maximum probability differed from the result I get from topic_model.topics_.
Yes, these are the dominant topics per document.
The calculated probabilities are approximations, a consequence of how HDBSCAN generates them. They may therefore differ from the assigned topics and should be treated as just that, an approximation. I have had this question a couple of times before, so I'll definitely make this clearer in the documentation.
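To see how often the assigned topic and the highest-probability topic disagree, a comparison along these lines may help. This is a minimal sketch in plain Python with made-up values, not output from a real model. It assumes that `topics` holds the per-document assignments (as in topic_model.topics_), that `probs` is the document-by-topic matrix obtained with calculate_probabilities=True, that column i of the matrix corresponds to topic i, and that the outlier topic -1 has no column of its own. The helper names `argmax_topic` and `find_disagreements` are hypothetical.

```python
from typing import List, Sequence, Tuple


def argmax_topic(row: Sequence[float]) -> int:
    """Return the column index holding the highest probability in one row."""
    return max(range(len(row)), key=lambda i: row[i])


def find_disagreements(
    topics: Sequence[int], probs: Sequence[Sequence[float]]
) -> List[Tuple[int, int, int]]:
    """List (doc_index, assigned_topic, argmax_topic) wherever the two differ."""
    mismatches = []
    for doc_idx, (assigned, row) in enumerate(zip(topics, probs)):
        best = argmax_topic(row)
        if assigned != best:
            mismatches.append((doc_idx, assigned, best))
    return mismatches


# Toy values for illustration only (assumption: 4 documents, 3 topics).
# With a fitted model, these would come from topic_model.topics_ and
# topic_model.probabilities_ instead.
topics = [0, 1, -1, 2]
probs = [
    [0.70, 0.20, 0.10],
    [0.45, 0.40, 0.15],
    [0.10, 0.15, 0.05],
    [0.05, 0.15, 0.80],
]

disagreements = find_disagreements(topics, probs)
for doc_idx, assigned, best in disagreements:
    print(f"doc {doc_idx}: assigned topic {assigned}, argmax topic {best}")
```

Documents assigned to -1 will always show up as mismatches under these assumptions, since the argmax is taken over real topics only. Whether to keep the assigned topic or the argmax as the "dominant" one is a judgment call; the reply above treats the assigned topics as the dominant ones and the probabilities as approximate.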
* Add representation models
  * bertopic.representation.KeyBERTInspired
  * bertopic.representation.PartOfSpeech
  * bertopic.representation.MaximalMarginalRelevance
  * bertopic.representation.Cohere
  * bertopic.representation.OpenAI
  * bertopic.representation.TextGeneration
  * bertopic.representation.LangChain
  * bertopic.representation.ZeroShotClassification
* Fix topic selection when extracting repr docs
* Improve documentation, #769, #954, #912
* Add wordcloud example to documentation
* Add title param for each graph, #800
* Improved nr_topics procedure
* Fix #952, #903, #911, #965. Add #976
Is there any way that I can get the dominant topic from the documents?