BerTopic Model - Visualization ignores 0th index #667

Closed
bala1802 opened this issue Aug 10, 2022 · 2 comments · Fixed by #668

Comments

@bala1802

The BERTopic model produced the following topics:

[Screenshot: the topics produced by the model, with their counts]

As you can see above, the model has been fine-tuned to produce fewer outliers: the outlier topic -1 has a count of only 3 and therefore appears last.

When visualizing the topics per class with

topic_model.visualize_topics_per_class(topics_per_class)

the interactive visualization below is generated. However, it ignores the 0th index, specifically Topic 0: the Global Topic Representations are displayed in the order 1, 2, 3, 4, 5, 6, -1.

[Screenshot: topics-per-class visualization without Topic 0]

  1. Is BERTopic designed so that it always assumes the very first index is the outlier topic (-1) and removes it unconditionally?
  2. Are the generated topics always accessed by their size (count), perhaps in descending order?
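The first question can be illustrated with a small, self-contained sketch. It is a hypothetical reconstruction, not BERTopic's actual code: the counts are made up, except that, as in this issue, the outlier topic -1 is the smallest and therefore sorts last. If topics are ordered by frequency and the first entry is skipped on the assumption that it is always -1, Topic 0 is dropped and -1 is kept, which reproduces the 1, 2, 3, 4, 5, 6, -1 ordering described above. The function names `drop_first_position` and `drop_outlier_label` are illustrative only.

```python
# Hypothetical (topic, count) pairs sorted by count in descending order.
# Only the outlier count of 3 comes from the issue; the rest are made up.
topic_freq = [(0, 120), (1, 95), (2, 80), (3, 60), (4, 41), (5, 30), (6, 12), (-1, 3)]


def drop_first_position(freq):
    """Assumes the outlier topic -1 is always at index 0 and skips that entry."""
    return [topic for topic, _ in freq[1:]]


def drop_outlier_label(freq):
    """Removes the outlier topic by its label, wherever it appears in the list."""
    return [topic for topic, _ in freq if topic != -1]


print(drop_first_position(topic_freq))  # [1, 2, 3, 4, 5, 6, -1]
print(drop_outlier_label(topic_freq))   # [0, 1, 2, 3, 4, 5, 6]
```

Filtering by label does not depend on where -1 lands after sorting, so it stays correct whether the outlier topic is the largest or the smallest.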
@MaartenGr
Owner

MaartenGr commented Aug 10, 2022

Thank you for the extensive description! This is indeed a known issue and has to do with how the topics are accessed. It will be fixed in the next release, as there will be some changes to the internal structure and the way topics are accessed. For now, it should work by running the following:

topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=None)

@bala1802
Author

Thanks a lot for the quick response, @MaartenGr; I appreciate your effort. After setting the suggested parameter top_n_topics=None, I can see Topic 0.

MaartenGr mentioned this issue Aug 31, 2022
MaartenGr added a commit that referenced this issue Sep 11, 2022
* Online/incremental topic modeling with .partial_fit
* Expose c-TF-IDF model for customization with bertopic.vectorizers.ClassTfidfTransformer
* Expose attributes for easier access to internal data
* Major changes to the Algorithm page of the documentation, which now contains three overviews of the algorithm
* Added an example of combining BERTopic with KeyBERT
* Added many tests with the intention of making development a bit more stable
* Fix #632, #648, #673, #682, #667, #664

2 participants