problems with merge_topics #648
Comments
Fixed it after running the snippet below, which I found in another discussion. Thanks! topics = model._map_predictions(model.hdbscan_model.labels_)
Hi @MaartenGr, it turns out that I'm still having issues with this. After executing the above commands, I just realized that the representative docs are not assigned correctly to the new topics after merging. I'm still confused about how to assign the merged topics to the documents. Any help is appreciated.
@iamsha5q There is indeed currently a bug in Having said that, I believe you can fix it by running the following:
self._map_representative_docs()
updated_probs = self._map_probabilities(probs)
There is already quite some code for the new release, so I am hoping to get a PR in the coming weeks so that you can already use the fix.
Thanks Maarten, I might just wait for the next release then. Even after running map_representative_docs(), it's still not mapped properly.
* Online/incremental topic modeling with .partial_fit
* Expose c-TF-IDF model for customization with bertopic.vectorizers.ClassTfidfTransformer
* Expose attributes for easier access to internal data
* Major changes to the Algorithm page of the documentation, which now contains three overviews of the algorithm
* Added an example of combining BERTopic with KeyBERT
* Added many tests with the intention of making development a bit more stable
* Fix #632, #648, #673, #682, #667, #664
With the new release, this should be fixed! However, if you still run into any issues, please let me know.
I created the following dataframe from the model output
And then I created another dataframe which consists of my messages and the topic from the model output, where any message assigned to topic -1 gets its highest-probability topic instead.
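The outlier reassignment described above can be sketched roughly as follows. This is a minimal illustration, not the code from the original post: the helper name `assign_outliers` and the sample data are made up, and it assumes that column i of each probability row corresponds to topic i.

```python
def assign_outliers(topics, probs):
    """Replace topic -1 with the index of the highest-probability topic.

    Assumption: probs[d][i] is the probability of document d for topic i.
    """
    resolved = []
    for topic, row in zip(topics, probs):
        if topic == -1:
            # argmax over this document's per-topic probabilities
            topic = max(range(len(row)), key=lambda i: row[i])
        resolved.append(topic)
    return resolved


# Illustrative data only: three documents, three topics
topics = [0, -1, 2]
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.70, 0.20],
    [0.20, 0.10, 0.70],
]
resolved = assign_outliers(topics, probs)
```

Note that this only works while the columns of `probs` line up with the current topic IDs, which is exactly what stops being true after a merge.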
The df above worked perfectly until I decided to merge some topics as follows
and then when I run df again, some messages are still assigned to topics that were deleted by the merge. But when I run topic_df, it correctly shows the newly merged topics.
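One way to picture the mismatch: a topic list saved before the merge still holds the old IDs, so it has to be passed through an old-to-new mapping. The sketch below is hypothetical and is not BERTopic's implementation. It assumes each merged group collapses onto its smallest ID and that the remaining topics are renumbered contiguously. The library may order topics differently, for example by size.

```python
def build_mapping(old_topics, topics_to_merge):
    """Map every old topic ID to its ID after merging (illustrative rules only)."""
    mapping = {t: t for t in set(old_topics)}
    for group in topics_to_merge:
        target = min(group)  # assumption: a group collapses onto its smallest ID
        for t in group:
            mapping[t] = target
    # Renumber the surviving non-outlier topics so the IDs are contiguous again
    kept = sorted({v for v in mapping.values() if v != -1})
    renumber = {old: new for new, old in enumerate(kept)}
    renumber[-1] = -1  # the outlier topic keeps its ID
    return {old: renumber[new] for old, new in mapping.items()}


old_topics = [-1, 0, 1, 2, 3, 2, 1]  # labels saved before the merge
mapping = build_mapping(old_topics, topics_to_merge=[[0, 2]])
new_topics = [mapping[t] for t in old_topics]
```

With this kind of mapping, a dataframe built from the stale list can be refreshed without refitting the model.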
Say message[1] was allocated to topic 141. Before the topic merging, probs[1] or model.visualize_distribution(probs[1]) would show some values, but not after merging. I've reduced 140 topics to 115 topics, so any messages previously assigned to topics > 115 now have no topic to map to.
When I run len(probs[1]), the size is still about 141 topics, which means the probs were not updated with the new probs from merging? But if I do the following, I get an error:
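A probability matrix computed before the merge keeps one column per old topic, so it also needs to be folded into the new topic space. A minimal sketch of one plausible approach is to sum the columns of topics that were merged together. The function name, the mapping, and the numbers below are all illustrative assumptions, and this is not claimed to be what `_map_probabilities` does internally.

```python
def merge_prob_columns(row, mapping, n_new):
    """Fold one document's old per-topic probabilities into the merged topics."""
    merged = [0.0] * n_new
    for old_topic, p in enumerate(row):
        # Probability mass of every old topic moves to its merged topic
        merged[mapping[old_topic]] += p
    return merged


# Hypothetical mapping: old topics 0 and 2 were merged, old topic 3 became topic 2
mapping = {0: 0, 1: 1, 2: 0, 3: 2}
row = [0.1, 0.2, 0.3, 0.4]  # one document, four old topics
merged = merge_prob_columns(row, mapping, n_new=3)
```

After this step the row has one entry per merged topic, so indexing it with a new topic ID no longer runs past the end.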
topics_merge, probs_merge = model.merge_topics(vic_msg, topics, topics_to_merge)
TypeError: cannot unpack non-iterable NoneType object
Do you have any idea what happened here, @MaartenGr?
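The TypeError above is what Python raises when a call returns None and the result is unpacked into two names. That suggests merge_topics, at least in the version used here, updates the model in place instead of returning new topics and probabilities. The stand-in class below is hypothetical and only reproduces the pattern. It is not BERTopic code.

```python
class InPlaceModel:
    """Hypothetical stand-in: merge() mutates state and returns None."""

    def __init__(self, topics):
        self.topics = list(topics)

    def merge(self, group):
        target = min(group)
        self.topics = [target if t in group else t for t in self.topics]
        # No return statement, so the call evaluates to None


model = InPlaceModel([0, 1, 2, 1])

try:
    # Unpacking the None result fails, although the merge itself has already run
    new_topics, new_probs = model.merge([1, 2])
    error = None
except TypeError as exc:
    error = str(exc)

result = model.merge([1, 2])  # call without unpacking
updated_topics = model.topics  # read the updated state from the object instead
```

Under that assumption, the call should be made without assigning its result, and the updated topics read back from the model afterwards.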