Bug in get_topic_info? #581
Comments
Thanks for sharing this! In part, the goal of |
I understand. It isn't critical, but the behavior was unexpected: the topics are re-numbered sequentially from -1 to n, with 0 to n in order of size, yet topic -1 can just as well be the smallest or fall somewhere in the middle. I'm not sure that the assumption is warranted. It is minor, but when it happens it is easy to miss, and if you rely on the order it can mess things up.
Agreed, I think I will change this in the upcoming release with your suggestion, simply sorting by |
This was fixed in v0.11 and this issue will be closed. If you continue to run into this problem, let me know and I'll make sure to re-open the issue.
I'm formatting output to use in a graph. I have two new BERTopic instances with similar settings, except that the input text and the hdbscan_models are different. Here is what I get when calling get_topic_info():
Note that in the first output the -1 topic comes first, and in the second it comes last. In the second model the count for topic -1 is very small, smaller than all the others. I think the bug is on line 769 of _bertopic:
In this case:
info = pd.DataFrame(BERT_2.topic_sizes.items(), columns=['Topic', 'Count']).sort_values("Count", ascending=False)
info["Name"] = info.Topic.map(BERT_2.topic_names)
info
produces:
Sorting by Topic in ascending order fixes this, but it may break something else. I didn't trace back to check whether the topic list is already re-ordered in all cases.
info = pd.DataFrame(BERT_2.topic_sizes.items(), columns=['Topic', 'Count']).sort_values("Topic", ascending=True)
info["Name"] = info.Topic.map(BERT_2.topic_names)
info
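For readers without a fitted model at hand, the effect of the two sort orders can be reproduced with a small stand-in. This is only a sketch: the topic sizes are made up rather than taken from either model, the helper order_topics is hypothetical, and plain Python sorting stands in for the two sort_values calls above.

```python
# Hypothetical topic sizes (not taken from the issue): the outlier
# topic -1 is the smallest, as in the second model described above.
topic_sizes = {-1: 7, 0: 120, 1: 80, 2: 45}


def order_topics(sizes, by_count):
    """Return topic ids ordered by descending count or by ascending id."""
    if by_count:
        # Mirrors sort_values("Count", ascending=False)
        ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    else:
        # Mirrors sort_values("Topic", ascending=True)
        ranked = sorted(sizes.items(), key=lambda item: item[0])
    return [topic for topic, _ in ranked]


by_count = order_topics(topic_sizes, by_count=True)
by_topic = order_topics(topic_sizes, by_count=False)

print(by_count)  # [0, 1, 2, -1]
print(by_topic)  # [-1, 0, 1, 2]
```

Sorting by count puts -1 wherever its size happens to fall, while sorting by topic id always puts it first, so code that relies on the row position of -1 only behaves consistently with the second ordering.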