problems with merge_topics #648

iamsha5q · 2022-07-31T11:30:39Z

I created the following dataframe from the model output

topics, probs = model.fit_transform(vic_msg)
topic_df = model.get_topic_info()

Then I created another dataframe that consists of my messages, the topic from the model output, and an assigned topic: when a message is assigned to topic -1, I assign the highest-probability topic instead.

# create dataframe with topics
df = pd.DataFrame({'topic': topics, 'message': vic_msg})
df['topic_assigned'] = " "
for i, row in df.iterrows():
    if row.topic == -1:
        df.at[i,'topic_assigned'] = np.where(probs[i] == probs[i].max())[0][0]
    else:
        df.at[i,'topic_assigned'] = row.topic
df = df.merge(topic_df[['Topic', 'Name']], how='left', left_on='topic_assigned', right_on='Topic' )
df.rename(columns = {'Name':'topic_keywords'}, inplace = True)
df = df[['topic','topic_assigned', 'topic_keywords', 'message']]
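The row-by-row fallback above can also be written without a loop. The following is a minimal sketch, assuming probs is a 2-D array with one row per document and that column i corresponds to topic i; the helper name assign_fallback_topics and the toy values are illustrative only and are not part of BERTopic.

```python
import numpy as np
import pandas as pd


def assign_fallback_topics(topics, probs):
    """Replace outlier labels (-1) with the index of the most probable topic.

    Hypothetical helper: assumes column i of `probs` belongs to topic i.
    """
    topics = np.asarray(topics)
    probs = np.asarray(probs)
    # argmax returns the first index of each row maximum, which matches
    # np.where(row == row.max())[0][0] from the loop version.
    best = probs.argmax(axis=1)
    return np.where(topics == -1, best, topics)


# Toy data standing in for the output of fit_transform (values are made up).
toy_topics = [0, -1, 2, -1]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
]

toy_df = pd.DataFrame({"topic": toy_topics})
toy_df["topic_assigned"] = assign_fallback_topics(toy_topics, toy_probs)
print(toy_df)
```

Because the column is created as integers from the start, the later merge against the Topic column does not have to compare objects with integers.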

The dataframe above worked perfectly until I decided to merge some topics as follows:

topics_to_merge = [[141,142],[143,144]]
model.merge_topics(vic_msg, topics, topics_to_merge)

When I then rebuild df, some messages are still assigned to topics that were removed by the merge. However, topic_df correctly shows the newly merged topics when I regenerate it.

Say message[1] was allocated to topic 141. Before merging, probs[1] or model.visualize_distribution(probs[1]) showed some values, but not after merging. I've reduced 140 topics to 115 topics, so any messages previously assigned to topics > 115 now have no topic to map to.
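One way to see which rows are affected is to compare the assigned topic ids against the ids that still exist in the topic table. This is a rough sketch on toy data; find_stale_rows is a hypothetical helper, and the column names topic_assigned and Topic are taken from the dataframes described above.

```python
import pandas as pd


def find_stale_rows(doc_df, topic_info, column="topic_assigned"):
    """Return the rows whose topic id no longer appears in topic_info['Topic'].

    Hypothetical helper for diagnosis only; it does not repair the labels.
    """
    still_valid = doc_df[column].isin(topic_info["Topic"])
    return doc_df[~still_valid]


# Toy stand-ins: topic 141 no longer exists in the topic table.
toy_docs = pd.DataFrame(
    {"message": ["a", "b", "c"], "topic_assigned": [3, 141, 0]}
)
toy_info = pd.DataFrame(
    {"Topic": [-1, 0, 1, 2, 3], "Name": ["n-1", "n0", "n1", "n2", "n3"]}
)

stale = find_stale_rows(toy_docs, toy_info)
print(stale)
```

An empty result would mean every document points at a topic that still exists; any rows returned are the ones that were labelled before the merge and never updated.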

When I run len(probs[1]), the size is still about 141 topics, which suggests probs is not updated with the new probabilities from merging. But if I do the following, I get an error:

topics_merge, probs_merge = model.merge_topics(vic_msg, topics, topics_to_merge)
TypeError: cannot unpack non-iterable NoneType object

Do you have any idea what happened here, @MaartenGr?
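The TypeError quoted above is what Python raises when the right-hand side of a tuple assignment is None. That would be consistent with merge_topics changing the model in place and returning nothing in this version, although the thread does not state this explicitly. The sketch below reproduces the pattern with a hypothetical stand-in class (InPlaceMerger is not BERTopic and does not replicate how BERTopic renumbers topics after a merge).

```python
class InPlaceMerger:
    """Hypothetical stand-in, not BERTopic: merges labels in place."""

    def __init__(self, labels):
        self.labels = list(labels)

    def merge(self, groups):
        # Collapse every group onto its smallest id (an arbitrary choice
        # made for this illustration only).
        for group in groups:
            target = min(group)
            self.labels = [target if lab in group else lab for lab in self.labels]
        # No return statement, so the call evaluates to None.


merger = InPlaceMerger([141, 142, 143, 144, 5])

error_message = None
try:
    # The merge itself runs; only the unpacking of its None result fails.
    new_labels, new_probs = merger.merge([[141, 142], [143, 144]])
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(merger.labels)  # the updated state lives on the object itself
```

With an in-place method like this, the updated state has to be read back from the object rather than from the return value.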


iamsha5q · 2022-07-31T12:55:14Z

Fixed it by running the code below, which I found in another discussion. Thanks!

topics = model._map_predictions(model.hdbscan_model.labels_)
probs = hdbscan.all_points_membership_vectors(model.hdbscan_model)
probs = model._map_probabilities(probs, original_topics=True)

iamsha5q · 2022-08-09T02:52:37Z

Hi @MaartenGr, it turns out I'm still having issues with this. After executing the commands above, I realized that the representative docs are not assigned correctly to the new topics after merging. I'm still confused about how to assign the new merged topics to the documents. Any help is appreciated.

MaartenGr · 2022-08-09T07:38:34Z

@iamsha5q There is indeed currently a bug in merge_topics. It will be fixed in the next release but there will be some significant changes to the internal structure so a quick fix will come with a new full release as a PR will not cover it entirely.

Having said that, I believe you can fix it by running the following:

model._map_representative_docs()
updated_probs = model._map_probabilities(probs)

There is already quite some code for the new release, so I am hoping to get a PR in the coming weeks so that you can already use the fix.

iamsha5q · 2022-08-10T23:59:33Z

Thanks Maarten, I might just wait for the next release then. Even after running _map_representative_docs(), the representative docs are still not mapped properly.

* Online/incremental topic modeling with .partial_fit
* Expose c-TF-IDF model for customization with bertopic.vectorizers.ClassTfidfTransformer
* Expose attributes for easier access to internal data
* Major changes to the Algorithm page of the documentation, which now contains three overviews of the algorithm
* Added an example of combining BERTopic with KeyBERT
* Added many tests with the intention of making development a bit more stable
* Fix #632, #648, #673, #682, #667, #664

MaartenGr · 2022-09-27T08:41:18Z

With the new release, this should be fixed! However, if you still run into any issues, please let me know.

MaartenGr added a commit that referenced this issue Aug 10, 2022

Fix #632 and #648

a6bbf49

MaartenGr mentioned this issue Aug 10, 2022

v0.12 #668

Merged

MaartenGr closed this as completed Sep 27, 2022
