Custom labels in dynamic modeling and topic per class #2154

Open
1 task done
sercankiyak opened this issue Sep 20, 2024 · 4 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers)

Comments

@sercankiyak

sercankiyak commented Sep 20, 2024

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Describe the bug

Hi,

I’ve come across a small issue that’s left me a bit puzzled. I’m not entirely sure if it’s a bug, but it does seem confusing. When I run topics_per_class with my topic model:

topics_per_class = topic_model.topics_per_class(docs, classes=classes, global_tuning=False)

The resulting topics_per_class looks like this:

Topic | Words | Frequency | Class
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl

This has no custom_labels. That is not ideal but OK.

But when I run this:

fig = topic_model.visualize_topics_per_class(topics_per_class, custom_labels=True)

and then display topics_per_class again in my Jupyter notebook, I now get this:

Topic | Words | Frequency | Class | Name
-1 | jaar, mensen, zegt, vrouwen, politie | 1421 | nl | -1_mensen_vrouwen_politie_kinderen

Now the "Name" is added. It’s unclear why it isn’t already included or why there’s no parameter like custom_labels=True when creating topics_per_class. I checked the documentation, and this parameter doesn’t exist, and attempting to use it gives an error in the first step.

I just wanted to bring this up. As the title indicates, I’ve experienced the same issue with dynamic topic modeling as well.

I sometimes find the table version easier to use, so I’m curious why it works like this, or maybe I’m missing something?

This is only my second time reporting something (on GitHub in general), so I hope I explained it clearly. Thanks again for everyone’s work and effort on BERTopic.

Best,

Reproduction

from bertopic import BERTopic

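# docs (list of documents) and classes (one class label per document) are the reporter's own data, not shown here
topic_model = BERTopic(verbose=True)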
topics, probs = topic_model.fit_transform(docs)

topics_per_class = topic_model.topics_per_class(docs, classes=classes)

print(topics_per_class.columns)
# No custom labels at this step

topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=10)

print(topics_per_class.columns)
# Now a "Name" column with the topic labels is the last column in the df.

BERTopic Version

0.16.2

sercankiyak added the bug label Sep 20, 2024
@MaartenGr
Owner

This might be because .visualize_topics_per_class does not create a copy of topics_per_class but instead directly modifies the variable. I believe that's why topics_per_class gets that additional column after running that function. It should be an easy fix: simply create a copy of the dataframe before attempting to update it.
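The in-place update described above can be reproduced without BERTopic. The following is a minimal sketch, assuming pandas is installed; `add_name_inplace` and `add_name_on_copy` are hypothetical stand-ins for the visualization function and are not BERTopic's actual code. The row values are taken from the table in the report.

```python
import pandas as pd


def add_name_inplace(data: pd.DataFrame, names: dict) -> pd.DataFrame:
    # Hypothetical stand-in: writes a new column into the caller's DataFrame.
    data["Name"] = data["Topic"].map(names)
    return data


def add_name_on_copy(data: pd.DataFrame, names: dict) -> pd.DataFrame:
    # Same logic, but works on a copy so the caller's DataFrame is untouched.
    data = data.copy()
    data["Name"] = data["Topic"].map(names)
    return data


names = {-1: "-1_mensen_vrouwen_politie_kinderen"}
original = pd.DataFrame(
    {
        "Topic": [-1],
        "Words": ["jaar, mensen, zegt, vrouwen, politie"],
        "Frequency": [1421],
        "Class": ["nl"],
    }
)

# The copying variant leaves its input unchanged.
safe_input = original.copy()
add_name_on_copy(safe_input, names)
columns_after_copy = list(safe_input.columns)

# The in-place variant adds "Name" to its input as a side effect.
mutated_input = original.copy()
add_name_inplace(mutated_input, names)
columns_after_inplace = list(mutated_input.columns)
```

Until such a fix is released, passing `topics_per_class.copy()` to `visualize_topics_per_class` should keep the original table unchanged, since any added column would then land on the copy.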

MaartenGr added the good first issue label Sep 21, 2024
@sercankiyak
Author

sercankiyak commented Sep 22, 2024

Thanks for the explanation. I understand your suggestion from a programming point of view. As a user, however, I would also like topics_per_class() to have a parameter such as custom_labels=True that creates the df with custom labels. I find analyzing the tables much more useful than the visualization, because I can highlight things, take notes, and so on. Would adding such a parameter be a bad idea, and if so, why? Or is there already a proper way, one I have not figured out yet, to create the table with custom labels when using dynamic modeling or topics per class?

@MaartenGr
Owner

I think the reason something like that is not implemented is that you can add that information yourself: you have access to the custom labels (e.g., topic_model.custom_labels_) and can map them to the topics that you have in topics_per_class. I believe it should be straightforward to implement since I guess it would be one or two lines of code, so something like:

topics_per_class["Custom_Label"] = topics_per_class.apply(lambda row: topic_model.custom_labels_[row["Topic"]], axis=1)

I haven't tested this line but you should get the general intention.
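One caveat when adapting the line above: `custom_labels_` appears to be a plain list ordered by topic ID (an assumption worth checking against your BERTopic version). If an outlier topic -1 exists, its label would then sit at index 0, and indexing the list directly with the topic ID would shift every label by one. A minimal sketch of a mapping that avoids this, assuming pandas is installed; `topic_ids`, `custom_labels`, and the table values are made-up stand-ins for a fitted model's state, not real output.

```python
import pandas as pd

# Made-up stand-ins for the fitted model's state (assumptions, not real output):
topic_ids = [-1, 0, 1]  # sorted topic IDs; -1 is the outlier topic
custom_labels = ["Outliers", "Crime", "Politics"]  # assumed to follow the same order

# Build an explicit topic ID -> label mapping instead of indexing the list directly.
label_by_topic = dict(zip(topic_ids, custom_labels))

topics_per_class = pd.DataFrame(
    {
        "Topic": [-1, 0, 1, 0],
        "Frequency": [1421, 300, 120, 80],
        "Class": ["nl", "nl", "nl", "be"],
    }
)

# Work on a copy so the original table keeps its columns.
labelled = topics_per_class.copy()
labelled["Custom_Label"] = labelled["Topic"].map(label_by_topic)
```

With a fitted model, the IDs could presumably be taken from `sorted(topic_model.get_topics())` and the labels from `topic_model.custom_labels_`, again assuming the two follow the same order. The same mapping should apply to the table returned for dynamic topic modeling, since it also carries a Topic column.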

@sercankiyak
Author

Thanks!
