Loading topic model with multiple topic aspects changes their format #1487

Closed
zilch42 opened this issue Aug 25, 2023 · 3 comments

@zilch42
Contributor

zilch42 commented Aug 25, 2023

Hi Maarten,

Really enjoying the multi-aspect feature. Great work with that. I'm finding that when I load a topic model saved with safetensors, the display format of the extra aspects changes: each keyword now comes with a large float attached.

[Screenshot: multi aspect]

from bertopic.representation import KeyBERTInspired
from bertopic.representation import PartOfSpeech
from bertopic.representation import MaximalMarginalRelevance
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Documents to train on
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# The main representation of a topic
main_representation = KeyBERTInspired()

# Additional ways of representing a topic
aspect_model1 = PartOfSpeech("en_core_web_sm")
aspect_model2 = [KeyBERTInspired(top_n_words=30), MaximalMarginalRelevance(diversity=.5)]

# Add all models together to be run in a single `fit`
representation_model = {
    "Main": main_representation,
    "Aspect1": aspect_model1,
    "Aspect2": aspect_model2,
}
topic_model = BERTopic(representation_model=representation_model).fit(docs[0:200])

topic_model.save("example_model/", "safetensors")

topic_model.get_topic_info()

Restart Python and load the model:

from bertopic import BERTopic
topic_model = BERTopic.load("example_model/", embedding_model="sentence-transformers/all-MiniLM-L6-v2")
topic_model.get_topic_info()
@MaartenGr
Owner

Thanks! It seems that I check for a tuple (which it was initially), but after loading from JSON it gets converted to a list instead. That should be a one-liner to fix.
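
The tuple-versus-list behaviour can be reproduced with the standard `json` module alone. The snippet below is only a sketch, not BERTopic's actual code: the `aspect` data and the `keywords` helper are hypothetical names made up for illustration, and the real fix in the library may look different.

```python
import json

# Hypothetical aspect representation: a list of (keyword, score) tuples.
aspect = [("space", 0.42), ("nasa", 0.37)]

# JSON has no tuple type, so a save/load round trip returns lists instead.
restored = json.loads(json.dumps(aspect))


def keywords(representation):
    """Return only the keywords, accepting both tuples and lists (illustrative helper)."""
    return [
        item[0] if isinstance(item, (tuple, list)) else item
        for item in representation
    ]


print(type(aspect[0]).__name__, type(restored[0]).__name__)
print(keywords(aspect) == keywords(restored))
```

A check written as `isinstance(item, tuple)` alone would treat the restored items differently from the originals, which is consistent with the score showing up next to each keyword after loading.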

@MaartenGr
Owner

It should be fixed in the main branch 😄

@zilch42
Contributor Author

zilch42 commented Sep 6, 2023

Yep looks good! Thanks!
