Get approximate_distribution Error #871

Closed
alfandindarahmawan opened this issue Dec 9, 2022 · 2 comments

Comments

@alfandindarahmawan

Hello @MaartenGr, I ran into an issue while trying to get multiple topics for each document.
I ran this script:
topic_distr, topic_token_distr = topic_model.approximate_distribution(news_content[:3], calculate_tokens=True)
and got this error:

/usr/local/lib/python3.8/dist-packages/bertopic/_bertopic.py in approximate_distribution(self, documents, window, stride, min_similarity, batch_size, padding, use_embedding_model, calculate_tokens, separator)
   1096                 sentences = [separator.join(token) for token in token_sets]
   1097                 all_sentences.extend(sentences)
-> 1098                 all_token_sets_ids.extend(token_sets_ids)
   1099                 all_indices.append(all_indices[-1] + len(sentences))
   1100 

UnboundLocalError: local variable 'token_sets_ids' referenced before assignment

news_content is a list of long sentences.

@alfandindarahmawan
Author

After looking at your code, I think token_sets and token_sets_ids need to be assigned directly in the loop body rather than only inside the else branch. What do you think, @MaartenGr?
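To illustrate the suggestion, here is a minimal sketch of the failure pattern and one way to avoid it. It is not BERTopic's actual code: the function names `window_ids_buggy` and `window_ids_fixed`, the simplified windowing, and the `window=4` value are all assumptions made for the example. It only shows how a variable assigned in a single branch leads to `UnboundLocalError` when the other branch runs.

```python
def window_ids_buggy(tokens, window=4):
    """Hypothetical stand-in: token_sets_ids is assigned only in the else branch."""
    if len(tokens) < window:
        # Short document: token_sets is set, token_sets_ids is not.
        token_sets = [tokens]
    else:
        starts = range(len(tokens) - window + 1)
        token_sets = [tokens[i:i + window] for i in starts]
        token_sets_ids = [list(range(i, i + window)) for i in starts]
    # Raises UnboundLocalError when the if branch was taken.
    return token_sets, token_sets_ids


def window_ids_fixed(tokens, window=4):
    """Same sketch, with both variables assigned on every path."""
    if len(tokens) < window:
        token_sets = [tokens]
        token_sets_ids = [list(range(len(tokens)))]
    else:
        starts = range(len(tokens) - window + 1)
        token_sets = [tokens[i:i + window] for i in starts]
        token_sets_ids = [list(range(i, i + window)) for i in starts]
    return token_sets, token_sets_ids


if __name__ == "__main__":
    short_doc = ["only", "two"]
    try:
        window_ids_buggy(short_doc)
    except UnboundLocalError as err:
        print("buggy version failed:", type(err).__name__)
    print("fixed version:", window_ids_fixed(short_doc))
```

Assigning both variables on every path, or initializing them before the branch, keeps the later `extend` call safe regardless of document length.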

MaartenGr added a commit that referenced this issue Dec 9, 2022
@MaartenGr
Owner

@alfandindarahmawan Great, thanks for finding this bug! I believe it should be fixed now. The problem seems to have been in the handling of documents that have fewer tokens than the window size.
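For anyone on an affected version who wants to check whether their input contains such documents, a rough sketch follows. The helper `short_documents` is hypothetical, and it assumes simple whitespace tokenization and an illustrative `window=4`; the tokenizer BERTopic actually applies may count tokens differently, so treat the result as an approximation.

```python
def short_documents(documents, window=4):
    """Return indices of documents with fewer whitespace tokens than `window`.

    Whitespace splitting is an assumption made for this sketch; it only
    approximates how a real tokenizer would count tokens.
    """
    return [i for i, doc in enumerate(documents) if len(doc.split()) < window]


if __name__ == "__main__":
    docs = ["one two", "a document with enough tokens", ""]
    print(short_documents(docs))
```

Documents flagged this way are the ones most likely to have triggered the error described in this issue.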
