Get approximate_distribution Error #871

alfandindarahmawan · 2022-12-09T03:06:32Z

Hello @MaartenGr, I ran into an issue while trying to get multiple topics for each document.
I ran this script:
topic_distr, topic_token_distr = topic_model.approximate_distribution(news_content[:3], calculate_tokens=True)
and got this error:

[/usr/local/lib/python3.8/dist-packages/bertopic/_bertopic.py](https://localhost:8080/#) in approximate_distribution(self, documents, window, stride, min_similarity, batch_size, padding, use_embedding_model, calculate_tokens, separator)
   1096                 sentences = [separator.join(token) for token in token_sets]
   1097                 all_sentences.extend(sentences)
-> 1098                 all_token_sets_ids.extend(token_sets_ids)
   1099                 all_indices.append(all_indices[-1] + len(sentences))
   1100 

UnboundLocalError: local variable 'token_sets_ids' referenced before assignment

news_content is a list of long sentences.

alfandindarahmawan · 2022-12-09T03:51:10Z

After looking at your code, I think token_sets and token_sets_ids need to be assigned inside the loop but outside the else branch, rather than only inside the else. What do you think, @MaartenGr?

MaartenGr · 2022-12-09T10:54:32Z

@alfandindarahmawan Great, thanks for finding this bug! I believe it should be fixed as it seems that was an issue handling documents that have fewer tokens than the window size.
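
For readers following along, here is a minimal sketch of the kind of bug described above. It is not the actual BERTopic source: the function names buggy_token_sets and fixed_token_sets are hypothetical, and the windowing logic is simplified. It only illustrates how a variable assigned in the else branch alone can trigger an UnboundLocalError when a document has fewer tokens than the window size, and how assigning it in both branches avoids that.

```python
def buggy_token_sets(tokens, window=4, stride=1):
    """Hypothetical, simplified version of the failing pattern."""
    if len(tokens) < window:
        # Short document: only token_sets is assigned on this path.
        token_sets = [tokens]
    else:
        starts = range(0, len(tokens) - window + 1, stride)
        token_sets = [tokens[i:i + window] for i in starts]
        token_sets_ids = [list(range(i, i + window)) for i in starts]
    # For short documents token_sets_ids was never bound, so this line raises.
    return token_sets, token_sets_ids


def fixed_token_sets(tokens, window=4, stride=1):
    """Hypothetical fix: both variables are assigned on every path."""
    if len(tokens) < window:
        # Treat the whole short document as a single window.
        token_sets = [tokens]
        token_sets_ids = [list(range(len(tokens)))]
    else:
        starts = range(0, len(tokens) - window + 1, stride)
        token_sets = [tokens[i:i + window] for i in starts]
        token_sets_ids = [list(range(i, i + window)) for i in starts]
    return token_sets, token_sets_ids


if __name__ == "__main__":
    short_doc = ["too", "short"]  # fewer tokens than window=4
    try:
        buggy_token_sets(short_doc)
    except UnboundLocalError as err:
        print("buggy version failed:", err)
    print("fixed version:", fixed_token_sets(short_doc))
```

In this sketch a short document yields one window covering all of its tokens, which is one reasonable way to handle the edge case; the referenced commit may handle it differently.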

MaartenGr added a commit that referenced this issue Dec 9, 2022

Fix #871

3d02de3

MaartenGr closed this as completed Jan 9, 2023
