Fix bug in TextGeneration prompt creation when prompt template does not contain [DOCUMENTS] #1726

Merged


manveersadhal
Contributor

Prevent attempted iteration over NoneType when prompt template does not contain [DOCUMENTS] in TextGeneration class.

When DEFAULT_PROMPT is used or the user-provided prompt does not contain "[DOCUMENTS]", all of the values in repr_docs_mappings are None. When defining truncated_docs, iteration is attempted over NoneType. This change prevents the attempted iteration and instead passes the None value to _create_prompt.
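For readers following along, here is a minimal sketch of the failure mode and the guard described above. It is not the code from this PR: `truncate` and `truncate_all` are hypothetical stand-ins, and only the name `repr_docs_mappings` is taken from the description.

```python
from typing import Dict, List, Optional


def truncate(doc: str, max_words: int = 5) -> str:
    """Hypothetical stand-in for the library's document truncation helper."""
    return " ".join(doc.split()[:max_words])


def truncate_all(docs: Optional[List[str]]) -> Optional[List[str]]:
    """Truncate each document, passing None through untouched."""
    if docs is None:
        return None
    return [truncate(doc) for doc in docs]


# Assumed shape: when the prompt has no [DOCUMENTS] tag, every value is None.
repr_docs_mappings: Dict[int, Optional[List[str]]] = {0: None, 1: None}

# Unguarded version: iterating over None raises a TypeError.
try:
    [truncate(doc) for doc in repr_docs_mappings[0]]
    unguarded_failed = False
except TypeError:
    unguarded_failed = True

# Guarded version: None is forwarded as-is to whatever builds the prompt.
guarded = {topic: truncate_all(docs) for topic, docs in repr_docs_mappings.items()}
```

The guard keeps the behaviour for real document lists unchanged and only short-circuits the `None` case.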

@manveersadhal
Contributor Author

This is to address issue #1720

@MaartenGr
Owner

Thanks for sharing this fix! Do you by chance have a reproducible example that allows me to test whether this fix works for the issue that you are facing?

@manveersadhal
Contributor Author

manveersadhal commented Jan 7, 2024

Hi @MaartenGr. Sure, here is a minimal example I used for testing, based on the submitted issue (#1720). I tested with the default prompt, the prompt from the issue (although it seems that prompt was not actually passed to TextGeneration in the issue's example), and another variant that includes [DOCUMENTS].

from bertopic import BERTopic
from transformers import pipeline
from bertopic.representation import TextGeneration
from sklearn.datasets import fetch_20newsgroups

prompt = "I have a topic described by the following keywords: [KEYWORDS]. What is this topic about?"

# Create the representation model from a small text2text-generation pipeline
generator = pipeline('text2text-generation', model='google/flan-t5-small')
representation_model = TextGeneration(generator, prompt=prompt)

# Fit on a small sample of documents; the prompt above contains no [DOCUMENTS] tag
topic_model_t5 = BERTopic(representation_model=representation_model)
newsgroup_docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data'][:100]

topics, probs = topic_model_t5.fit_transform(newsgroup_docs)
print(topic_model_t5.get_topic_info())
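
To make the prompt variants easier to reason about without downloading a model, the placeholder handling can be sketched roughly as below. This is a simplified illustration, not BERTopic's actual `_create_prompt`: the function `create_prompt`, its signature, the bullet formatting, and the `with_docs` template are all assumptions made for the example.

```python
from typing import List, Optional


def create_prompt(template: str, keywords: List[str], docs: Optional[List[str]]) -> str:
    """Hypothetical, simplified prompt filler (not the library's implementation)."""
    prompt = template
    if "[KEYWORDS]" in prompt:
        prompt = prompt.replace("[KEYWORDS]", ", ".join(keywords))
    # Only touch [DOCUMENTS] when the tag is present and documents were supplied.
    if "[DOCUMENTS]" in prompt and docs is not None:
        prompt = prompt.replace("[DOCUMENTS]", "\n".join(f"- {doc}" for doc in docs))
    return prompt


# Variant without [DOCUMENTS]: docs arrive as None and must not be iterated.
keywords_only = "I have a topic described by the following keywords: [KEYWORDS]. What is this topic about?"
# Variant with [DOCUMENTS]: an illustrative template, not one taken from the issue.
with_docs = "Documents:\n[DOCUMENTS]\nKeywords: [KEYWORDS]. What is this topic about?"

p1 = create_prompt(keywords_only, ["space", "nasa"], None)
p2 = create_prompt(with_docs, ["space", "nasa"], ["The shuttle launched."])
```

In this sketch, a `None` value for the documents is harmless for a keywords-only template, which mirrors the case exercised by the example above.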

@MaartenGr
Owner

Thanks for the PR, works great!

@MaartenGr MaartenGr merged commit 6316c1e into MaartenGr:master Jan 17, 2024
2 checks passed
@leoschet

Any idea when this will be released?

@MaartenGr
Owner

@leoschet It is merged, so you can install BERTopic from the main branch. An official release might take a while since there are a couple of open PRs that still need to be checked and merged.

@leoschet

I've been doing that; it's just not always possible to pin main as the package version.
