Cannot use [DOCUMENTS] in prompt for representation_model #1004

ohmeow · 2023-02-15T20:28:14Z

Here is the code:

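# imports needed by the snippet below
from bertopic import BERTopic
from bertopic.representation import TextGeneration
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import CountVectorizer
from transformers import pipeline
from umap import UMAP

# embeddings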
sentence_model = SentenceTransformer("all-MiniLM-L6-v2")
# dimensionality reduction
umap_model = UMAP(n_neighbors=5, n_components=2, min_dist=0.0, metric="cosine")
# clustering
hdbscan_model = HDBSCAN(min_cluster_size=3, metric="euclidean", cluster_selection_method="eom", prediction_data=True)
# vectorizer
vectorizer_model = CountVectorizer()
# representation
prompt = """
I have a topic that contains the following documents: [DOCUMENTS].
The topic is described by the following keywords: [KEYWORDS].
Based on the above information, can you give a short label of the topic?
"""
generator = pipeline('text2text-generation', model='google/flan-t5-base')
representation_model = TextGeneration(generator, prompt=prompt)

# build topic model and get predictions
topic_model = BERTopic(
    embedding_model=sentence_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
    representation_model=representation_model,
    min_topic_size=3,
)

# df is assumed to be a pandas DataFrame with a text column named "_seq"
docs = df["_seq"].values.tolist()
topics, probs = topic_model.fit_transform(documents=docs)

The error:

File ~/mambaforge/envs/myenv/lib/python3.10/site-packages/bertopic/_bertopic.py:2950, in BERTopic._extract_topics(self, documents)
   2948 documents_per_topic = documents.groupby(['Topic'], as_index=False).agg({'Document': ' '.join})
   2949 self.c_tf_idf_, words = self._c_tf_idf(documents_per_topic)
-> 2950 self.topic_representations_ = self._extract_words_per_topic(words, documents)
   2951 self._create_topic_vectors()
   2952 self.topic_labels_ = {key: f"{key}_" + "_".join([word[0] for word in values[:4]])
...
--> 135     for doc in docs:
    136         to_replace += f"- {doc[:255]}\n"
    137     prompt += self.prompt.replace("[DOCUMENTS]", to_replace)

TypeError: 'NoneType' object is not iterable

Thanks much - wg

MaartenGr · 2023-02-16T05:23:32Z

Hmmm, strange. Do you perhaps have a reproducible example of this? To me, it is not immediately clear why it is doing this but I'll make sure to test some things out!

MaartenGr · 2023-02-16T06:31:56Z

There was a small typo in the code of TextGeneration that made it pass a None value instead of the documents. I believe this is fixed with the latest commit to the main branch, so installing BERTopic from there should resolve your issue. With these major releases, there are often bugs that were overlooked, so I typically wait a couple of weeks before releasing a quick fix in order to gather any issues that may come up.

For now, you can install BERTopic either from its latest commit:

pip install git+https://github.com/MaartenGr/BERTopic.git@1ee8141d65063a37f6ee3fd56b30e3f9e2f43d6e

or you can adjust the code yourself as was done here (#1002).
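
For readers following along, the failure mode can be reproduced without BERTopic. The sketch below is not the library's actual code. `fill_prompt` is a hypothetical helper modeled on the three lines visible in the traceback (iterate over the documents, truncate each to 255 characters, substitute them for `[DOCUMENTS]`). The explicit `None` check is an addition for illustration; it turns the opaque `TypeError` into a clearer message.

```python
from typing import Optional, Sequence


def fill_prompt(
    template: str,
    docs: Optional[Sequence[str]],
    keywords: Sequence[str],
) -> str:
    """Hypothetical helper (not BERTopic's implementation) that fills a prompt template."""
    if docs is None:
        # Without this guard, iterating over None raises
        # "TypeError: 'NoneType' object is not iterable", as in the report above.
        raise ValueError("docs is None; representative documents were not passed in")

    # Build a bulleted list, truncating each document to 255 characters
    # as the traceback shows.
    doc_list = "".join(f"- {doc[:255]}\n" for doc in docs)

    prompt = template.replace("[DOCUMENTS]", "\n" + doc_list)
    return prompt.replace("[KEYWORDS]", ", ".join(keywords))


template = "Documents: [DOCUMENTS]Keywords: [KEYWORDS]."
filled = fill_prompt(template, ["first doc", "second doc"], ["alpha", "beta"])
print(filled)
```

Calling `fill_prompt(template, None, ["alpha"])` raises the `ValueError` immediately, which is roughly what one would want to see instead of the error in the original report.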

ohmeow · 2023-02-16T16:42:18Z

I believe this is related to another issue where, instead of [DOCUMENTS], it was looking for [DOCUMENT]. After changing it to the latter, I was up and running again. Will look back at this later today to verify. Thanks - wg
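
A quick way to catch this kind of tag mismatch before fitting is to scan the prompt for bracketed tags and compare them with the ones the installed version is expected to replace. This is only a sketch: `unsupported_placeholders` is a hypothetical helper, and the set of supported tags passed to it is an assumption that should be checked against the installed BERTopic source.

```python
import re
from typing import Iterable, List


def unsupported_placeholders(prompt: str, supported: Iterable[str]) -> List[str]:
    """Return bracketed upper-case tags in the prompt that are not in `supported`."""
    found = set(re.findall(r"\[[A-Z_]+\]", prompt))
    return sorted(found - set(supported))


prompt = "Docs: [DOCUMENTS]. Keywords: [KEYWORDS]."

# Assumed tag set for a version that looks for the singular form.
singular = {"[DOCUMENT]", "[KEYWORDS]"}
# Assumed tag set for a version that looks for the plural form.
plural = {"[DOCUMENTS]", "[KEYWORDS]"}

mismatched = unsupported_placeholders(prompt, singular)
matched = unsupported_placeholders(prompt, plural)
print(mismatched, matched)
```

Note that `"[DOCUMENT]"` is not a substring of `"[DOCUMENTS]"` because of the closing bracket, so a plain `str.replace` with the wrong tag leaves the prompt unchanged rather than failing.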

MaartenGr · 2023-02-16T17:25:00Z

That is correct. In the link I posted above, you will find the relevant PR. You can change it yourself or simply install from the most recent commit.

ohmeow · 2023-02-16T18:12:08Z

Thanks, I'll check it out.

MaartenGr closed this as completed May 23, 2023