Fix Cohere client.embed TypeError #1904

Merged
merged 1 commit into from
Apr 10, 2024

MaartenGr
Owner

The following code:

import cohere
from bertopic import BERTopic
from bertopic.backend import CohereBackend

# Create the Cohere model with specific embedding settings
client = cohere.Client("MY_KEY")
cohere_model = CohereBackend(
    client,
    embedding_model="embed-english-v3.0",
    embed_kwargs={"input_type": "clustering"}
)

# Initialize BERTopic with the CohereBackend model
topic_model = BERTopic(embedding_model=cohere_model)

# Fit the BERTopic model 
topics, probabilities = topic_model.fit_transform(["test one", "test two"])

gives the following error with BERTopic v0.16 and cohere v5.1.7:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-be5151b26d2a> in <cell line: 17>()
     15 
     16 # Fit the BERTopic model using 'tweet_list'
---> 17 topics, probabilities = topic_model.fit_transform(["test one", "test two"])

/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in fit_transform(self, documents, embeddings, images, y)
    385             self.embedding_model = select_backend(self.embedding_model,
    386                                                   language=self.language)
--> 387             embeddings = self._extract_embeddings(documents.Document.values.tolist(),
    388                                                   images=images,
    389                                                   method="document",

/usr/local/lib/python3.10/dist-packages/bertopic/_bertopic.py in _extract_embeddings(self, documents, images, method, verbose)
   3408             embeddings = self.embedding_model.embed_words(words=documents, verbose=verbose)
   3409         elif method == "document":
-> 3410             embeddings = self.embedding_model.embed_documents(documents, verbose=verbose)
   3411         elif documents[0] is None and images is None:
   3412             raise ValueError("Make sure to use an embedding model that can either embed documents"

/usr/local/lib/python3.10/dist-packages/bertopic/backend/_base.py in embed_documents(self, document, verbose)
     67             that each have an embeddings size of `m`
     68         """
---> 69         return self.embed(document, verbose)

/usr/local/lib/python3.10/dist-packages/bertopic/backend/_cohere.py in embed(self, documents, verbose)
     86         # Extract embeddings all at once
     87         else:
---> 88             response = self.client.embed(documents, **self.embed_kwargs)
     89             embeddings = response.embeddings
     90         return np.array(embeddings)

TypeError: Client.embed() takes 1 positional argument but 2 positional arguments (and 2 keyword-only arguments) were given
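
The error can be reproduced without an API key. The sketch uses a hypothetical stand-in class, KeywordOnlyClient (not cohere's actual implementation), whose embed() declares its parameters after a bare `*`, so they can only be passed by keyword. In the message, the "1 positional argument" is self; the documents list is the second positional argument; and the two keyword-only arguments are presumably the model name and input_type coming from embed_kwargs (an assumption about what the backend passes).

```python
class KeywordOnlyClient:
    """Hypothetical stand-in for a v5-style client: every parameter after
    the bare `*` is keyword-only."""

    def embed(self, *, texts, model=None, input_type=None):
        # Return one dummy 2-dimensional vector per input text.
        return [[float(len(t)), 0.0] for t in texts]


client = KeywordOnlyClient()
embed_kwargs = {"model": "embed-english-v3.0", "input_type": "clustering"}
documents = ["test one", "test two"]

# Positional call, as in the traceback: Python rejects it with a TypeError.
try:
    client.embed(documents, **embed_kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Keyword call, as in the proposed fix: accepted.
embeddings = client.embed(texts=documents, **embed_kwargs)
```

Here error_message ends with the same "takes 1 positional argument but 2 positional arguments (and 2 keyword-only arguments) were given" text as the traceback above.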

I believe the fix is straightforward: in cohere v5, Client.embed() only accepts keyword arguments (the single positional argument mentioned in the error is self), so the documents should be passed explicitly as texts=documents instead of positionally.
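
A minimal sketch of that call pattern, modeled loosely on the embed() method in the traceback, whose "Extract embeddings all at once" else branch suggests a separate batched path. FakeV5Client, FakeV4Client, embed_documents and the batch_size argument are hypothetical names for illustration only, not BERTopic's or cohere's actual code; the real backend also wraps the result in np.array. Passing texts= by keyword works whether the client declares texts as keyword-only or as an ordinary positional-or-keyword parameter.

```python
from types import SimpleNamespace


class FakeV5Client:
    """Hypothetical v5-style client: embed() takes keyword-only arguments."""

    def embed(self, *, texts, model=None, input_type=None):
        return SimpleNamespace(embeddings=[[float(len(t))] for t in texts])


class FakeV4Client:
    """Hypothetical older-style client: texts may be positional or keyword."""

    def embed(self, texts, model=None, input_type=None):
        return SimpleNamespace(embeddings=[[float(len(t))] for t in texts])


def embed_documents(client, documents, embed_kwargs, batch_size=None):
    """Always pass the documents as texts=, in both code paths."""
    if batch_size is not None:
        # Embed in batches and concatenate the per-batch results.
        embeddings = []
        for start in range(0, len(documents), batch_size):
            batch = documents[start:start + batch_size]
            response = client.embed(texts=batch, **embed_kwargs)
            embeddings.extend(response.embeddings)
    else:
        # Extract embeddings all at once.
        response = client.embed(texts=documents, **embed_kwargs)
        embeddings = response.embeddings
    return embeddings


docs = ["a", "bb", "ccc"]
kwargs = {"model": "embed-english-v3.0", "input_type": "clustering"}
v5_result = embed_documents(FakeV5Client(), docs, kwargs, batch_size=2)
v4_result = embed_documents(FakeV4Client(), docs, kwargs)
# both: [[1.0], [2.0], [3.0]]
```

Because the keyword form is accepted by both client styles, no version check on the installed cohere package is needed for this particular call.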

@Jenni-Hawk

Hi @MaartenGr
I've tested the changes in this PR and here's my feedback:
Environment: Tested on Anaconda, BERTopic v0.16, Python 3.11
Functionality: Everything worked as expected
Performance: No performance or stability issues were noted. The changes integrate well with the existing codebase.
Thank you for addressing this issue! It looks ready for merging from my end, pending any further reviews.

@MaartenGr
Owner Author

@Jenni-Hawk Thanks for testing this!

MaartenGr merged commit de7376d into master on Apr 10, 2024 (2 checks passed).
MaartenGr deleted the fix_cohere_api branch on July 22, 2024 08:23.