[BUG] Watsonx as embedder is not working - script errors and stops #1790

Open
mtcolman opened this issue Dec 20, 2024 · 12 comments
Labels
bug Something isn't working

mtcolman commented Dec 20, 2024

Description

I'm following https://docs.crewai.com/concepts/knowledge#embedder-configuration and it states:

Embedder Configuration
You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.

...
string_source = StringKnowledgeSource(
   content="Users name is John. He is 30 years old and lives in San Francisco.",
)
crew = Crew(
   ...
   knowledge_sources=[string_source],
   embedder={
       "provider": "openai",
       "config": {"model": "text-embedding-3-small"},
   },
)

I tried running my crew with this configuration (because I want to use watsonx as the embedder):

	@crew
	def crew(self) -> Crew:
		"""Creates the ResearchReport crew"""
		return Crew(
			agents=self.agents,
			tasks=[self.determine_requirement_set()],
			knowledge_sources=[
				StringKnowledgeSource(
					content="User's name is John. He is 30 years old and lives in San Francisco."
				)
			],
			embedder={
				"provider": "watson",
				"config": {
					"model": "ibm/slate-125m-english-rtrvr",
					"api_url": WATSONX_URL,
					"api_key": WATSONX_APIKEY,
					"project_id": WATSONX_PROJECT_ID,
				}
			},
			process=Process.sequential,
			verbose=True,
		)

This is in line with the guidance given here: https://docs.crewai.com/concepts/memory#using-watson-embeddings.
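The crew above reads WATSONX_URL, WATSONX_APIKEY and WATSONX_PROJECT_ID from names defined elsewhere in the project. Below is a minimal sketch, assuming those values come from environment variables with the same names (an assumption; the report does not say where they are defined). It builds the same embedder dict and fails fast with a clear message when a value is missing. The helper name watson_embedder_config is hypothetical, and the helper does not by itself avoid the OpenAI fallback reported in this issue.

```python
import os
from typing import Any, Dict, Mapping

# Assumed variable names, matching the constants used in the crew above.
REQUIRED_VARS = ("WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")


def watson_embedder_config(
    env: Mapping[str, str] = os.environ,
    model: str = "ibm/slate-125m-english-rtrvr",
) -> Dict[str, Any]:
    """Hypothetical helper: build the "watson" embedder dict from environment values."""
    # Collect every missing setting so the error names all of them at once.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing watsonx settings: {', '.join(missing)}")
    return {
        "provider": "watson",
        "config": {
            "model": model,
            "api_url": env["WATSONX_URL"],
            "api_key": env["WATSONX_APIKEY"],
            "project_id": env["WATSONX_PROJECT_ID"],
        },
    }
```

In the crew, this would replace the inline dict: embedder=watson_embedder_config().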

However, it always errors and asks for an OpenAI API key:

  File "/crewai/.venv/lib/python3.10/site-packages/crewai/project/annotations.py", line 112, in wrapper
    crew = func(self, *args, **kwargs)
  File "/crewai/src/research_report/crew.py", line 246, in crew
    StringKnowledgeSource(
  File "/crewai/.venv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 40, in __init__
    self._initialize_app(embedder_config or {})
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 74, in _initialize_app
    self._set_embedder_config(embedder_config)
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 131, in _set_embedder_config
    else self._create_default_embedding_function()
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 115, in _create_default_embedding_function
    return OpenAIEmbeddingFunction(
  File "/crewai/.venv/lib/python3.10/site-packages/chromadb/utils/embedding_functions/openai_embedding_function.py", line 56, in __init__
    raise ValueError(
ValueError: Please provide an OpenAI API key. You can get one at https://platform.openai.com/account/api-keys

Steps to Reproduce

See the details above.

Expected behavior

I expect the watsonx embedder to be used, without being asked for an OpenAI API key.

Screenshots/Code snippets

Given in the description.

Operating System

Ubuntu 22.04

Python Version

3.10

crewAI Version

0.83.0

crewAI Tools Version

0.14.0

Virtual Environment

Venv

Evidence

Given in the description.

Possible Solution

Use the configured watsonx embedder instead of falling back to the default OpenAI embedding function.

https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/knowledge/storage/knowledge_storage.py#L131
https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/utilities/embedding_configurator.py#L21

Additional context

This might be related to #1770.

It looks like the code tagged v0.83.0 (https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/crew.py#L283) only handles a knowledge parameter on the crew, not a knowledge_sources parameter:

    @model_validator(mode="after")
    def create_crew_knowledge(self) -> "Crew":
        if self.knowledge:
            try:
                self.knowledge = Knowledge(**self.knowledge) if isinstance(self.knowledge, dict) else self.knowledge
            except (TypeError, ValueError) as e:
                raise ValueError(f"Invalid knowledge configuration: {str(e)}")
        return self

However, on the main branch (https://github.com/crewAIInc/crewAI/blob/main/src/crewai/crew.py#L201 and https://github.com/crewAIInc/crewAI/blob/main/src/crewai/crew.py#L282), I can see:

...
    knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
        default=None,
        description="Knowledge sources for the crew. Add knowledge sources to the knowledge object.",
    )
...
...
    @model_validator(mode="after")
    def create_crew_knowledge(self) -> "Crew":
        """Create the knowledge for the crew."""
        if self.knowledge_sources:
            try:
                if isinstance(self.knowledge_sources, list) and all(
                    isinstance(k, BaseKnowledgeSource) for k in self.knowledge_sources
                ):
                    self._knowledge = Knowledge(
                        sources=self.knowledge_sources,
                        embedder_config=self.embedder,
                        collection_name="crew",
                    )

            except Exception as e:
                self._logger.log(
                    "warning", f"Failed to init knowledge: {e}", color="yellow"
                )
        return self

mtcolman added the "bug" label on Dec 20, 2024.
mtcolman changed the title from "[BUG] Watsonx as embedder is not working" to "[BUG] Watsonx as embedder is not working - errors on loading" on Dec 20, 2024.
mtcolman changed the title from "[BUG] Watsonx as embedder is not working - errors on loading" to "[BUG] Watsonx as embedder is not working - script errors and stops" on Dec 20, 2024.
@imrohankataria

Any solutions?

@VictorCostaOliveira

I'm having the exact same issue using Ollama, with crewAI version 0.86.0.
Is there a solution yet?

mtcolman (author) commented Jan 6, 2025

Sadly, this was not fixed by #1804.

mtcolman (author) commented Jan 6, 2025

I've applied a fix similar to #1804 to the site-packages/crewai/knowledge/source/string_knowledge_source.py file:

from typing import List, Optional

from pydantic import Field

from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

class StringKnowledgeSource(BaseKnowledgeSource):
    """A knowledge source that stores and queries plain text content using embeddings."""

    content: str = Field(...)
    storage: Optional[KnowledgeStorage] = Field(default=None)

(The change is the addition of storage: Optional[KnowledgeStorage] = Field(default=None), which now allows the crew to start.)

I haven't yet been able to verify whether it works beyond that; I will test and update this thread.
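The traceback in the original report shows why the crew-level embedder is ignored on 0.83.0. Constructing StringKnowledgeSource immediately builds a KnowledgeStorage with no embedder config, so _set_embedder_config falls back to _create_default_embedding_function, which returns an OpenAIEmbeddingFunction that needs an OpenAI API key. Making storage optional with a None default, as in the workaround above, defers that step. The toy model below illustrates eager versus deferred storage creation. Every name in it (EagerSource, LazySource, attach, resolve_embedder, MissingAPIKeyError) is hypothetical, and it is not crewAI's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


class MissingAPIKeyError(ValueError):
    """Stand-in for the ValueError raised by the default OpenAI embedding function."""


def default_embedder(env: Mapping[str, str]) -> str:
    # Hypothetical default: like the OpenAI fallback, it cannot work without a key.
    if not env.get("OPENAI_API_KEY"):
        raise MissingAPIKeyError("Please provide an OpenAI API key.")
    return "openai:default"


def resolve_embedder(embedder_config: Dict[str, Any], env: Mapping[str, str]) -> str:
    # Use the configured provider when one is given; otherwise fall back to the default.
    if embedder_config:
        return f"{embedder_config['provider']}:{embedder_config['config']['model']}"
    return default_embedder(env)


@dataclass
class EagerSource:
    """Creates its storage during construction, before any crew embedder exists."""

    content: str
    env: Mapping[str, str] = field(default_factory=dict)
    embedder: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # No embedder config is available yet, so this always takes the fallback path.
        self.embedder = resolve_embedder({}, self.env)


@dataclass
class LazySource:
    """Leaves storage unset (None) until the crew supplies its embedder config."""

    content: str
    embedder: Optional[str] = None

    def attach(self, embedder_config: Dict[str, Any], env: Mapping[str, str]) -> None:
        self.embedder = resolve_embedder(embedder_config, env)


if __name__ == "__main__":
    watson = {"provider": "watson", "config": {"model": "ibm/slate-125m-english-rtrvr"}}
    source = LazySource(content="User's name is John.")
    source.attach(watson, env={})  # works without an OpenAI key
    print(source.embedder)  # watson:ibm/slate-125m-english-rtrvr
    try:
        EagerSource(content="User's name is John.")
    except MissingAPIKeyError as exc:
        print(f"eager source failed: {exc}")
```

The design point is ordering: a source that resolves its embedder before the crew has passed one in can only take the default path.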


github-actions bot commented Feb 6, 2025

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@VictorCostaOliveira

Is there a solution for this?

@mtcolman (author)

This now appears to be fixed for crew-scoped knowledge_sources. However, I get this error when I set them at the agent level:

Exception in thread Thread-1 (thread_target):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/main.py", line 69, in thread_target
    asyncio.run(run_async(inputs))
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/main.py", line 75, in run_async
    CrewaiBase().crew().kickoff(inputs=inputs)
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 36, in __init__
    self.map_all_task_variables()
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 203, in map_all_task_variables
    self._map_task_variables(
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 236, in _map_task_variables
    self.tasks_config[task_name]["agent"] = agents[agent_name]()
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/utils.py", line 11, in memoized_func
    cache[key] = func(*args, **kwargs)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/crew.py", line 106, in rag_reader
    return Agent(
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Agent
  Value error, Invalid Knowledge Configuration: Please provide an OpenAI API key. You can get one at https://platform.openai.com/account/api-keys [type=value_error, input_value={'verbose': True, 'llm': ...a Senior Consultant.\n'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

@VictorCostaOliveira

Is there a solution for this?

@mtcolman (author)

@VictorCostaOliveira, this works for me since upgrading to 0.105.0.
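Because the fix is reported as working from 0.105.0, it can help to check an environment against that threshold before debugging further. Below is a minimal sketch using only the standard library (importlib.metadata.version, available since Python 3.8). The numeric parse matters because plain string comparison would rank "0.83.0" above "0.105.0". The helpers parse_release and crewai_at_least are hypothetical and only look at the first three numeric components; packaging.version.Version handles full PEP 440 versions if that package is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(text: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' numbers, e.g. '0.108.0rc1' -> (0, 108, 0).

    Pre-release suffixes are ignored, so '0.108.0rc1' compares equal to '0.108.0'.
    """
    numbers = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at a suffix such as 'rc1'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)  # treat '1.0' as '1.0.0'
    return (numbers[0], numbers[1], numbers[2])


def crewai_at_least(minimum: str = "0.105.0") -> bool:
    """True if the installed crewai distribution is at or above `minimum`."""
    try:
        installed = version("crewai")
    except PackageNotFoundError:
        return False  # crewai is not installed in this environment
    return parse_release(installed) >= parse_release(minimum)
```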

@VictorCostaOliveira

> @VictorCostaOliveira - this works for me since moving to 0.105.0

Thanks :), I will test it.

@amdjedbens

@VictorCostaOliveira @mtcolman, did you find a fix? Or could you help out with this similar issue?

When using CrewAI with knowledge sources, I get an embedding dimension mismatch error if I previously used a different embedding model in the same project. This appears to happen because CrewAI uses ChromaDB as its default vector database, and a ChromaDB collection requires all of its embeddings to have the same dimension.

[ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models.
Try resetting the collection using `crewai reset-memories -a`

ValueError: Invalid Knowledge Configuration: Embedding dimension mismatch. Make sure you're using the same embedding model across all operations with this collection.
Try resetting the collection using `crewai reset-memories -a`

The error reports a dimension mismatch (e.g., 768 vs. 1536) between the current embeddings and the previously stored ones.

Steps to Reproduce

  1. Create a CrewAI project with agents that use knowledge sources
  2. Run the project with one embedding model (e.g., OpenAI's model with 1536 dimensions)
  3. Change the embedding model to a different one (e.g., Ollama's nomic-embed-text with 768 dimensions)
  4. Run the project again without clearing previous embeddings

Expected Behavior

The project should either:

  • Detect the embedding model change and automatically reset collections
  • Convert embeddings to be compatible
  • Provide a clearer error message with automated recovery
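The first expected behavior above (detecting an embedding model change) can be approximated on the user side. Below is a minimal sketch, assuming the project can keep a small marker file recording which provider and model produced the stored knowledge. The marker file and the helpers embedder_fingerprint, embedder_changed and record_embedder are all hypothetical and not part of crewAI. Comparing provider and model is only a proxy for comparing dimensions: it would miss a change to an explicit dimension setting on the same model.

```python
import json
from pathlib import Path
from typing import Any, Dict


def embedder_fingerprint(embedder: Dict[str, Any]) -> str:
    """Identify an embedder by provider and model only (API keys are deliberately left out)."""
    model = embedder.get("config", {}).get("model")
    return json.dumps({"provider": embedder.get("provider"), "model": model}, sort_keys=True)


def embedder_changed(marker: Path, embedder: Dict[str, Any]) -> bool:
    """True if a previous run recorded a different provider/model in `marker`."""
    if not marker.exists():
        return False  # no record yet, so there is nothing to compare against
    return marker.read_text(encoding="utf-8") != embedder_fingerprint(embedder)


def record_embedder(marker: Path, embedder: Dict[str, Any]) -> None:
    """Remember which embedder produced the data that is now stored."""
    marker.write_text(embedder_fingerprint(embedder), encoding="utf-8")
```

A caller would check embedder_changed before kickoff, clear the stored knowledge if it returns True, and call record_embedder once the knowledge has been re-embedded. The clearing step itself is left open here, since this thread reports that crewai reset-memories -a did not help in at least one case.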

Current Behavior

The project fails with a cryptic ChromaDB dimension-mismatch error, which is confusing because nothing clearly indicates that CrewAI uses ChromaDB under the hood.

I've also tried running the suggested command crewai reset-memories -a, but it didn't work either.

Help Needed

Has anyone encountered this issue and found a reliable solution? I need a way to either:

  1. Properly reset the ChromaDB collections
  2. Configure CrewAI to use a different vector database
  3. Ensure consistent embedding dimensions across runs

Environment

  • CrewAI version: 0.108.0 (latest)
  • Python version: 3.12
  • OS: macOS

Additional Context

This issue typically happens when:

  1. Switching between embedding providers (OpenAI to local models or vice versa)
  2. Changing embedding models within the same provider
  3. Testing different configurations with the same codebase

Any help would be greatly appreciated as this is blocking my development workflow.
