[BUG] Watsonx as embedder is not working - script errors and stops #1790

mtcolman · 2024-12-20T15:57:17Z

Description

I'm following https://docs.crewai.com/concepts/knowledge#embedder-configuration and it states:

Embedder Configuration
You can also configure the embedder for the knowledge store. This is useful if you want to use a different embedder for the knowledge store than the one used for the agents.
...
string_source = StringKnowledgeSource(
   content="Users name is John. He is 30 years old and lives in San Francisco.",
)
crew = Crew(
   ...
   knowledge_sources=[string_source],
   embedder={
       "provider": "openai",
       "config": {"model": "text-embedding-3-small"},
   },
)

I tried running my crew with this configuration, as I want to use Watsonx as the embedder:

	@crew
	def crew(self) -> Crew:
		"""Creates the ResearchReport crew"""
		return Crew(
			agents=self.agents,
			tasks=[self.determine_requirement_set()],
			knowledge_sources=[
				StringKnowledgeSource(
					content="User's name is John. He is 30 years old and lives in San Francisco."
				)
			],
			embedder={
				"provider": "watson",
				"config": {
					"model": "ibm/slate-125m-english-rtrvr",
					"api_url": WATSONX_URL,
					"api_key": WATSONX_APIKEY,
					"project_id": WATSONX_PROJECT_ID,
				}
			},
			process=Process.sequential,
			verbose=True,
		)

This is in line with the guidance given here: https://docs.crewai.com/concepts/memory#using-watson-embeddings.
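For anyone reproducing this, here is a minimal sketch of building the embedder dictionary so that missing credentials fail fast. It assumes WATSONX_URL, WATSONX_APIKEY and WATSONX_PROJECT_ID come from environment variables of the same names (the snippet above does not show where they are defined), and `build_watson_embedder` is a hypothetical helper, not part of crewAI.

```python
import os


def build_watson_embedder(env=os.environ):
    """Return an embedder dict shaped like the one passed to Crew above.

    Hypothetical helper: raises early if a credential is missing, so a
    later OpenAI-key error cannot be confused with a missing Watsonx value.
    """
    required = ("WATSONX_URL", "WATSONX_APIKEY", "WATSONX_PROJECT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "provider": "watson",
        "config": {
            "model": "ibm/slate-125m-english-rtrvr",
            "api_url": env["WATSONX_URL"],
            "api_key": env["WATSONX_APIKEY"],
            "project_id": env["WATSONX_PROJECT_ID"],
        },
    }
```

The returned dictionary can be passed as `embedder=build_watson_embedder()`. This only rules out empty credentials as a cause; it does not change how crewAI selects the embedder.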

However, it always errors and asks me for an OpenAI API key:

  File "/crewai/.venv/lib/python3.10/site-packages/crewai/project/annotations.py", line 112, in wrapper
    crew = func(self, *args, **kwargs)
  File "/crewai/src/research_report/crew.py", line 246, in crew
    StringKnowledgeSource(
  File "/crewai/.venv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 40, in __init__
    self._initialize_app(embedder_config or {})
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 74, in _initialize_app
    self._set_embedder_config(embedder_config)
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 131, in _set_embedder_config
    else self._create_default_embedding_function()
  File "/crewai/.venv/lib/python3.10/site-packages/crewai/knowledge/storage/knowledge_storage.py", line 115, in _create_default_embedding_function
    return OpenAIEmbeddingFunction(
  File "/crewai/.venv/lib/python3.10/site-packages/chromadb/utils/embedding_functions/openai_embedding_function.py", line 56, in __init__
    raise ValueError(
ValueError: Please provide an OpenAI API key. You can get one at https://platform.openai.com/account/api-keys

Steps to Reproduce

See previous detail

Expected behavior

I expect the Watsonx embedder to be used, and not to be asked for an OpenAI API key.

Screenshots/Code snippets

Given in description

Operating System

Ubuntu 22.04

Python Version

3.10

crewAI Version

0.83.0

crewAI Tools Version

0.14.0

Virtual Environment

Venv

Evidence

Given in description

Possible Solution

Use the configured Watsonx embedder instead of falling back to the default OpenAI one.

https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/knowledge/storage/knowledge_storage.py#L131
https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/utilities/embedding_configurator.py#L21

Additional context

Might be linked to #1770

It looks like the code tagged as 0.83.0 (https://github.com/crewAIInc/crewAI/blob/v0.83.0/src/crewai/crew.py#L283) is configured for Crew to have the knowledge parameter, but not the knowledge_sources parameter:

    @model_validator(mode="after")
    def create_crew_knowledge(self) -> "Crew":
        if self.knowledge:
            try:
                self.knowledge = Knowledge(**self.knowledge) if isinstance(self.knowledge, dict) else self.knowledge
            except (TypeError, ValueError) as e:
                raise ValueError(f"Invalid knowledge configuration: {str(e)}")
        return self

However, on the main branch (https://github.com/crewAIInc/crewAI/blob/main/src/crewai/crew.py#L201 and https://github.com/crewAIInc/crewAI/blob/main/src/crewai/crew.py#L282) I can see:

...
    knowledge_sources: Optional[List[BaseKnowledgeSource]] = Field(
        default=None,
        description="Knowledge sources for the crew. Add knowledge sources to the knowledge object.",
    )
...
...
    @model_validator(mode="after")
    def create_crew_knowledge(self) -> "Crew":
        """Create the knowledge for the crew."""
        if self.knowledge_sources:
            try:
                if isinstance(self.knowledge_sources, list) and all(
                    isinstance(k, BaseKnowledgeSource) for k in self.knowledge_sources
                ):
                    self._knowledge = Knowledge(
                        sources=self.knowledge_sources,
                        embedder_config=self.embedder,
                        collection_name="crew",
                    )

            except Exception as e:
                self._logger.log(
                    "warning", f"Failed to init knowledge: {e}", color="yellow"
                )
        return self


imrohankataria · 2024-12-24T22:54:13Z

any solutions?

VictorCostaOliveira · 2025-01-04T06:32:11Z

I'm having this exact same issue using Ollama, and I'm using crewai version: 0.86.0.
Do we have any solution yet?

mtcolman · 2025-01-06T09:24:06Z

Sadly this was not fixed by #1804

mtcolman · 2025-01-06T09:46:19Z

I've applied a fix similar to #1804 to the site-packages/crewai/knowledge/source/string_knowledge_source.py file:

from typing import List, Optional

from pydantic import Field

from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage

class StringKnowledgeSource(BaseKnowledgeSource):
    """A knowledge source that stores and queries plain text content using embeddings."""

    content: str = Field(...)
    storage: Optional[KnowledgeStorage] = Field(default=None)

(The change is the addition of storage: Optional[KnowledgeStorage] = Field(default=None), which allows the crew to start.)

I haven't been able to validate whether it works beyond that yet; I will test and update the comments.
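To illustrate why that one-line change lets the crew start, here is a minimal, self-contained sketch. All class names are hypothetical stand-ins written with dataclasses, not crewAI's actual code. The assumption, based on the traceback above, is that the knowledge source used to build its storage at construction time, before the crew's embedder config was available.

```python
from dataclasses import dataclass, field
from typing import Optional


class FakeStorage:
    """Stand-in for a storage class that needs an embedder at creation."""

    def __init__(self, embedder_config: Optional[dict] = None):
        if not embedder_config:
            # Mirrors the fallback seen in the traceback: no config means
            # the default (OpenAI) embedder, which demands an API key.
            raise ValueError("Please provide an OpenAI API key.")
        self.embedder_config = embedder_config


@dataclass
class EagerSource:
    """Builds storage immediately, before any embedder config exists."""

    content: str
    storage: FakeStorage = field(default_factory=FakeStorage)


@dataclass
class LazySource:
    """Defaults storage to None so the owner can attach it later."""

    content: str
    storage: Optional[FakeStorage] = None


def attach_storage(source: LazySource, embedder_config: dict) -> LazySource:
    """Hypothetical step a crew could perform once its embedder is known."""
    source.storage = FakeStorage(embedder_config)
    return source
```

Constructing `EagerSource(content="...")` raises the API key error straight away, while `LazySource(content="...")` succeeds and only gets its storage once `attach_storage` is called with a real embedder config.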

github-actions · 2025-02-06T12:17:08Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

VictorCostaOliveira · 2025-02-06T13:59:59Z

Some solution here?

mtcolman · 2025-02-19T14:55:50Z

This appears to have been fixed now for crew-scoped knowledge_sources. However, I get this error when I try at the agent level:

Exception in thread Thread-1 (thread_target):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/main.py", line 69, in thread_target
    asyncio.run(run_async(inputs))
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/main.py", line 75, in run_async
    CrewaiBase().crew().kickoff(inputs=inputs)
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 36, in __init__
    self.map_all_task_variables()
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 203, in map_all_task_variables
    self._map_task_variables(
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/crew_base.py", line 236, in _map_task_variables
    self.tasks_config[task_name]["agent"] = agents[agent_name]()
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/crewai/project/utils.py", line 11, in memoized_func
    cache[key] = func(*args, **kwargs)
  File "/GitHub/crewai_base/crewai_base/src/crewai_base/crew.py", line 106, in rag_reader
    return Agent(
  File "/GitHub/crewai_base/crewai_base/.venv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Agent
  Value error, Invalid Knowledge Configuration: Please provide an OpenAI API key. You can get one at https://platform.openai.com/account/api-keys [type=value_error, input_value={'verbose': True, 'llm': ...a Senior Consultant.\n'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

github-actions · 2025-03-22T12:16:51Z

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

VictorCostaOliveira · 2025-03-24T13:29:23Z

Some solution here?

mtcolman · 2025-03-24T13:30:57Z

@VictorCostaOliveira - this works for me since moving to 0.105.0
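A quick way to check whether an environment is at or above that version, as a hedged sketch: it uses only the standard library, `parse_version` and `crewai_at_least` are hypothetical helper names, and it assumes the installed version string starts with plain dotted integers (e.g. `0.105.0`).

```python
from importlib.metadata import PackageNotFoundError, version
import re


def parse_version(text: str) -> tuple:
    """Turn '0.105.0' into (0, 105, 0); ignores any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if not match:
        raise ValueError(f"Unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def crewai_at_least(minimum: str = "0.105.0") -> bool:
    """Return True if the installed crewai is >= minimum, False otherwise."""
    try:
        installed = version("crewai")
    except PackageNotFoundError:
        return False  # crewai is not installed in this environment
    return parse_version(installed) >= parse_version(minimum)
```

Comparing integer tuples avoids the string-comparison trap where "0.83.0" sorts after "0.105.0".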

VictorCostaOliveira · 2025-03-24T13:55:00Z

> @VictorCostaOliveira - this works for me since moving to 0.105.0

Thanks :), I will test.

amdjedbens · 2025-03-25T13:43:27Z

@VictorCostaOliveira @mtcolman did you find a fix? Or could you help out with this similar issue?

When using CrewAI with knowledge sources, I'm encountering an embedding dimension mismatch error if I've previously used a different embedding model in the same project. This appears to happen because CrewAI uses ChromaDB as its default vector database, and ChromaDB enforces consistent embedding dimensions across operations.

[ERROR]: Embedding dimension mismatch. This usually happens when mixing different embedding models.
Try resetting the collection using `crewai reset-memories -a`

ValueError: Invalid Knowledge Configuration: Embedding dimension mismatch. Make sure you're using the same embedding model across all operations with this collection.
Try resetting the collection using `crewai reset-memories -a`

The issue shows up as a dimension mismatch error (e.g., 768 vs 1536) between current embeddings and previously stored embeddings.

Steps to Reproduce

Create a CrewAI project with agents that use knowledge sources
Run the project with one embedding model (e.g., OpenAI's model with 1536 dimensions)
Change the embedding model to a different one (e.g., Ollama's nomic-embed-text with 768 dimensions)
Run the project again without clearing previous embeddings
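The steps above can be reproduced in miniature without ChromaDB. The sketch below is a toy in-memory collection (the `ToyCollection` class is hypothetical, not ChromaDB's API) that fixes its dimension on the first insert and rejects later vectors of a different length, which is the behaviour the error message describes.

```python
class ToyCollection:
    """Toy vector collection that enforces one embedding dimension."""

    def __init__(self):
        self.dimension = None  # set by the first vector that is added
        self.vectors = []

    def add(self, vector):
        if self.dimension is None:
            self.dimension = len(vector)
        elif len(vector) != self.dimension:
            raise ValueError(
                f"Embedding dimension mismatch: collection expects "
                f"{self.dimension}, got {len(vector)}"
            )
        self.vectors.append(list(vector))


collection = ToyCollection()
collection.add([0.0] * 1536)      # first run: 1536-dimensional embeddings
try:
    collection.add([0.0] * 768)   # second run: 768-dimensional embeddings
except ValueError as err:
    message = str(err)
```

The second `add` fails because the collection already holds 1536-dimensional vectors; the persisted data from the first run has to be cleared or kept separate before a 768-dimensional model can be used.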

Expected Behavior

The project should either:

Detect the embedding model change and automatically reset collections
Convert embeddings to be compatible
Provide a clearer error message with automated recovery
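The first option could look roughly like the following sketch. Nothing in this thread suggests CrewAI does this today; `embedder_fingerprint`, `needs_reset`, and the marker file name are all hypothetical. The idea is to store a hash of the embedder's provider and model next to the collection and compare it on startup.

```python
import hashlib
import json
from pathlib import Path


def embedder_fingerprint(embedder_config: dict) -> str:
    """Stable hash of the provider and model; credentials are ignored."""
    identity = {
        "provider": embedder_config.get("provider"),
        "model": embedder_config.get("config", {}).get("model"),
    }
    encoded = json.dumps(identity, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def needs_reset(storage_dir: Path, embedder_config: dict) -> bool:
    """Return True if the stored fingerprint differs from the current one.

    Always records the current fingerprint so the next run compares
    against it. A missing marker file counts as "no reset needed".
    """
    marker = storage_dir / "embedder_fingerprint.txt"  # hypothetical file name
    current = embedder_fingerprint(embedder_config)
    previous = marker.read_text().strip() if marker.exists() else None
    storage_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text(current)
    return previous is not None and previous != current
```

A caller would check `needs_reset(...)` before opening the collection and clear or recreate it when the result is True. Hashing only the provider and model keeps API keys out of the marker file.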

Current Behavior

The project fails with a cryptic ChromaDB error about dimension mismatch that is confusing since there's no clear indication that CrewAI is using ChromaDB under the hood.

I've tried running the suggested command `crewai reset-memories -a`, but that didn't work either.

Help Needed

Has anyone encountered this issue and found a reliable solution? I need a way to either:

Properly reset the ChromaDB collections
Configure CrewAI to use a different vector database
Ensure consistent embedding dimensions across runs

Environment

CrewAI version: 0.108.0 (latest)
Python version: 3.12
OS: macOS

Additional Context

This issue typically happens when:

Switching between embedding providers (OpenAI to local models or vice versa)
Changing embedding models within the same provider
Testing different configurations with the same codebase

Any help would be greatly appreciated as this is blocking my development workflow.

mtcolman added the bug Something isn't working label Dec 20, 2024

mtcolman changed the title ~~[BUG] Watsonx as embedder is not working~~ [BUG] Watsonx as embedder is not working - errors on loading Dec 20, 2024

mtcolman changed the title ~~[BUG] Watsonx as embedder is not working - errors on loading~~ [BUG] Watsonx as embedder is not working - script errors and stops Dec 20, 2024

mtcolman mentioned this issue Dec 24, 2024

[BUG] Default embedder initialization attempts OpenAI API key even with Azure configuration #1797

Closed

ericklima-ca mentioned this issue Dec 27, 2024

fix: Change storage initialization to None for KnowledgeStorage #1804

Merged

github-actions bot added the no-issue-activity label Feb 6, 2025

github-actions bot removed the no-issue-activity label Feb 7, 2025

github-actions bot added the no-issue-activity label Mar 22, 2025

github-actions bot removed the no-issue-activity label Mar 25, 2025
