
Langchain Chroma bug when add documents #2840

Open
Spycsh opened this issue Feb 7, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@Spycsh

Spycsh commented Feb 7, 2025

Describe the bug

Hi, I'm using the latest huggingface_hub (0.28.1), and vectordb.add_documents crashes while printing a messy embedding dump. Falling back to 0.27.1 fixes the issue.

Could you have a look at this issue? Thanks!

Reproduction

  1. Start a TEI server:
model=BAAI/bge-base-en-v1.5
volume=$PWD/data
docker run -d -p 6060:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $model --auto-truncate
  2. Run the code below with python test_chroma_add_documents.py > err.log 2>&1 (the code uses that TEI server as the embedding function of Chroma and performs a simple add_documents):
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEndpointEmbeddings
from langchain.schema import Document
import random

# Function to generate random documents
def generate_random_documents(num_documents=10, doc_length=50):
    documents = []
    for _ in range(num_documents):
        text = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz ', k=doc_length))
        documents.append(Document(page_content=text))
    return documents

# Generate random documents
random_documents = generate_random_documents()

tei_embedding_endpoint = "http://localhost:6060"

vector_db = Chroma(
    embedding_function=HuggingFaceEndpointEmbeddings(model=tei_embedding_endpoint),
)

vector_db.add_documents(random_documents)
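To check whether the extra nesting comes from the response itself, a quick diagnostic (purely illustrative, not part of the repro) is to measure how deeply a decoded embedding batch is nested. A well-formed batch should have depth 2 (a list of float vectors); the ValueError later in this report suggests depth 4:

```python
# Illustrative diagnostic helper (not from langchain or huggingface_hub):
# walk into the first element at each level and count list layers.
def nesting_depth(obj) -> int:
    """Return how many list levels wrap the first scalar, e.g. [[1.0]] -> 2."""
    depth = 0
    while isinstance(obj, list) and obj:
        depth += 1
        obj = obj[0]
    return depth
```

Applying this to the decoded JSON returned by the embedding call would distinguish a client-side wrapping bug (depth 2 from the server, depth 4 after the client) from a server-side one.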

Logs

The above code succeeded with huggingface-hub==0.27.1 but failed with huggingface-hub==0.28.1. The error log prints huge embeddings with no explicit error... When I save the log to a txt file, I find the error log starts as follows:


/home/sdp/sihanche/test_chroma_add_documents.py:19: LangChainDeprecationWarning: The class `Chroma` was deprecated in LangChain 0.2.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-chroma package and should be used instead. To use it run `pip install -U :class:`~langchain-chroma` and import as `from :class:`~langchain_chroma import Chroma``.
  vector_db = Chroma(
/root/sihanche_venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'post' (from 'huggingface_hub.inference._client') is deprecated and will be removed from version '0.31.0'. Making direct POST requests to the inference server is not supported anymore. Please use task methods instead (e.g. `InferenceClient.chat_completion`). If your use case is not supported, please open an issue in https://github.com/huggingface/huggingface_hub.
  warnings.warn(warning_message, FutureWarning)
Traceback (most recent call last):
  File "/root/sihanche_venv/lib/python3.12/site-packages/chromadb/api/models/CollectionCommon.py", line 90, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/sihanche_venv/lib/python3.12/site-packages/chromadb/api/models/CollectionCommon.py", line 389, in _validate_and_prepare_upsert_request
    upsert_records = normalize_insert_record_set(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/sihanche_venv/lib/python3.12/site-packages/chromadb/api/types.py", line 187, in normalize_insert_record_set
    base_record_set = normalize_base_record_set(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/sihanche_venv/lib/python3.12/site-packages/chromadb/api/types.py", line 164, in normalize_base_record_set
    embeddings=normalize_embeddings(embeddings),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/sihanche_venv/lib/python3.12/site-packages/chromadb/api/types.py", line 88, in normalize_embeddings
    raise ValueError(
ValueError: Expected embeddings to be a list of floats or ints, a list of lists, a numpy array, or a list of numpy arrays, got [[[[3.005876302719116, 2.1381373405456543, 0.9481216669082642, 8.15536880493164, 1.4738448858261108, 2.220341682434082, 0.04728309437632561, -1.8110272884368896, -0.5441240668296814, 0.32091569900512695, ... (hundreds of floats truncated) ...
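The ValueError above shows the embeddings arriving with extra levels of list nesting ([[[[... instead of a plain list of vectors). As a stopgap (purely illustrative, not proposed anywhere in this thread), the vectors could be unwrapped before they reach Chroma:

```python
# Illustrative stopgap: unwrap over-nested embeddings so Chroma receives a
# plain list of float vectors. Not part of langchain or huggingface_hub.
def flatten_embeddings(raw):
    def is_vector(x):
        # A vector is a non-empty list containing only numbers.
        return isinstance(x, list) and x and all(
            isinstance(v, (int, float)) for v in x
        )

    if is_vector(raw):
        return [raw]  # single vector -> batch of one
    if isinstance(raw, list) and raw and all(is_vector(v) for v in raw):
        return raw    # already a well-formed list of vectors
    # One level too deep: peel off one layer of nesting and retry.
    return flatten_embeddings([item for sub in raw for item in sub])
```

A real fix would belong inside langchain_huggingface's embed_documents, but a wrapper like this can confirm that the extra nesting is the only problem.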

System info

- huggingface_hub version: 0.28.1
- Platform: Linux-6.8.0-50-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.6.0
- Jinja2: 3.1.5
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 11.1.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 2.2.1
- pydantic: 2.10.5
- aiohttp: 3.11.11
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /root/.cache/huggingface/hub
- HF_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /root/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@julien-c
Member

julien-c commented Feb 7, 2025

We might need to open a PR to update langchain_huggingface.HuggingFaceEndpointEmbeddings (and maybe pin a recent huggingface_hub there)
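Until such a PR lands, affected users can guard against the versions reported here (0.27.1 works, 0.28.1 breaks). A minimal sketch of that check, assuming plain dotted version strings (the helper names are hypothetical):

```python
# Hypothetical version guard based on the versions reported in this issue:
# huggingface_hub 0.27.1 works, 0.28.1 does not.
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(p) for p in v.split("."))

def needs_pin(installed: str, first_broken: str = "0.28.0") -> bool:
    """True if the installed version is at or past the first broken release."""
    return version_tuple(installed) >= version_tuple(first_broken)
```

In user code this could be run against huggingface_hub.__version__ at startup to emit a warning before any embeddings are computed.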
