
ainsert() always fails with a dimension mismatch #75

Closed
learningpro opened this issue Oct 21, 2024 · 11 comments

Comments
@learningpro

Using the examples from the source code, I get the same error no matter whether I use ollama, openai, or an openai-compatible API:
The error:
/LightRAG/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))

all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 768 and the array at index 1 has size 1024
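For context (an illustration, not LightRAG's own code): nano-vectordb stacks each new batch of embeddings onto the matrix already stored on disk with `np.vstack`, so a vector whose width differs from the stored matrix triggers exactly this ValueError:

```python
# Minimal reproduction of the failure mode: vstack requires every row
# to have the same width, so a 768-dim store cannot accept 1024-dim vectors.
import numpy as np

stored_matrix = np.zeros((3, 768))   # matrix written by a 768-dim model
new_vectors = np.zeros((2, 1024))    # vectors from a 1024-dim model

try:
    np.vstack([stored_matrix, new_vectors])
except ValueError as err:
    print("vstack failed:", err)
```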

@LarFii
Collaborator

LarFii commented Oct 21, 2024

You could try creating a new working directory and running again.

@amenhere

You could try creating a new working directory and running again.

That didn't work, still the same error. Is something set to 1024 somewhere?

@LarFii
Collaborator

LarFii commented Oct 21, 2024

Show me the log, and the actual code.

@Kilimajaro

Same error here, and clearing the working directory cache doesn't help: (cail) (base) dell@dell-PowerEdge-R750:~/test/LexiLaw-main/demo$ /home/dell/anaconda3/envs/cail/bin/python /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag_openai_compatible_demo.py
INFO:lightrag:Logger initialized for working directory: /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new
DEBUG:lightrag:LightRAG init with param:
working_dir = /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 4096, 'max_token_size': 8192, 'func': <function embedding_func at 0x73bddf912170>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function llm_model_func at 0x73bddf911fc0>,
llm_model_name = meta-llama/Llama-3.2-1B-Instruct,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x73bc8848a8c0>

INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_chunks.json'} 0 data
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 1 chunks
INFO:lightrag:Inserting 1 vectors to chunks
INFO:httpx:HTTP Request: POST https://dashscope.aliyuncs.com/compatible-mode/v1/embeddings "HTTP/1.1 200 OK"
32842 32842
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag_openai_compatible_demo.py", line 59, in <module>
rag.insert(f.read())
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))
File "/home/dell/anaconda3/envs/cail/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/lightrag.py", line 206, in ainsert
await self.chunks_vdb.upsert(inserting_chunks)
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/storage.py", line 98, in upsert
results = self._client.upsert(datas=list_data)
File "/home/dell/anaconda3/envs/cail/lib/python3.10/site-packages/nano_vectordb/dbs.py", line 108, in upsert
self.__storage["matrix"] = np.vstack([self.__storage["matrix"], new_matrix])
File "/home/dell/anaconda3/envs/cail/lib/python3.10/site-packages/numpy/core/shape_base.py", line 289, in vstack
return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4096 and the array at index 1 has size 1536

@amenhere

Show me the log, and the actual code.

INFO:httpx:HTTP Request: POST http://192.168.31.100:3001/v1/chat/completions "HTTP/1.1 200 OK"
llm_model_func: I'm just a program, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?
INFO:httpx:HTTP Request: POST http://192.168.31.100:21434/v1/embeddings "HTTP/1.1 200 OK"
embedding_func: [[-0.02448499 0.06301917 0.02070389 ... 0.02287727 0.05131468
-0.03886142]]
INFO:lightrag:Logger initialized for working directory: ./aka
DEBUG:lightrag:LightRAG init with param:
working_dir = ./aka,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 1792, 'max_token_size': 8192, 'func': <function embedding_func at 0x7fa653b6b490>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function llm_model_func at 0x7fa751ba7d90>,
llm_model_name = meta-llama/Llama-3.2-1B-Instruct,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x7fa6544d6b00>

INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_chunks.json'} 0 data
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 2 chunks
INFO:lightrag:Inserting 2 vectors to chunks
INFO:httpx:HTTP Request: POST http://192.168.31.100:21434/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
File "/root/LightRAG/examples/lightrag_openai_compatible_demo.py", line 61, in <module>
rag.insert(f.read())
File "/root/LightRAG/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))
File "/root/anaconda3/envs/lightrag/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/root/LightRAG/lightrag/lightrag.py", line 206, in ainsert
await self.chunks_vdb.upsert(inserting_chunks)
File "/root/LightRAG/lightrag/storage.py", line 98, in upsert
results = self._client.upsert(datas=list_data)
File "/root/anaconda3/envs/lightrag/lib/python3.10/site-packages/nano_vectordb/dbs.py", line 108, in upsert
self.__storage["matrix"] = np.vstack([self.__storage["matrix"], new_matrix])
File "/root/anaconda3/envs/lightrag/lib/python3.10/site-packages/numpy/core/shape_base.py", line 289, in vstack
return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1792 and the array at index 1 has size 1024
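One way to pin down the right value, sketched here under the assumption that the endpoint (like the demo's embedding_func) returns one vector per input text: probe the function once and read the width of what comes back, then pass that number as embedding_dim. `probe_dim` and `fake_embedding_func` are illustrative names, not LightRAG API.

```python
# Sketch: probe an async embedding function for its true output width.
import asyncio
import numpy as np

async def probe_dim(embedding_func) -> int:
    """Embed one dummy text and return the vector width."""
    vecs = await embedding_func(["dimension probe"])
    return np.asarray(vecs).shape[1]

# Stand-in for the demo's real embedding_func (returns 1024-dim vectors,
# the size the error message above reports for this endpoint).
async def fake_embedding_func(texts: list[str]) -> np.ndarray:
    return np.zeros((len(texts), 1024))

print(asyncio.run(probe_dim(fake_embedding_func)))  # prints 1024
```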

@amenhere

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete_if_cache, openai_embedding
from lightrag.utils import EmbeddingFunc
import numpy as np
from dotenv import load_dotenv
load_dotenv()

WORKING_DIR = "./aka"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "gpt-4o-mini-2024-07-18",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        base_url="http://192.168.31.100:3001/v1/",
        **kwargs,
    )

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="aerok/xiaobu-embedding-v2:latest",  # ZNV-Embedding
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        base_url="http://192.168.31.100:21434/v1/",
    )

# function test
async def test_funcs():
    result = await llm_model_func("How are you?")
    print("llm_model_func: ", result)

    result = await embedding_func(["How are you?"])
    print("embedding_func: ", result)

asyncio.run(test_funcs())

exit()

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=1792, max_token_size=8192, func=embedding_func
    ),
)  # 1792  # 4096

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="naive"))
)

# Perform local search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="hybrid"))
)

@Kilimajaro

Similar error here, hoping someone can explain.

@nicklhy

nicklhy commented Oct 23, 2024

+1

@nicklhy

nicklhy commented Oct 23, 2024

Solved this problem by modifying the embedding_dim in lightrag_openai_compatible_demo.py.

In my case, I changed the default embedding model to bge-m3 and set the corresponding embedding_dim parameter as below:

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024, max_token_size=8192, func=embedding_func
    ),
)

@Keito654

I resolved this problem by changing embedding_dim to match the number following "the array at index 1 has size" in the error message.
For example, if you encounter an error like:

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1792 and the array at index 1 has size 1024

In this case, change embedding_dim to 1024.

@TianyuFan0504
Contributor

It's probably that embedding_dim isn't set correctly: the embedding dimension used at query time differs from the one your local storage was built with.
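Related to this point: if the working directory was first populated with a different model, the stored matrices keep the old width even after embedding_dim is corrected. A sketch (file names taken from the nano-vectordb init lines in the logs above) of clearing the stale stores so the next insert rebuilds them:

```python
# Remove stale NanoVectorDB stores so the next insert rebuilds them
# with the new embedding dimension. WORKING_DIR matches the demo script.
import os

WORKING_DIR = "./aka"
for name in ("vdb_entities.json", "vdb_relationships.json", "vdb_chunks.json"):
    path = os.path.join(WORKING_DIR, name)
    if os.path.exists(path):
        os.remove(path)
        print("removed", path)
```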

Dormiveglia-elf added a commit to Dormiveglia-elf/LightRAG that referenced this issue Oct 23, 2024
LarFii added a commit that referenced this issue Oct 24, 2024
[hotfix-#75][embedding] Fix the potential embedding problem
@LarFii LarFii closed this as completed Oct 25, 2024