
ainsert() always fails with a dimension mismatch #75

Closed
learningpro opened this issue Oct 21, 2024 · 11 comments

Comments
@learningpro

Using the examples from the source code, I get the same error no matter whether I use ollama, openai, or an openai-compatible API:
The error:
/LightRAG/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))

all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 768 and the array at index 1 has size 1024
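For context (an illustration, not LightRAG's own code): nano-vectordb stacks each new batch of embeddings onto the matrix already stored on disk with `np.vstack`, so a vector whose width differs from the stored matrix triggers exactly this ValueError:

```python
# Minimal reproduction of the failure mode: vstack requires every row
# to have the same width, so a 768-dim store cannot accept 1024-dim vectors.
import numpy as np

stored_matrix = np.zeros((3, 768))   # matrix written by a 768-dim model
new_vectors = np.zeros((2, 1024))    # vectors from a 1024-dim model

try:
    np.vstack([stored_matrix, new_vectors])
except ValueError as err:
    print("vstack failed:", err)
```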

@LarFii
Collaborator

LarFii commented Oct 21, 2024

You could try creating a new working directory and running again.

@amenhere

You could try creating a new working directory and running again.

That didn't work, still the same error. Is something set to 1024 somewhere?

@LarFii
Collaborator

LarFii commented Oct 21, 2024

Show me the log, and the actual code.

@Kilimajaro

Same error here, and clearing the working directory cache doesn't help: (cail) (base) dell@dell-PowerEdge-R750:~/test/LexiLaw-main/demo$ /home/dell/anaconda3/envs/cail/bin/python /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag_openai_compatible_demo.py
INFO:lightrag:Logger initialized for working directory: /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new
DEBUG:lightrag:LightRAG init with param:
working_dir = /home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 4096, 'max_token_size': 8192, 'func': <function embedding_func at 0x73bddf912170>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function llm_model_func at 0x73bddf911fc0>,
llm_model_name = meta-llama/Llama-3.2-1B-Instruct,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x73bc8848a8c0>

INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 4096, 'metric': 'cosine', 'storage_file': '/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/new/vdb_chunks.json'} 0 data
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 1 chunks
INFO:lightrag:Inserting 1 vectors to chunks
INFO:httpx:HTTP Request: POST https://dashscope.aliyuncs.com/compatible-mode/v1/embeddings "HTTP/1.1 200 OK"
32842 32842
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag_openai_compatible_demo.py", line 59, in <module>
rag.insert(f.read())
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))
File "/home/dell/anaconda3/envs/cail/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/lightrag.py", line 206, in ainsert
await self.chunks_vdb.upsert(inserting_chunks)
File "/home/dell/test/LexiLaw-main/demo/LightRAG-main/examples/lightrag/storage.py", line 98, in upsert
results = self._client.upsert(datas=list_data)
File "/home/dell/anaconda3/envs/cail/lib/python3.10/site-packages/nano_vectordb/dbs.py", line 108, in upsert
self.__storage["matrix"] = np.vstack([self.__storage["matrix"], new_matrix])
File "/home/dell/anaconda3/envs/cail/lib/python3.10/site-packages/numpy/core/shape_base.py", line 289, in vstack
return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4096 and the array at index 1 has size 1536

@amenhere

Show me the log, and the actual code.

INFO:httpx:HTTP Request: POST http://192.168.31.100:3001/v1/chat/completions "HTTP/1.1 200 OK"
llm_model_func: I'm just a program, so I don't have feelings, but I'm here and ready to help you! How can I assist you today?
INFO:httpx:HTTP Request: POST http://192.168.31.100:21434/v1/embeddings "HTTP/1.1 200 OK"
embedding_func: [[-0.02448499 0.06301917 0.02070389 ... 0.02287727 0.05131468
-0.03886142]]
INFO:lightrag:Logger initialized for working directory: ./aka
DEBUG:lightrag:LightRAG init with param:
working_dir = ./aka,
chunk_token_size = 1200,
chunk_overlap_token_size = 100,
tiktoken_model_name = gpt-4o-mini,
entity_extract_max_gleaning = 1,
entity_summary_to_max_tokens = 500,
node_embedding_algorithm = node2vec,
node2vec_params = {'dimensions': 1536, 'num_walks': 10, 'walk_length': 40, 'window_size': 2, 'iterations': 3, 'random_seed': 3},
embedding_func = {'embedding_dim': 1792, 'max_token_size': 8192, 'func': <function embedding_func at 0x7fa653b6b490>},
embedding_batch_num = 32,
embedding_func_max_async = 16,
llm_model_func = <function llm_model_func at 0x7fa751ba7d90>,
llm_model_name = meta-llama/Llama-3.2-1B-Instruct,
llm_model_max_token_size = 32768,
llm_model_max_async = 16,
key_string_value_json_storage_cls = <class 'lightrag.storage.JsonKVStorage'>,
vector_db_storage_cls = <class 'lightrag.storage.NanoVectorDBStorage'>,
vector_db_storage_cls_kwargs = {},
graph_storage_cls = <class 'lightrag.storage.NetworkXStorage'>,
enable_llm_cache = True,
addon_params = {},
convert_response_to_json_func = <function convert_response_to_json at 0x7fa6544d6b00>

INFO:lightrag:Load KV full_docs with 0 data
INFO:lightrag:Load KV text_chunks with 0 data
INFO:lightrag:Load KV llm_response_cache with 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_entities.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_relationships.json'} 0 data
INFO:nano-vectordb:Init {'embedding_dim': 1792, 'metric': 'cosine', 'storage_file': './aka/vdb_chunks.json'} 0 data
INFO:lightrag:Creating a new event loop in a sub-thread.
INFO:lightrag:[New Docs] inserting 1 docs
INFO:lightrag:[New Chunks] inserting 2 chunks
INFO:lightrag:Inserting 2 vectors to chunks
INFO:httpx:HTTP Request: POST http://192.168.31.100:21434/v1/embeddings "HTTP/1.1 200 OK"
INFO:lightrag:Writing graph with 0 nodes, 0 edges
Traceback (most recent call last):
File "/root/LightRAG/examples/lightrag_openai_compatible_demo.py", line 61, in <module>
rag.insert(f.read())
File "/root/LightRAG/lightrag/lightrag.py", line 162, in insert
return loop.run_until_complete(self.ainsert(string_or_strings))
File "/root/anaconda3/envs/lightrag/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/root/LightRAG/lightrag/lightrag.py", line 206, in ainsert
await self.chunks_vdb.upsert(inserting_chunks)
File "/root/LightRAG/lightrag/storage.py", line 98, in upsert
results = self._client.upsert(datas=list_data)
File "/root/anaconda3/envs/lightrag/lib/python3.10/site-packages/nano_vectordb/dbs.py", line 108, in upsert
self.__storage["matrix"] = np.vstack([self.__storage["matrix"], new_matrix])
File "/root/anaconda3/envs/lightrag/lib/python3.10/site-packages/numpy/core/shape_base.py", line 289, in vstack
return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1792 and the array at index 1 has size 1024
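One way to pin down the right value, sketched here under the assumption that the endpoint (like the demo's embedding_func) returns one vector per input text: probe the function once and read the width of what comes back, then pass that number as embedding_dim. `probe_dim` and `fake_embedding_func` are illustrative names, not LightRAG API.

```python
# Sketch: probe an async embedding function for its true output width.
import asyncio
import numpy as np

async def probe_dim(embedding_func) -> int:
    """Embed one dummy text and return the vector width."""
    vecs = await embedding_func(["dimension probe"])
    return np.asarray(vecs).shape[1]

# Stand-in for the demo's real embedding_func (returns 1024-dim vectors,
# the size the error message above reports for this endpoint).
async def fake_embedding_func(texts: list[str]) -> np.ndarray:
    return np.zeros((len(texts), 1024))

print(asyncio.run(probe_dim(fake_embedding_func)))  # prints 1024
```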

@amenhere

import os
import asyncio
from lightrag import LightRAG, QueryParam
from lightrag.llm import openai_complete_if_cache, openai_embedding
from lightrag.utils import EmbeddingFunc
import numpy as np
from dotenv import load_dotenv
load_dotenv()

WORKING_DIR = "./aka"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

async def llm_model_func(
    prompt, system_prompt=None, history_messages=[], **kwargs
) -> str:
    return await openai_complete_if_cache(
        "gpt-4o-mini-2024-07-18",
        prompt,
        system_prompt=system_prompt,
        history_messages=history_messages,
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        base_url="http://192.168.31.100:3001/v1/",
        **kwargs,
    )

async def embedding_func(texts: list[str]) -> np.ndarray:
    return await openai_embedding(
        texts,
        model="aerok/xiaobu-embedding-v2:latest",  # ZNV-Embedding
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        base_url="http://192.168.31.100:21434/v1/",
    )

# function test
async def test_funcs():
    result = await llm_model_func("How are you?")
    print("llm_model_func: ", result)

    result = await embedding_func(["How are you?"])
    print("embedding_func: ", result)

asyncio.run(test_funcs())

exit()

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=1792, max_token_size=8192, func=embedding_func
    ),
)  # 1792  # 4096

with open("./book.txt") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="naive"))
)

# Perform local search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("讲了一个什么故事,主题是什么?", param=QueryParam(mode="hybrid"))
)

@Kilimajaro

Similar error here, hoping someone can explain.

@nicklhy

nicklhy commented Oct 23, 2024

+1

@nicklhy

nicklhy commented Oct 23, 2024

Solved this problem by modifying the embedding_dim in lightrag_openai_compatible_demo.py.

In my case, I changed the default embedding model to bge-m3 and set the corresponding embedding_dim parameter as below:

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=llm_model_func,
    embedding_func=EmbeddingFunc(
        embedding_dim=1024, max_token_size=8192, func=embedding_func
    ),
)

@Keito654

I resolved this problem by changing embedding_dim to match the number following "the array at index 1 has size" in the error message.
For example, if you encounter an error like:

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1792 and the array at index 1 has size 1024

In this case, change embedding_dim to 1024.

@TianyuFan0504
Contributor

It's probably that embedding_dim isn't set correctly: the embedding dimension used at query time differs from the one your local storage was built with.
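Related to this point: if the working directory was first populated with a different model, the stored matrices keep the old width even after embedding_dim is corrected. A sketch (file names taken from the nano-vectordb init lines in the logs above) of clearing the stale stores so the next insert rebuilds them:

```python
# Remove stale NanoVectorDB stores so the next insert rebuilds them
# with the new embedding dimension. WORKING_DIR matches the demo script.
import os

WORKING_DIR = "./aka"
for name in ("vdb_entities.json", "vdb_relationships.json", "vdb_chunks.json"):
    path = os.path.join(WORKING_DIR, name)
    if os.path.exists(path):
        os.remove(path)
        print("removed", path)
```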

Dormiveglia-elf added a commit to Dormiveglia-elf/LightRAG that referenced this issue Oct 23, 2024
LarFii added a commit that referenced this issue Oct 24, 2024
[hotfix-#75][embedding] Fix the potential embedding problem
@LarFii LarFii closed this as completed Oct 25, 2024