ValueError("Columns must be same length as key") #455

Closed
BovineOverlord opened this issue Jul 9, 2024 · 26 comments
Labels
community_support Issue handled by community members

Comments

@BovineOverlord

Describe the bug

```
{"type": "error", "data": "Error executing verb \"cluster_graph\" in create_base_entity_graph: Columns must be same length as key", "source": "Columns must be same length as key", "details": null}

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

{"type": "error", "data": "Error running pipeline!", "source": "Columns must be same length as key", "details": null}

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
  File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
  File "C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
```
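For reference, the failing line in cluster_graph.py unpacks a result DataFrame into two columns. A minimal sketch of how pandas raises this exact error when the clustering result is empty (the column names here are illustrative, not necessarily the ones GraphRAG uses):

```python
import pandas as pd

# Minimal reproduction: assigning a zero-column DataFrame to a
# two-key selection, as happens when clustering yields no rows.
output_df = pd.DataFrame({"id": [1, 2]})
try:
    # pd.DataFrame([]) has 0 columns, but the key names 2 columns
    output_df[["level", "clustered_graph"]] = pd.DataFrame([])
except ValueError as e:
    print(e)  # Columns must be same length as key
```

This is why an empty entity graph upstream surfaces as a pandas shape error here rather than a clearer message.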

Steps to reproduce

I was running the tool with a local Ollama model. It ran fine and loaded the test file before the error occurred.

Expected Behavior

The tool should have proceeded to the next step, "create_base_text_units", rather than stopping. It appears to be a bug in the graph clustering function.

GraphRAG Config Used

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: command-r-plus:104b-q4_0
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1
  api_version: 2024-02-15-preview
  organization: <organization_id>
  deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2:7b-instruct
    # api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 1 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
```
The rest of the settings file is unchanged.

Logs and screenshots

[screenshot: error output]

Additional Information

  • GraphRAG Version: current as of this posting
  • Operating System: Windows 10
  • Python Version: 3.10
  • Related Issues:
@BovineOverlord added the bug (Something isn't working) and triage (Default label assignment; indicates new issue needs review by a maintainer) labels on Jul 9, 2024
@AlonsoGuevara
Contributor

Hi!
My general rule of thumb when facing these issues is:

  • Check the output of entity extraction; this will show whether the graph is empty
  • If the graph is empty, the cause is either faulty (unparseable) LLM responses or failed LLM calls

Can you please check your cache entries for Entity Extraction to check if the LLM is providing faulty responses?
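If it helps anyone doing this check, here is a small sketch for counting cached entity-extraction responses. The `./ragtest` root and the `cache/entity_extraction` subdirectory are assumptions based on a default GraphRAG project layout; adjust the path to your own setup:

```python
from pathlib import Path

def count_cache_entries(root: str) -> int:
    # Count cached LLM responses under <root>/cache/entity_extraction
    # (assumed default layout). Zero entries suggests the LLM calls
    # themselves failed; non-empty but garbled entries suggest the
    # model returned unparseable output.
    cache_dir = Path(root) / "cache" / "entity_extraction"
    if not cache_dir.exists():
        return 0
    return sum(1 for p in cache_dir.iterdir() if p.is_file())

print(count_cache_entries("./ragtest"))
```

If the count is zero, look at the indexing-engine log for connection or auth errors before suspecting the clustering step.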

@BovineOverlord
Author

The entity extraction directory is empty. I tried two other models and got the same result.

@zubu007

zubu007 commented Jul 12, 2024

Facing the same thing. cache/entity_extraction is empty. same exact error in the logs.

@huangyuanzhuo-coder

same error

@flikeok

flikeok commented Jul 12, 2024

same error

@menghongtao

same error

@CyanMystery

same error:

this is my indexing-engine.log:
indexing-engine.log

@Xls1994

Xls1994 commented Jul 16, 2024

same error:
this is my log:
indexing-engine.log

The entity_extraction directory is not empty.


@BochenYIN

same error, and the entity extraction directory is empty.

@chenfujv

same error:
But entity_extraction directory is not empty.

@chenfujv

settings.yaml: [screenshot]

@Bai1026

Bai1026 commented Jul 19, 2024

same error lol
But entity_extraction and summarize_descriptions directories are also not empty.

@yinjianjie

same error
why

@yurochang

same problem.

@ayanjiushishuai

+1

@kiljos

kiljos commented Jul 22, 2024

+1

@natoverse
Collaborator

Consolidating alternate model issues here: #657

@natoverse closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 22, 2024
@natoverse added the community_support label (Issue handled by community members) and removed the bug and triage labels on Jul 22, 2024
@night666e

Facing the same thing. cache/entity_extraction is empty. Exact same error in the logs.

Has this been resolved?

@night666e

The entity extraction directory is empty. I tried two other models and got the same result.

Has this been resolved?

@night666e
[re-post of the full issue description (error, steps to reproduce, config), machine-translated to Chinese]

Has this been resolved, brother?

@night666e
Same error. This is my log: indexing-engine.log

The entity_extraction directory is not empty.

[screenshot]

Have you solved it?

@night666e
Same error, but the entity_extraction directory is not empty. [screenshot]

Has this been resolved?

@teneous

teneous commented Aug 9, 2024

I use OpenAI GPT-4o-mini. After I reduced the chunk size from 1000 to 200 and decreased the overlap to 10, it worked for me!

```yaml
chunks:
  size: 200
  overlap: 10
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
```

@Friman04

Friman04 commented Aug 9, 2024

same

@maverick001

Same issue here. I used gpt-4o-mini with the default text-embedding-3-small and max_tokens set to 1700.
Any official solution yet?

@FULLK

FULLK commented Dec 24, 2024

I also encountered this issue; the root cause is that the extraction results from your model are not good enough. You can either switch to a more capable model, or reduce llm: max_tokens in settings.yaml, or reduce chunks: size and overlap as well.
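On top of improving extraction quality, the failing assignment can also be guarded downstream. This is a hypothetical patch sketch, not the upstream fix; the column names and function are illustrative, but it shows how an empty clustering result can fail with a clear message instead of the opaque pandas error:

```python
import pandas as pd

def assign_clusters(output_df: pd.DataFrame, results: list) -> pd.DataFrame:
    # Build the result frame with explicit column names so its shape
    # is well-defined even when `results` is empty.
    clustered = pd.DataFrame(results, columns=["level", "clustered_graph"])
    if clustered.empty:
        # An empty graph reached clustering: surface the real problem
        # instead of letting the two-key unpack raise
        # "Columns must be same length as key".
        raise ValueError(
            "clustering produced no communities; "
            "check the entity extraction output"
        )
    output_df[["level", "clustered_graph"]] = clustered
    return output_df
```

The explicit `columns=` argument alone already avoids the shape mismatch; the `empty` check turns a silent failure into an actionable error message.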
