ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null} #455
Comments
Hi!
Can you please check your cache entries for entity extraction, to see whether the LLM is providing faulty responses?
The entity extraction directory is empty. I tried two other models and got the same result.
Facing the same thing: cache/entity_extraction is empty, and the exact same error appears in the logs.
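For anyone checking this, a quick way to see what (if anything) the entity-extraction step cached is to list the cache/entity_extraction directory mentioned above and preview a few entries. This is only an illustrative sketch; it assumes the cache lives at that relative path and makes no assumptions about the file format beyond it being text.

```python
from pathlib import Path

# Preview whatever the entity-extraction step cached (path taken from the
# comments above). This only lists entries and prints the first few hundred
# characters of each; it does not assume any particular file format.
cache_dir = Path("cache/entity_extraction")
entries = sorted(cache_dir.glob("*")) if cache_dir.exists() else []
print(f"{len(entries)} cache entries in {cache_dir}")

for entry in entries[:5]:
    print("----", entry.name)
    print(entry.read_text(encoding="utf-8", errors="replace")[:500])
```

An empty listing would match the "directory is empty" reports above and suggests the LLM calls failed before anything was written.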
same error
same error
same error
same error: this is my indexing-engine.log:
same error: the entity_extraction directory is not empty.
same error, entity extraction directory is empty.
same error lol
same error
same problem.
+1
+1
Consolidating alternate model issues here: #657
Has this been solved?
Has this been solved?
Has this been solved, brother?
Have you solved it?
same
Same issue here. I used gpt-4o-mini with the default text-embedding-3-small and max_tokens set to 1700.
I also encountered this issue; the root cause is that the results extracted by your model are not good enough. You can either choose a more capable model, or reduce llm.max_tokens in settings.yaml, or lower chunks.size and chunks.overlap.
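To act on the advice above about shrinking max_tokens and chunk sizes, it can help to measure how many cl100k_base tokens (the encoding_model declared in the config below) a slice of your input actually uses. This is a rough sketch; the input path is hypothetical, and chunk settings live under chunks: in settings.yaml.

```python
import tiktoken

# Estimate the token count of a slice of the input using the same encoding
# the config declares (encoding_model: cl100k_base). The file path is a
# hypothetical example; substitute your own input document.
enc = tiktoken.get_encoding("cl100k_base")
with open("input/sample.txt", encoding="utf-8") as fh:
    sample_chunk = fh.read()[:5000]

n_tokens = len(enc.encode(sample_chunk))
print(f"~{n_tokens} tokens; keep chunks comfortably below llm.max_tokens")
```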
Describe the bug
{"type": "error", "data": "Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key", "stack": "Traceback (most recent call last):\n File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File "C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
{"type": "error", "data": "Error running pipeline!", "stack": "Traceback (most recent call last):\n File "C:\Program Files\Python310\lib\site-packages\graphrag\index\run.py", line 323, in run_pipeline\n result = await workflow.run(context, callbacks)\n File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n File "C:\Program Files\Python310\lib\site-packages\datashaper\workflow\workflow.py", line 410, in _execute_verb\n result = node.verb.func(**verb_args)\n File "C:\Program Files\Python310\lib\site-packages\graphrag\index\verbs\graph\clustering\cluster_graph.py", line 102, in cluster_graph\n output_df[[level_to, to]] = pd.DataFrame(\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4299, in setitem\n self._setitem_array(key, value)\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\frame.py", line 4341, in _setitem_array\n check_key_length(self.columns, key, value)\n File "C:\Program Files\Python310\lib\site-packages\pandas\core\indexers\utils.py", line 390, in check_key_length\n raise ValueError("Columns must be same length as key")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
Steps to reproduce
I was using a local Ollama model with the tool. It ran fine and loaded the test file before the error occurred.
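As a sanity check of the local setup, the standard openai client can be pointed at Ollama's OpenAI-compatible endpoint to confirm the chat model responds at all. This sketch assumes the api_base and model name from the config below and that the model has already been pulled locally.

```python
from openai import OpenAI

# Point the standard openai client at the local Ollama endpoint from the
# config below; the API key is unused locally but the client requires one.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="command-r-plus:104b-q4_0",  # model name taken from the config below
    messages=[{"role": "user", "content": "Reply with the JSON object {\"ok\": true}"}],
)
print(resp.choices[0].message.content)
```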
Expected Behavior
The tool should have proceeded to the next step, "create_base_text_units", rather than ceasing operation. It appears to be a bug in the graph clustering step.
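If the problem really is the two-column assignment at cluster_graph.py line 102, one hypothetical way to fail more helpfully (not the project's actual fix, and the expansion shape is an assumption based on the traceback) would be to check the expanded frame before assigning:

```python
import pandas as pd

def assign_levels(output_df: pd.DataFrame, to: str, level_to: str) -> pd.DataFrame:
    # Hypothetical guard around the kind of assignment shown in the traceback:
    # expand the clustering column, then verify it produced the two expected
    # columns before assigning, so an empty result fails with a clearer message.
    expanded = pd.DataFrame(output_df[to].to_list(), index=output_df.index)
    if expanded.shape[1] != 2:
        raise ValueError(
            f"Column '{to}' did not expand into (level, community) pairs; "
            "entity extraction may have returned no usable results."
        )
    output_df[[level_to, to]] = expanded
    return output_df
```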
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: command-r-plus:104b-q4_0
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 2000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1
  api_version: 2024-02-15-preview
  organization: <organization_id>
  deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 1
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: qwen2:7b-instruct
    # api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 1
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 1 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
No changes were made to the remainder of the file.
Logs and screenshots
Additional Information