
[Question]: Error when running create_final_entities #994

Closed
1 of 3 tasks
JKYtydt opened this issue Aug 21, 2024 · 1 comment


JKYtydt commented Aug 21, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

The pipeline fails when it reaches create_final_entities.

16:32:28,146 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: list index out of range
Traceback (most recent call last):
  File "/data/anaconda3/envs/graphrag/lib/python3.12/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/strategies/openai.py", line 63, in run
    embeddings = _reconstitute_embeddings(embeddings, input_sizes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/strategies/openai.py", line 172, in _reconstitute_embeddings
    embedding = raw_embeddings[cursor]
                ~~~~~~~~~~~~~~^^^^^^^^
IndexError: list index out of range
16:32:28,164 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "text_embed" in create_final_entities: list index out of range details=None
16:32:28,165 graphrag.index.run ERROR error running workflow create_final_entities
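The IndexError happens because _reconstitute_embeddings walks a cursor through the flat list of vectors returned by the embedding service, expecting one vector per input chunk; if the service silently returns fewer vectors than chunks (e.g. a batch failed on the FastChat side), the cursor runs off the end of the list. The following is a hypothetical, simplified sketch of that failure mode, not the actual GraphRAG source:

```python
def reconstitute_embeddings(raw_embeddings, input_sizes):
    """Regroup a flat list of chunk embeddings back into one list of
    vectors per original input text (simplified illustration)."""
    results = []
    cursor = 0
    for size in input_sizes:
        if size == 0:
            results.append(None)
        else:
            chunk = []
            for _ in range(size):
                # This is the line that blows up in the traceback:
                # raw_embeddings is shorter than sum(input_sizes).
                chunk.append(raw_embeddings[cursor])
                cursor += 1
            results.append(chunk)
    return results

# Counts line up: works.
reconstitute_embeddings([[0.1], [0.2], [0.3]], [2, 1])

# Embedding service dropped a vector: same error as the log.
try:
    reconstitute_embeddings([[0.1]], [2])
except IndexError as e:
    print(e)  # list index out of range
```

So the root cause is usually on the embedding-service side (vectors missing from the response), not in GraphRAG's regrouping logic itself.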

Steps to reproduce

Running locally.

LLM service:
CUDA_VISIBLE_DEVICES=3 python -m vllm.entrypoints.openai.api_server --model /sdc/pre_trained_model/Qwen2-7B-Instruct --gpu-memory-utilization 0.6 --port 8003

Embedding service:
#!/bin/bash

# Start the controller
python -m fastchat.serve.controller --host 0.0.0.0 --port 21003 &

# Set environment variables for the model worker, then start it
export CUDA_VISIBLE_DEVICES=5
python -m fastchat.serve.model_worker --model-path /sdc/pre_trained_model/bge-large-zh/ --model-names gpt-4 --num-gpus 1 --host 0.0.0.0 --port 21005 --controller-address http://0.0.0.0:21003 &

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8200 --controller-address http://0.0.0.0:21003

python -m graphrag.index --root ./
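Before re-running the index, it may help to confirm that the FastChat embedding endpoint really returns one vector per input string: an OpenAI-compatible /v1/embeddings response carries a `data` array with one entry per input. A small offline check of a response payload (hypothetical helper, names are mine) could look like:

```python
import json

def embeddings_match_inputs(response_body: str, inputs: list[str]) -> bool:
    """Return True if an OpenAI-style embeddings response contains
    exactly one embedding per input string."""
    data = json.loads(response_body).get("data", [])
    return len(data) == len(inputs)

# A response where the service dropped an input would trip the
# IndexError later in _reconstitute_embeddings:
sample = json.dumps({
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
})
print(embeddings_match_inputs(sample, ["hello"]))           # True
print(embeddings_match_inputs(sample, ["hello", "world"]))  # False
```

If the counts diverge for some batches (for example on overlong inputs to bge-large-zh), the embedding service is the place to look.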

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: /sdc/pre_trained_model/Qwen2-7B-Instruct
  model_supports_json: false # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://172.70.10.43:8003/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://172.70.10.43:8200/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    batch_size: 8 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Logs and screenshots

Indexing error (screenshot)
Embedding model service error (screenshot)

Additional Information

  • GraphRAG Version: 1.0.13.dev49
  • Operating System: Ubuntu
  • Python Version: 3.12.0
  • Related Issues:
@JKYtydt JKYtydt added the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Aug 21, 2024
@AlonsoGuevara
Contributor

Hi!
We are consolidating alternate model issues here: #657

@AlonsoGuevara AlonsoGuevara closed this as not planned Aug 21, 2024
@AlonsoGuevara AlonsoGuevara removed the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Aug 21, 2024