
[Question]: Error when running create_final_entities #994

Closed
1 of 3 tasks
JKYtydt opened this issue Aug 21, 2024 · 1 comment


JKYtydt commented Aug 21, 2024

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.

Describe the issue

The pipeline fails when it reaches create_final_entities.

16:32:28,146 datashaper.workflow.workflow ERROR Error executing verb "text_embed" in create_final_entities: list index out of range
Traceback (most recent call last):
  File "/data/anaconda3/envs/graphrag/lib/python3.12/site-packages/datashaper/workflow/workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/strategies/openai.py", line 63, in run
    embeddings = _reconstitute_embeddings(embeddings, input_sizes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sdc/jky/GraphRAG-Local-UI/graphrag/graphrag/index/verbs/text/embed/strategies/openai.py", line 172, in _reconstitute_embeddings
    embedding = raw_embeddings[cursor]
                ~~~~~~~~~~~~~~^^^^^^^^
IndexError: list index out of range
16:32:28,164 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "text_embed" in create_final_entities: list index out of range details=None
16:32:28,165 graphrag.index.run ERROR error running workflow create_final_entities
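The IndexError happens because _reconstitute_embeddings walks a cursor through the flat list of vectors returned by the embedding service, expecting one vector per input chunk; if the service silently returns fewer vectors than chunks (e.g. a batch failed on the FastChat side), the cursor runs off the end of the list. The following is a hypothetical, simplified sketch of that failure mode, not the actual GraphRAG source:

```python
def reconstitute_embeddings(raw_embeddings, input_sizes):
    """Regroup a flat list of chunk embeddings back into one list of
    vectors per original input text (simplified illustration)."""
    results = []
    cursor = 0
    for size in input_sizes:
        if size == 0:
            results.append(None)
        else:
            chunk = []
            for _ in range(size):
                # This is the line that blows up in the traceback:
                # raw_embeddings is shorter than sum(input_sizes).
                chunk.append(raw_embeddings[cursor])
                cursor += 1
            results.append(chunk)
    return results

# Counts line up: works.
reconstitute_embeddings([[0.1], [0.2], [0.3]], [2, 1])

# Embedding service dropped a vector: same error as the log.
try:
    reconstitute_embeddings([[0.1]], [2])
except IndexError as e:
    print(e)  # list index out of range
```

So the root cause is usually on the embedding-service side (vectors missing from the response), not in GraphRAG's regrouping logic itself.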

Steps to reproduce

Running locally.

LLM service:
CUDA_VISIBLE_DEVICES=3 python -m vllm.entrypoints.openai.api_server --model /sdc/pre_trained_model/Qwen2-7B-Instruct --gpu-memory-utilization 0.6 --port 8003

Embedding service:
#!/bin/bash

# Start the controller
python -m fastchat.serve.controller --host 0.0.0.0 --port 21003 &

# Set environment variables for the model worker, then start it
export CUDA_VISIBLE_DEVICES=5
python -m fastchat.serve.model_worker --model-path /sdc/pre_trained_model/bge-large-zh/ --model-names gpt-4 --num-gpus 1 --host 0.0.0.0 --port 21005 --controller-address http://0.0.0.0:21003 &

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8200 --controller-address http://0.0.0.0:21003

python -m graphrag.index --root ./
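Before re-running the index, it may help to confirm that the FastChat embedding endpoint really returns one vector per input string: an OpenAI-compatible /v1/embeddings response carries a `data` array with one entry per input. A small offline check of a response payload (hypothetical helper, names are mine) could look like:

```python
import json

def embeddings_match_inputs(response_body: str, inputs: list[str]) -> bool:
    """Return True if an OpenAI-style embeddings response contains
    exactly one embedding per input string."""
    data = json.loads(response_body).get("data", [])
    return len(data) == len(inputs)

# A response where the service dropped an input would trip the
# IndexError later in _reconstitute_embeddings:
sample = json.dumps({
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
})
print(embeddings_match_inputs(sample, ["hello"]))           # True
print(embeddings_match_inputs(sample, ["hello", "world"]))  # False
```

If the counts diverge for some batches (for example on overlong inputs to bge-large-zh), the embedding service is the place to look.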

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: /sdc/pre_trained_model/Qwen2-7B-Instruct
  model_supports_json: false # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://172.70.10.43:8003/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://172.70.10.43:8200/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    batch_size: 8 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

Logs and screenshots

Indexing error (screenshot)
Embedding model service error (screenshot)

Additional Information

  • GraphRAG Version: 1.0.13.dev49
  • Operating System: Ubuntu
  • Python Version: 3.12.0
  • Related Issues:
@JKYtydt JKYtydt added the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Aug 21, 2024
@AlonsoGuevara
Contributor

Hi!
We are consolidating alternate model issues here: #657

@AlonsoGuevara AlonsoGuevara closed this as not planned Aug 21, 2024
@AlonsoGuevara AlonsoGuevara removed the triage Default label assignment, indicates new issue needs reviewed by a maintainer label Aug 21, 2024