
Starting xinference hangs while loading the model #2403

Closed · 1 of 3 tasks
moqimoqidea opened this issue Oct 8, 2024 · 4 comments

Comments

@moqimoqidea

System Info

  • CUDA Version: 12.4
  • NVIDIA-SMI 535.54.03
  • NVIDIA A800-SXM4-80GB

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

xinference, version 0.15.1

The command used to start Xinference

docker run -d \
  --name xinference \
  -p 9997:9997 \
  xprobe/xinference:v0.15.1 \
  bash -c "xinference-local -H 0.0.0.0 --log-level debug & sleep 20 && xinference launch --model-name bge-reranker-large --model-type rerank --replica 2 && tail -f /dev/null"

Reproduction

After starting the container, the logs show the model is still loading. The model had already been downloaded earlier, and this setup used to start successfully; I don't know why it no longer does.

Logs:

2024-10-08 19:48:58,643 xinference.core.supervisor 278 INFO     Xinference supervisor 0.0.0.0:22390 started
2024-10-08 19:48:58,726 xinference.core.worker 278 INFO     Starting metrics export server at 0.0.0.0:None
2024-10-08 19:48:58,728 xinference.core.worker 278 INFO     Checking metrics export server...
2024-10-08 19:49:00,208 xinference.core.worker 278 INFO     Metrics server is started at: http://0.0.0.0:43763
2024-10-08 19:49:00,209 xinference.core.worker 278 INFO     Purge cache directory: /root/.xinference/cache
2024-10-08 19:49:00,210 xinference.core.supervisor 278 DEBUG    [request 4ed14b44-856b-11ef-9106-00163e788596] Enter add_worker, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>,0.0.0.0:22390, kwargs: 
2024-10-08 19:49:00,211 xinference.core.supervisor 278 DEBUG    Worker 0.0.0.0:22390 has been added successfully
2024-10-08 19:49:00,211 xinference.core.supervisor 278 DEBUG    [request 4ed14b44-856b-11ef-9106-00163e788596] Leave add_worker, elapsed time: 0 s
2024-10-08 19:49:00,211 xinference.core.worker 278 INFO     Connected to supervisor as a fresh worker
2024-10-08 19:49:00,221 xinference.core.worker 278 INFO     Xinference worker 0.0.0.0:22390 started
2024-10-08 19:49:00,224 xinference.core.supervisor 278 DEBUG    Worker 0.0.0.0:22390 resources: {'cpu': ResourceStatus(usage=0.0, total=128, memory_used=37924220928, memory_available=2112589045760, memory_total=2164168032256), 'gpu-0': GPUStatus(mem_total=85899345920, mem_free=85168685056, mem_used=730660864)}
2024-10-08 19:49:03,639 xinference.core.supervisor 278 DEBUG    Enter get_status, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>, kwargs: 
2024-10-08 19:49:03,639 xinference.core.supervisor 278 DEBUG    Leave get_status, elapsed time: 0 s
sleep finish
2024-10-08 19:49:04,771 xinference.api.restful_api 143 INFO     Starting Xinference at endpoint: http://0.0.0.0:9997
2024-10-08 19:49:04,908 uvicorn.error 143 INFO     Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
Launch model name: bge-reranker-large with kwargs: {}
2024-10-08 19:49:08,025 xinference.core.supervisor 278 DEBUG    Enter launch_builtin_model, model_uid: bge-reranker-large, model_name: bge-reranker-large, model_size: , model_format: None, quantization: None, replica: 2, kwargs: {'trust_remote_code': True}
2024-10-08 19:49:08,025 xinference.core.worker 278 DEBUG    Enter get_model_count, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: 
2024-10-08 19:49:08,026 xinference.core.worker 278 DEBUG    Leave get_model_count, elapsed time: 0 s
2024-10-08 19:49:08,026 xinference.core.worker 278 INFO     [request 5379d170-856b-11ef-9106-00163e788596] Enter launch_builtin_model, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: model_uid=bge-reranker-large-2-0,model_name=bge-reranker-large,model_size_in_billions=None,model_format=None,quantization=None,model_engine=None,model_type=rerank,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=None,download_hub=None,model_path=None,trust_remote_code=True
2024-10-08 19:49:08,026 xinference.core.worker 278 DEBUG    GPU selected: [0] for model bge-reranker-large-2-0
2024-10-08 19:49:12,385 xinference.model.rerank.core 278 DEBUG    Rerank model bge-reranker-large found in ModelScope.
2024-10-08 19:50:06,123 xinference.core.supervisor 278 DEBUG    [request 761ab046-856b-11ef-9106-00163e788596] Enter list_model_registrations, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>,LLM, kwargs: detailed=True
2024-10-08 19:50:06,219 xinference.core.supervisor 278 DEBUG    [request 761ab046-856b-11ef-9106-00163e788596] Leave list_model_registrations, elapsed time: 0 s
2024-10-08 19:50:07,412 xinference.core.supervisor 278 DEBUG    [request 76df73f4-856b-11ef-9106-00163e788596] Enter list_model_registrations, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>,rerank, kwargs: detailed=True
2024-10-08 19:50:07,414 xinference.core.supervisor 278 DEBUG    [request 76df73f4-856b-11ef-9106-00163e788596] Leave list_model_registrations, elapsed time: 0 s
2024-10-08 19:50:12,033 xinference.core.supervisor 278 DEBUG    [request 79a09a82-856b-11ef-9106-00163e788596] Enter list_models, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>, kwargs: 
2024-10-08 19:50:12,034 xinference.core.worker 278 DEBUG    [request 79a0a496-856b-11ef-9106-00163e788596] Enter list_models, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: 
2024-10-08 19:50:12,034 xinference.core.worker 278 DEBUG    [request 79a0a496-856b-11ef-9106-00163e788596] Leave list_models, elapsed time: 0 s
2024-10-08 19:50:12,034 xinference.core.supervisor 278 DEBUG    [request 79a09a82-856b-11ef-9106-00163e788596] Leave list_models, elapsed time: 0 s
2024-10-08 19:50:14,236 xinference.core.supervisor 278 DEBUG    [request 7af0a594-856b-11ef-9106-00163e788596] Enter list_model_registrations, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>,LLM, kwargs: detailed=True
2024-10-08 19:50:14,331 xinference.core.supervisor 278 DEBUG    [request 7af0a594-856b-11ef-9106-00163e788596] Leave list_model_registrations, elapsed time: 0 s
2024-10-08 19:50:18,570 xinference.core.supervisor 278 DEBUG    [request 7d85f480-856b-11ef-9106-00163e788596] Enter list_model_registrations, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>,embedding, kwargs: detailed=True
2024-10-08 19:50:18,578 xinference.core.supervisor 278 DEBUG    [request 7d85f480-856b-11ef-9106-00163e788596] Leave list_model_registrations, elapsed time: 0 s
2024-10-08 19:50:24,968 xinference.core.supervisor 278 DEBUG    Enter launch_builtin_model, model_uid: bce-embedding-base_v1, model_name: bce-embedding-base_v1, model_size: , model_format: None, quantization: None, replica: 1, kwargs: {}
2024-10-08 19:50:24,969 xinference.core.worker 278 DEBUG    Enter get_model_count, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: 
2024-10-08 19:50:24,969 xinference.core.worker 278 DEBUG    Leave get_model_count, elapsed time: 0 s
2024-10-08 19:50:24,969 xinference.core.worker 278 INFO     [request 8156756c-856b-11ef-9106-00163e788596] Enter launch_builtin_model, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: model_uid=bce-embedding-base_v1-1-0,model_name=bce-embedding-base_v1,model_size_in_billions=None,model_format=None,quantization=None,model_engine=None,model_type=embedding,n_gpu=auto,request_limits=None,peft_model_config=None,gpu_idx=None,download_hub=None,model_path=None
2024-10-08 19:50:24,970 xinference.core.worker 278 DEBUG    GPU selected: [0] for model bce-embedding-base_v1-1-0
2024-10-08 19:50:29,341 xinference.model.embedding.core 278 DEBUG    Embedding model bce-embedding-base_v1 found in ModelScope.
2024-10-08 19:52:59,824 xinference.core.supervisor 278 DEBUG    [request dda361cc-856b-11ef-9106-00163e788596] Enter list_models, args: <xinference.core.supervisor.SupervisorActor object at 0x7fe96d83b970>, kwargs: 
2024-10-08 19:52:59,824 xinference.core.worker 278 DEBUG    [request dda36b22-856b-11ef-9106-00163e788596] Enter list_models, args: <xinference.core.worker.WorkerActor object at 0x7fe96d83b920>, kwargs: 
2024-10-08 19:52:59,824 xinference.core.worker 278 DEBUG    [request dda36b22-856b-11ef-9106-00163e788596] Leave list_models, elapsed time: 0 s
2024-10-08 19:52:59,824 xinference.core.supervisor 278 DEBUG    [request dda361cc-856b-11ef-9106-00163e788596] Leave list_models, elapsed time: 0 s

Expected behavior

Either start successfully, or fail fast with an explicit error instead of hanging.

Thanks.

@XprobeBot XprobeBot added the gpu label Oct 8, 2024
@XprobeBot XprobeBot added this to the v0.15 milestone Oct 8, 2024
@948024326

Has this been resolved? I'll be using A800 cards too, and I've seen two people open issues about models failing to start.

@moqimoqidea
Author

> Has this been resolved? I'll be using A800 cards too, and I've seen two people open issues about models failing to start.

In my testing this was mainly a network issue. Passing an explicit model path at launch greatly reduces it, for example:

xinference launch --model-name bge-reranker-large --model-type rerank --replica 3 --model_path /root/.cache/modelscope/hub/Xorbits/bge-reranker-large
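To avoid the network dependency entirely when running in Docker, the same idea can be combined with the original `docker run` from this issue. This is only a sketch: it assumes the model was previously downloaded into the host's ModelScope cache at `/root/.cache/modelscope` (adjust paths to your setup), and mounts that cache into the container so launch reads the files from local disk:

```shell
# Mount the host's ModelScope cache into the container so model files are
# found locally instead of being re-fetched over the network at launch.
# Paths are illustrative; adjust them to where your models actually live.
docker run -d \
  --name xinference \
  -p 9997:9997 \
  -v /root/.cache/modelscope:/root/.cache/modelscope \
  xprobe/xinference:v0.15.1 \
  bash -c "xinference-local -H 0.0.0.0 --log-level debug & sleep 20 && \
    xinference launch --model-name bge-reranker-large --model-type rerank --replica 2 \
      --model_path /root/.cache/modelscope/hub/Xorbits/bge-reranker-large && \
    tail -f /dev/null"
```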

@948024326

> Has this been resolved? I'll be using A800 cards too, and I've seen two people open issues about models failing to start.
>
> In my testing this was mainly a network issue. Passing an explicit model path at launch greatly reduces it, for example:
>
> xinference launch --model-name bge-reranker-large --model-type rerank --replica 3 --model_path /root/.cache/modelscope/hub/Xorbits/bge-reranker-large

Oh, okay. I just asked around and it seems nobody else has run into this; everything works for them. I'm planning to deploy qwen2.5-72b-instruct with vLLM on an A800.

@moqimoqidea
Author

moqimoqidea commented Nov 1, 2024

> > Has this been resolved? I'll be using A800 cards too, and I've seen two people open issues about models failing to start.
> >
> > In my testing this was mainly a network issue. Passing an explicit model path at launch greatly reduces it, for example:
> > xinference launch --model-name bge-reranker-large --model-type rerank --replica 3 --model_path /root/.cache/modelscope/hub/Xorbits/bge-reranker-large
>
> Oh, okay. I just asked around and it seems nobody else has run into this; everything works for them. I'm planning to deploy qwen2.5-72b-instruct with vLLM on an A800.

If you're running a chat model like qwen2.5-72b-instruct, I'd suggest trying something like ollama, which is more mature; since the model is so large, a single A800 can only host one instance anyway. In my scenario I use xinference mainly because it has good support for embedding/rerank models.

Just for reference.

3 participants