
Error when running in Docker: multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method #305

Closed
syusama opened this issue Aug 14, 2024 · 3 comments

Comments

@syusama

syusama commented Aug 14, 2024

The following items must be checked before submission

  • Make sure you are using the latest code from the repository (git pull); some issues have already been addressed and fixed.
  • I have read the FAQ section of the project documentation and searched the existing issues without finding a similar problem or solution.

Type of problem

Model inference and deployment

Operating system

Linux

Detailed description of the problem

Deploying Qwen2-7B-Instruct with docker-compose on Ubuntu fails with the following error:

(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 123, in init_device
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]     torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 420, in set_device
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]     torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]   File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 300, in _lazy_init
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226]     raise RuntimeError(
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
(VllmWorkerProcess pid=294) ERROR 08-14 16:36:35 multiproc_worker_utils.py:226] 

The key information is:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
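For reference, the standard workaround for this error is to make sure worker processes are started with the 'spawn' method before CUDA is ever touched. A minimal sketch, assuming the server is launched from a Python entry point; VLLM_WORKER_MULTIPROC_METHOD is vLLM's own switch for its worker processes, and whether api-for-open-llm exposes it directly is an assumption:

```python
# Minimal sketch: force the 'spawn' start method before any CUDA initialization.
# Assumption: this runs in the process that launches the vLLM engine / API server.
import os
import multiprocessing as mp

# vLLM reads this env var to decide how to start its worker processes.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

if __name__ == "__main__":
    # Must be called once, before any Process/Pool is created and before CUDA is used.
    mp.set_start_method("spawn", force=True)
    # ... start the API server / vLLM engine here ...
```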

Dependencies

# Please paste the dependencies here

Runtime logs or screenshots

# Please paste the run log here

[Screenshot attached: 微信截图_20240815003843]

@xusenlinzy
Owner

Can it start on a single GPU?

@syusama
Author

syusama commented Aug 16, 2024

> Can it start on a single GPU?

On a single GPU that error no longer appears, but there is a new one: it reports insufficient GPU memory.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 6.30 GiB. GPU 0 has a total capacity of 23.64 GiB of which 4.28 GiB is free. Process 9684 has 19.35 GiB memory in use. Of the allocated memory 18.84 GiB is allocated by PyTorch, and 56.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
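As a hedged sketch of the usual mitigations (not necessarily how api-for-open-llm wires its configuration): lower vLLM's gpu_memory_utilization and/or cap max_model_len so the KV cache fits, and set the allocator option the error message itself suggests. The parameter names below are vLLM's; the exact values are illustrative:

```python
# Sketch only: reduce vLLM's memory footprint on a 24 GiB GPU.
import os

# Suggested by the error message itself, to reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",
    gpu_memory_utilization=0.85,  # default is 0.9; leave headroom for other processes
    max_model_len=8192,           # smaller context window -> smaller KV cache
)
```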

But the problem is that these are RTX 4090s with 24 GB of VRAM, and the GPU memory is essentially empty: no services are running and nothing is occupying it.
```
ps@ps:~/AI/projects/api-for-open-llm$ nvidia-smi
Fri Aug 16 11:59:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:18:00.0 Off |                  Off |
| 46%   41C    P2             54W / 450W  |    1656MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  |   00000000:3B:00.0 Off |                  Off |
| 44%   33C    P8             26W / 450W  |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        On  |   00000000:86:00.0 Off |                  Off |
| 45%   33C    P8             12W / 450W  |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        On  |   00000000:AF:00.0 Off |                  Off |
| 45%   33C    P8             21W / 450W  |      15MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      5116      G   /usr/lib/xorg/Xorg                              4MiB |
|    0   N/A  N/A     21666      C   python                                       1638MiB |
|    1   N/A  N/A      5116      G   /usr/lib/xorg/Xorg                              4MiB |
|    2   N/A  N/A      5116      G   /usr/lib/xorg/Xorg                              4MiB |
|    3   N/A  N/A      5116      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
```
This is the GPU usage. I have tried lowering MAX_NUM_SEQS and running torch.cuda.empty_cache() to free memory, but the same error still occurs.
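A quick way to see what each GPU actually reports as free, run from inside the same environment the server uses (that environment is an assumption here). Note that torch.cuda.empty_cache() only releases memory cached by the calling process; it cannot reclaim the ~1.6 GiB held by the other python process (PID 21666) on GPU 0 in the output above:

```python
# Sketch: report free/total memory per GPU as PyTorch sees it.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns (free_bytes, total_bytes)
    print(f"GPU {i}: {free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB")
```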

@syusama
Author

syusama commented Aug 21, 2024

Deploying the LLM alone works fine, but deploying the LLM and the embedding model at the same time triggers this error. Opened a new issue: #308
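If the two services end up competing for the same card, one common workaround (an assumption about this deployment, not something confirmed in the issue) is to pin each process to its own GPU before torch or vllm is imported:

```python
# Sketch: pin this process (e.g. the embedding server) to GPU 1 so it does not
# compete with the LLM server on GPU 0. Must run before importing torch/vllm.
import os

os.environ.setdefault("CUDA_VISIBLE_DEVICES", "1")

import torch  # imported after the env var on purpose

print(torch.cuda.device_count())  # should report 1 visible device
```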

@syusama syusama closed this as completed Aug 21, 2024