-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
大批量请求后 服务直接卡死 #1291
Comments
What's the first error? |
我也遇到了,可以看这个issue,我的是cuda显存溢出导致的 |
把 gpu-memory-utilization 设置的小一点试一下呢,我有一些类似的 error 就是显存 oom 的原因 |
This issue is stale because it has been open for 7 days with no activity. |
This issue was closed because it has been inactive for 5 days since being marked as stale. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
运行了两个模型 qwen1.5-32b-awq 和 qwen1.5-72b-gptq-int4
多线程并行请求 12线程 同时请求 72b , 会把服务器卡死, 任何请求都不接收
To Reproduce
To help us to reproduce this bug, please provide information below:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: