-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot run Continuous batching Demo with GPU #2770
Comments
@jsapede What kind of GPU do you use on your host? The copied logs from ovms don't include the error message but I suspect the models size is too big for the GPU. Try converting the model with lower quantization like int4 and use less cache in graph.pbtxt like 2GB. |
i'm on an old i5 gen 7 |
@jsapede yes, the same graph can be shared with all models. |
well looks like i spoke a little too fast ... it workED quite lost, reinstalled LXC container from begining , made all procedure from the begining inclufing int4 corrections but impossible to get the docker to work again ... [EDIT] it seems that the container entered in a GPU loop, had to turn off/on the whole proxmox. will do further tests tomorrow, thanks for the help ! |
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
installation is made on a proxmox homelab, on a debian 12 LXC with GPU passthrough for openvino :
GPU passthrough is usually working well wtih many other services i use : immich / frigate / jellyfin ...
Steps to reproduce the behavior: as specified on demo !
firts i downloaded the docker version :
docker pull openvino/model_server:latest-gpu
then prepared the model :
prepared the folders :
installed huggingface-cli and logged in :
then ran optimum-cli :
and got this firts kinda warning :
then created the graph.pbtxt using the GPU template in the demo :
then added the config.json at workspace root :
then ran the container with GPU passthrough :
then tried V1 API from another machine :
but got empty response !
container seems to work as theres some heavy activity :
but collapses after some minutes :
then test with V3 :
Expected behavior
run the demo
Logs
Logs from OVMS, ideally with --log_level DEBUG. Logs from client.
Configuration
OVMS version : latest-gpu
OVMS config.json :
CPU, accelerator's versions if applicable : OpenVINO / GPU passthrough
Model repository directory structure :
Additional context
graph.pbtxt content :
The text was updated successfully, but these errors were encountered: