Couldn't run any vision model #935
Comments
Hi @GraphicalDot! Thanks for the issue. I see there are some problems; I've enumerated and addressed them below:
Could you test out the Qwen2-VL model, as I have merged #936?
Thanks!
Thanks for the prompt response. I took a git pull.
However, it first downloaded the weights and the .uqff file. The server started smoothly. I used this script to test the model. The error that I got from the server is
The error I got from the Python script is
This is for the Llama 3.2 vision models.
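For context, here is a minimal sketch of the kind of test script referred to above, assuming mistralrs-server is exposing its OpenAI-compatible HTTP API; the port, model id, and image URL are placeholder assumptions, not values from this thread.

# Illustrative sketch: send an image + text prompt to a locally running
# mistralrs-server through its OpenAI-compatible chat completions endpoint.
# The port (1234), model id, and image URL below are assumptions.
import openai

client = openai.OpenAI(base_url="http://localhost:1234/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",  # placeholder; match whatever model id the server expects
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "Describe this image in one sentence."},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)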
@GraphicalDot I merged #937, can you please try the server again after a git pull?
Thanks a ton! What else needs to be done to run Qwen/Qwen2-VL-2B-Instruct? Could you please guide me on what needs to be done to support new upcoming models for vision?
@GraphicalDot I think it should run Qwen2-VL now. If it's using 100% GPU, that's good! It means that it is using the resources efficiently. The memory footprint, however, is different. Is the 30GB during inference or during loading? Does it spike, or does it stay about constant? If you could provide a graph, that would be great! What other models do you have in mind? Right now we have Phi 3.5, the Llava models, Idefics 2, Llama 3.2 V, and Qwen2-VL.
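As an aside, one rough way to produce such a memory graph is to sample the server process's resident memory and plot it; this is only an illustrative sketch (the PID argument, one-second interval, and use of psutil/matplotlib are assumptions, not part of this thread).

import sys
import time

import matplotlib.pyplot as plt
import psutil

# PID of the running mistralrs-server process, passed on the command line.
proc = psutil.Process(int(sys.argv[1]))

seconds, rss_gb = [], []
start = time.time()
try:
    while True:
        # Sample the resident set size (in GB) once per second.
        rss_gb.append(proc.memory_info().rss / 1e9)
        seconds.append(time.time() - start)
        time.sleep(1.0)
except KeyboardInterrupt:
    pass  # stop sampling with Ctrl-C, then plot what was collected

plt.plot(seconds, rss_gb)
plt.xlabel("seconds")
plt.ylabel("resident memory (GB)")
plt.title("mistralrs-server memory during load and inference")
plt.show()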
Please see the attached video. It increases and then peaks at around 30GB during inference only, much like llama.cpp does during inference. It consumes around 5GB of VRAM while the server is running.
Do I need to convert https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct to .uqff format before running it, like I did with the Llama 3.2 11B vision model? Do I also need to do the same for Flux architecture models, i.e., convert them to .uqff format?
@GraphicalDot I merged #938, which brings the usage down significantly on my machine. Can you please confirm it works for you after a git pull?
Yes.
Yes, now it consumes only 7-8 GB of RAM on my M4 Pro machine. The other models that we are trying to run are: HuggingFaceTB/SmolVLM-Instruct.
Second question:
Create the UQFF file:
cd H:\ml\Qwen2-VL-2B-Instruct
E:\dev\rust\mistral.rs\target\release\mistralrs-server.exe --isq Q4K -i vision-plain -m ./ -a qwen2vl --write-uqff ./uqff/q4k.uqff

Load from the UQFF file:
cd uqff
E:\dev\rust\mistral.rs\target\release\mistralrs-server.exe -i vision-plain -a qwen2vl -m ./ --from-uqff ./q4k.uqff
@GraphicalDot, following up:
I added HuggingFaceTB/SmolVLM-Instruct and Idefics 3 support!
Yes, here:
Please let me know if there are any other questions!
This has stopped working.
If I try to run it in server mode, it just hangs.
I am also getting weird errors.
However, compiling it with Metal doesn't show any error, but it fails on inference.
@GraphicalDot I don't think this error originates from mistral.rs, but instead maybe from some change in your system? I would recommend checking out this issue: gfx-rs/gfx#2309. Can you run the following:
I pulled the code two days ago, and this problem started occurring. Earlier, there was no issue.
I reverted to an older commit (739ea3e) and ran the same command, and everything started working again.
I went further and switched to another commit to see whether the Idefics 3 implementation works, and it did. I also tried running inference over the HTTP server, and that worked fine as well.
@GraphicalDot I was also getting the same error; now it's working fine. In my case, I had deleted the old Xcode from Applications, and reinstalling it fixed it. You can check whether your Metal API is available with this command.
This seems to be related to the following:

$ xcode-select --print-path
/Library/Developer/CommandLineTools

$ where xcrun
/usr/bin/xcrun

$ xcrun -sdk macosx metal -E -x metal -P -
xcrun: error: unable to find utility "metal", not a developer tool or in PATH

use std::process::Command;

// `echo` here is a child process spawned earlier (with its stdout piped) that
// feeds a trivial Metal source to `xcrun` on stdin.
// Run the `xcrun` command, taking input from the `echo` command's output
let output = Command::new("xcrun")
    .arg("-sdk")
    .arg("macosx")
    .arg("metal")
    .arg("-E")
    .arg("-x")
    .arg("metal")
    .arg("-P")
    .arg("-")
    .stdin(echo.stdout.unwrap())
    .output()
    .expect("Failed to run xcrun command");
@sgrebnov, do you think it makes sense to move this logic? (Note that the logic is necessary, as the Metal attn softmax uses bfloat vectorized types, which the fallback implementations for Metal < 3.10 don't support.)
@EricLBuehler - yeah, I would definitely be happy to help here and move this.
Describe the bug
Hey everyone,
I’m trying to run vision models in Rust on my M4 Pro (48GB RAM). After some research, I found Mistral.rs, which seems like the best library out there for running vision models locally. However, I’ve been running into some serious roadblocks, and I’m hoping someone here can help!
What I Tried
cargo run --features metal --release -- -i --isq Q4K vision-plain -m lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k -a vllama
cargo run --features metal --release -- -i vision-plain -m Qwen/Qwen2-VL-2B-Instruct -a qwen2vl
Neither of these worked. When I tried to process an image using Qwen2-VL-2B-Instruct, I got the following error:
This means the preprocessing step failed. Not sure how to fix this.
Quantization Runtime Issues: The commands above download the entire model and perform runtime quantization. This consumes a huge amount of resources and isn't feasible for my setup.
Hosting as a Server: I tried running the model as an HTTP server using mistralrs-server:
./mistralrs-server gguf -m /Users/sauravverma/.pyano/models/ -f Llama-3.2-11B-Vision-Instruct.Q4_K_M.gguf
This gave me the following error:
However, when I tried running another model:
./mistralrs-server -p 52554 gguf -m /Users/sauravverma/.pyano/models/ -f MiniCPM-V-2_6-Q6_K_L.gguf
What I Need Help With
Latest commit or version
Latest commit