Currently converting VIT-L/14 model #3
Are you seeing a runtime error? Apparently it's possible to build with the max memory limit set to 4GB instead of 2GB, as mentioned here: microsoft/onnxruntime#10957 (comment). Please comment on that issue, briefly explaining your use case, to help motivate the ORT Web authors to raise the limit to 4GB. I think we're waiting on this wasm proposal to allow more than 4GB of memory.

Also note that I did briefly try to quantize the models using the ONNX tooling, but for some reason it wasn't working. From the README of this repo:
The code to quantize is actually really simple:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "clip-image-vit-32.onnx",
    "clip-image-vit-32-uint8.onnx",
    weight_type=QuantType.QUInt8,
    extra_options={"MatMulConstBOnly": False},
)
```

Again, I only briefly tried to get it to work - if you have time to experiment, I'd be interested to see whether you manage to get quantization working.
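For anyone trying this, here is a minimal sanity check you could run after the snippet above - a sketch, not part of the original comment, which assumes the same filenames. It validates the quantized graph with the ONNX checker and compares file sizes (dynamic uint8 weight quantization should shrink the model to roughly a quarter of its original size):

```python
import os

import onnx

# Filenames assumed from the quantization snippet above.
original_path = "clip-image-vit-32.onnx"
quantized_path = "clip-image-vit-32-uint8.onnx"

# Structural validation of the quantized graph.
onnx.checker.check_model(onnx.load(quantized_path))

# Compare file sizes; quantized weights should be roughly 4x smaller.
orig_mb = os.path.getsize(original_path) / 1e6
quant_mb = os.path.getsize(quantized_path) / 1e6
print(f"original: {orig_mb:.1f} MB, quantized: {quant_mb:.1f} MB")
```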
Thanks, I will try out quantization, and I'll also reach out to the ONNX team to see if it's supported.
Getting an error when I try this.
I now think this is likely an ONNX export bug, since the uploaded imgbedding model works fine but if I try to re-export it with the same config it fails. I'm going to try downgrading the ONNX exporter and see if that works.
Your quantization code is correct; the error was in the JSBin. You just need to use the latest version of ORT Web (https://cdn.jsdelivr.net/npm/onnxruntime-web@1.12.0/dist/ort.js). I was able to build a 291.9 MB quantized ViT-L/14 model that produces correct embeddings.
@SuperVisualApp Great to hear you got quantization working! Thanks for sharing your work/progress on this. I am a bit confused, though: which JSBin are you referring to? I might be misremembering, but I thought the JSBins I linked were working fine (using minimaxir's models), and the problem was that they output a 768-dim vector (instead of 512) because they were missing the projection head. And when I tried to do the conversion in this notebook, it produced a model that has the same file size as the original?
I used the quantized weights produced by your notebook. The only change was in the JSBin that I used to verify whether it was correct.
The final model is definitely smaller. The unquantized version is 580MB.
@SuperVisualApp Oh, weird. I just opened up the notebook, clicked "Runtime > Run all" without making any changes, and then checked the output, and it doesn't reduce the file size for me (both files are ~167MB). I also tried switching to
Ah, it turns out that it was not the model exported using your notebook but rather the one from Jina AI's CLIP-as-a-Service. If you quantize that one it works correctly and produces a 292MB model.
@SuperVisualApp Ah okay, thanks!
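As an aside (a sketch, not something from the thread): when a dynamically quantized file comes out the same size as the original, one way to check whether the weights were actually converted is to count the data types of the graph initializers. The filename below is hypothetical, standing in for the notebook's output; if most initializers are still FLOAT rather than INT8/UINT8, the quantization pass didn't touch them.

```python
from collections import Counter

import onnx

# Hypothetical filename - replace with the notebook's actual output file.
model = onnx.load("clip-image-vit-32-uint8.onnx")

# Tally the element types of all weight tensors in the graph.
counts = Counter(
    onnx.TensorProto.DataType.Name(init.data_type)
    for init in model.graph.initializer
)
print(counts)  # mostly INT8/UINT8 if quantization worked, mostly FLOAT if not
```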
Okay, very strange, this works (using the original ONNX files rather than producing new ones with the notebook):

```python
!pip install onnxruntime
!pip install onnx
!wget https://huggingface.co/rocca/openai-clip-js/resolve/main/clip-image-vit-32-float32.onnx

from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("clip-image-vit-32-float32.onnx", "clip-image-vit-32-uint8.onnx", weight_type=QuantType.QUInt8)
```

So here are the quantized models:
I'm guessing something changed in ONNX/PyTorch since I last exported. Oh, actually, I just realised while typing this comment: it's probably related to the post-conversion float16-to-float32 stuff that I had to do, mentioned in a bullet point in the readme.

So the conversion now works without any errors, and the file sizes are a quarter of the size, as expected, but the embeddings seem to be inaccurate - the results are noticeably worse than the normal models when testing them with the clip-image-sorter demo. I've added these quantized models to the demos in this repo and to clip-image-sorter for testing in any case. I've also linked this comment from the readme in case anyone wants to explore this further.
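For anyone who wants to explore the accuracy drop, here's a rough sketch (not part of the thread) that compares the float32 and uint8 image encoders on the same input. It assumes the filenames from the snippet above and uses a random tensor instead of a real preprocessed image, so treat the number only as a quick signal; a proper check would run real images through CLIP preprocessing first.

```python
import numpy as np
import onnxruntime as ort

# Filenames assumed from the quantization snippet above.
fp32_sess = ort.InferenceSession("clip-image-vit-32-float32.onnx")
int8_sess = ort.InferenceSession("clip-image-vit-32-uint8.onnx")

# Assumed CLIP image input shape: 1 x 3 x 224 x 224 (replace with a real preprocessed image).
input_name = fp32_sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

emb_fp32 = fp32_sess.run(None, {input_name: x})[0].ravel()
emb_int8 = int8_sess.run(None, {input_name: x})[0].ravel()

# Cosine similarity close to 1.0 means the quantized model tracks the original.
cos = float(np.dot(emb_fp32, emb_int8) / (np.linalg.norm(emb_fp32) * np.linalg.norm(emb_int8)))
print(f"cosine similarity between float32 and uint8 embeddings: {cos:.4f}")
```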
You could potentially compare it with the quantized Jina AI ONNX-exported model to see if it's better.
You can try out what I am building with CLIP running in the browser here: https://www.supervisual.app/
Very cool!
Hi, I am currently trying to convert the ViT-L/14 model, but I'm running into a memory issue when I try to load the model in the ONNX web runtime. Do you have any ideas?
I might have to just wait for it to be quantized to INT8.
Thanks,