Currently converting VIT-L/14 model #3

Closed
ghost opened this issue Jul 13, 2022 · 16 comments

Comments

@ghost

ghost commented Jul 13, 2022

Hi, I am currently trying to convert the ViT-L/14 model, but I'm running into a memory issue when I try to load the model in the ONNX web runtime. Do you have any ideas?
I might have to just wait for it to be quantized to INT8.

Thanks,

@josephrocca
Owner

Are you seeing a runtime error like Uncaught (in promise) 1991937888? If so, and if the number is close to 2 billion like in that example, then you're likely running into this issue: microsoft/onnxruntime#10957

Apparently it's possible to build with the max memory limit set to 4GB instead of 2GB, as mentioned here: microsoft/onnxruntime#10957 (comment)

Please comment on that issue, briefly explaining your use case to help motivate the ORT Web authors to raise the limit to 4GB.

I think we're waiting for this wasm proposal to allow more than 4GB memory.

Also note that I did briefly try to quantize the models using the ONNX tooling, but for some reason it wasn't working. From the README of this repo:

The model files are about 4x larger than they actually need to be - params are float32 instead of uint8. If you're using CLIP in a "real" web app, you should probably quantize it. @minimaxir has done it (1, 2), and that model worked first try with ORT Web (which is amazing), but it outputs a 768 element vector instead of 512, which I think is because @minimaxir's model is missing the final projection head which puts image embeddings into same-sized space as text embeddings. I had a quick attempt at it in the ONNX export notebook (see cell after ONNX conversion), but it doesn't seem to be working. If you investigate this and get it working, please open an issue. Thanks to @congraIiIso on Twitter for bringing the uint8 quantization to my attention!

The code to quantize is actually really simple:

from onnxruntime.quantization import quantize_dynamic, QuantType
# Dynamically quantize the float32 weights to uint8; MatMulConstBOnly=False also quantizes MatMuls whose second input isn't a constant initializer.
quantize_dynamic("clip-image-vit-32.onnx", "clip-image-vit-32-uint8.onnx", weight_type=QuantType.QUInt8, extra_options={"MatMulConstBOnly":False})

Again, I only briefly tried to get it to work - if you have time to experiment then I'd be interested to see whether you manage to get quantization working.
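
For anyone trying this, a quick sanity check of a quantized model in Python looks something like the sketch below (the 224x224 input shape is an assumption about the exported image model, the input name is read from the graph, and CLIP's real preprocessing is skipped):

import numpy as np
import onnxruntime as ort

# Load the quantized model produced by the quantize_dynamic call above.
sess = ort.InferenceSession("clip-image-vit-32-uint8.onnx")

# Read the graph's actual input name/shape instead of hard-coding it.
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)

# Dummy 224x224 RGB batch; a real test would apply CLIP's resize/center-crop/normalize preprocessing.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
embedding = sess.run(None, {inp.name: dummy})[0]
print(embedding.shape)  # should be (1, 512) for ViT-B/32 if the projection head is present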

@ghost
Author

ghost commented Jul 15, 2022

Thanks, I will try out quantization, and I'll also reach out to the ONNX team to see if it's supported.

@ghost
Author

ghost commented Aug 5, 2022

Getting Uncaught (in promise) 621137752 with quantization. The imgbedding model without the final projection head works fine without any memory issue, so I doubt it has anything to do with memory. Maybe it's an older version of ONNX producing different ops, etc.

@ghost
Author

ghost commented Aug 6, 2022

I now think this is likely an ONNX export bug, since the uploaded imgbedding model works fine but re-exporting it with the same config fails. I am going to try downgrading the ONNX exporter and see if that works.

@ghost
Author

ghost commented Aug 19, 2022

Your quantization code is correct; the error is in the JSBin. You just need to use the latest version of ORT Web:

https://cdn.jsdelivr.net/npm/onnxruntime-web@1.12.0/dist/ort.js

I was able to build a 291.9 MB ViT-L/14 quantized model that produces correct embeddings.
Thanks again for your effort in openai-clip-js and the clip_sorter repo 🎉 .

ghost closed this as completed on Aug 19, 2022
@josephrocca
Owner

@SuperVisualApp Great to hear you got quantization working! Thanks for sharing your work/progress on this. I am a bit confused, though: which JSBin are you referring to? I might be misremembering, but I thought the JSBins that I linked were working fine (using minimaxir's models); the problem was that they output a 768-dim vector (instead of 512) because they were missing the projection head. And when I tried to do the conversion in this notebook, it produced a model that has the same file size as the original?

@ghost
Author

ghost commented Aug 20, 2022

I used the quantized weights produced by your notebook. The only change was in the JSBin that I used to verify if it was correct.

@ghost
Author

ghost commented Aug 20, 2022

[screenshot]

@ghost
Author

ghost commented Aug 20, 2022

The final model is definitely smaller. The unquantized version is 580 MB.

@josephrocca
Owner

@SuperVisualApp Oh, weird. I just opened up the notebook, clicked "Runtime > Run all" without making any changes, and then checked the output, and it doesn't reduce the file size for me (both files are ~167 MB). I also tried switching to ViT-L/14 (from ViT-B/32) and it has the same problem - the uint8 output is ~580 MB. Could you share the notebook that you're using? Perhaps you made some slight changes?

@ghost
Author

ghost commented Aug 21, 2022

Ah, it turns out it was not the model exported using your notebook, but rather the one from Jina AI's CLIP-as-service:

https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-L-14/visual.onnx

If you quantize that one, it works correctly and produces a 292 MB model.
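
For reference, the whole process is roughly the sketch below (file names are arbitrary; it's just the quantization snippet from earlier in this thread applied to the Jina AI export):

import urllib.request
from onnxruntime.quantization import quantize_dynamic, QuantType

# Download the Jina AI CLIP-as-service ViT-L/14 visual encoder.
url = "https://clip-as-service.s3.us-east-2.amazonaws.com/models-436c69702d61732d53657276696365/onnx/ViT-L-14/visual.onnx"
urllib.request.urlretrieve(url, "visual.onnx")

# Dynamically quantize the float32 weights to uint8 (~292 MB output).
quantize_dynamic("visual.onnx", "visual-uint8.onnx", weight_type=QuantType.QUInt8)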

@josephrocca
Owner

@SuperVisualApp Ah okay, thanks!

@josephrocca
Owner

josephrocca commented Aug 21, 2022

Okay, very strange, this works (using the original ONNX files rather than producing new ones with the Export_CLIP_to_ONNX_tflite_tfjs_tf_saved_model.ipynb notebook in this repo):

!pip install onnxruntime
!pip install onnx
# Download the original float32 ONNX export and dynamically quantize its weights to uint8.
!wget https://huggingface.co/rocca/openai-clip-js/resolve/main/clip-image-vit-32-float32.onnx
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic("clip-image-vit-32-float32.onnx", "clip-image-vit-32-uint8.onnx", weight_type=QuantType.QUInt8)

So here are the quantized models:

I'm guessing something changed in ONNX/PyTorch since I last exported. Oh, actually, I just realised while typing this comment: It's probably related to the post-conversion float16-to-float32 stuff that I had to do, mentioned in a bullet-point in the readme.

So the conversion now works without any errors, and the file sizes are a quarter of the size, as expected, but the embeddings seem to be inaccurate - the results are noticeably worse than with the normal models when testing with the clip-image-sorter demo. I've added these quantized models to the demos in this repo and to the clip-image-sorter for testing in any case.
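
If anyone wants to quantify the accuracy drop, comparing the float32 and uint8 embeddings on the same input is a reasonable start. A minimal sketch (the input name is read from the graph, preprocessing is omitted, and a random tensor is a much weaker test than real images):

import numpy as np
import onnxruntime as ort

# Same dummy input through both models; real CLIP-preprocessed images would give a more meaningful comparison.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

def embed(model_path):
    sess = ort.InferenceSession(model_path)
    name = sess.get_inputs()[0].name
    return sess.run(None, {name: x})[0][0]

a = embed("clip-image-vit-32-float32.onnx")
b = embed("clip-image-vit-32-uint8.onnx")

# Cosine similarity near 1.0 means quantization barely moved the embedding.
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))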

I've also linked this comment from the readme in case anyone wants to explore this further.

@ghost
Author

ghost commented Aug 22, 2022

You could compare it with the quantized Jina AI ONNX-exported model to see if it's better.

@ghost
Author

ghost commented Sep 16, 2022

You can try out what I am building with CLIP running in the browser here: https://www.supervisual.app/

@josephrocca
Owner

Very cool!

This issue was closed.