add Huggingface2CoreML convert file #18
@jxiong22 |
I've run your scripts, and this is what I've found:
Text Encoder
max_seq_length=76
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555, 0.0733, -0.2448, -0.2212, -0.1934, 0.2052, -0.3175, -0.7824,
-0.1816, 0.1943], grad_fn=<SliceBackward0>)
CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [ 0.15597123 0.07306742 -0.24520203 -0.22147842 -0.1928216 0.20461065
-0.31734398 -0.7811562 -0.1815224 0.19413628]
max_seq_length=77
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555, 0.0733, -0.2448, -0.2212, -0.1934, 0.2052, -0.3175, -0.7824,
-0.1816, 0.1943], grad_fn=<SliceBackward0>)
CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [ 0.15597123 0.07306742 -0.24520203 -0.22147842 -0.1928216 0.20461065
-0.31734398 -0.7811562 -0.1815224 0.19413628]
It seems to be the same output whenever you set
Image Encoder
I've compared both my original hard-coded norm vs
Using hard-coded norm:
PyTorch ImageEncoder ckpt out for IMG_2115.jpg:
>>> tensor([ 0.2361, -0.0980, -0.0022, 0.2364, 0.1279, -0.1041, 0.3530, 0.0853,
-0.0293, 0.0784], grad_fn=<SliceBackward0>)
CoreML ImageEncoder ckpt out for IMG_2115.jpg:
>>> [ 0.350561 0.025289 -0.13452446 0.1267291 -0.1897895 -0.14739564
0.05819088 0.30193368 -0.2142085 0.27992135]
The average
Using transformers.Normalize():
PyTorch ImageEncoder ckpt out for jpg:
>>> tensor([ 0.3709, 0.0213, -0.0549, 0.1092, -0.2229, -0.2131, 0.0909, 0.2031,
-0.2159, 0.2603], grad_fn=<SliceBackward0>)
CoreML ImageEncoder ckpt out for jpg:
>>> [ 0.34969723 0.02547197 -0.134884 0.12836841 -0.19035722 -0.14880568
0.05857127 0.30046463 -0.21491489 0.2801294 ]
The average
Thanks a lot for the PR! This would help me to improve current
Could you validate the precision influence for |
That's weird. On my end, I can't figure out why I'm getting bad results when I set
|
My envs are:
|
There has been no improvement in your environment. The results are still unsatisfactory with |
Okay, then I'll try to verify this issue and report back here if there is any progress. |
Thank you. If 77 is functional for you, how about setting it as the default value and including a comment for future reference in case others encounter the same issue as I did? |
Is it okay if I merge this PR now and add a note in the README, so others can refer to your notebook for export? |
Sounds good to me. Thanks!
|
This file offers a Hugging Face version, clip-vit-base-patch32, with separate text and image encoders. It also fixes image encoder precision errors.
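The PyTorch and Core ML outputs quoted in the conversation can be compared numerically rather than by eye. The following is a minimal sketch, not part of the PR: it takes the ten values printed for each output (which appear to be a slice of the full embedding, so the numbers are only indicative) and computes the maximum absolute difference and the cosine similarity. The helper names `max_abs_diff` and `cosine_similarity` are made up for illustration.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Values copied from the conversation (first ten printed elements only).
pt_text = [0.1555, 0.0733, -0.2448, -0.2212, -0.1934,
           0.2052, -0.3175, -0.7824, -0.1816, 0.1943]
cm_text = [0.15597123, 0.07306742, -0.24520203, -0.22147842, -0.1928216,
           0.20461065, -0.31734398, -0.7811562, -0.1815224, 0.19413628]

pt_img_hardcoded = [0.2361, -0.0980, -0.0022, 0.2364, 0.1279,
                    -0.1041, 0.3530, 0.0853, -0.0293, 0.0784]
cm_img_hardcoded = [0.350561, 0.025289, -0.13452446, 0.1267291, -0.1897895,
                    -0.14739564, 0.05819088, 0.30193368, -0.2142085, 0.27992135]

pt_img_normalize = [0.3709, 0.0213, -0.0549, 0.1092, -0.2229,
                    -0.2131, 0.0909, 0.2031, -0.2159, 0.2603]
cm_img_normalize = [0.34969723, 0.02547197, -0.134884, 0.12836841, -0.19035722,
                    -0.14880568, 0.05857127, 0.30046463, -0.21491489, 0.2801294]

pairs = {
    "text encoder": (pt_text, cm_text),
    "image encoder, hard-coded norm": (pt_img_hardcoded, cm_img_hardcoded),
    "image encoder, Normalize()": (pt_img_normalize, cm_img_normalize),
}

for name, (pt, cm) in pairs.items():
    print(f"{name}: max |diff| = {max_abs_diff(pt, cm):.4f}, "
          f"cosine = {cosine_similarity(pt, cm):.4f}")
```

On these slices the text encoder outputs agree closely, while the image encoder outputs differ far more with the hard-coded norm than with Normalize(). A fuller check would run the same comparison over the complete embeddings and several inputs.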