Zipformer Onnx FP16 #1671

Merged
merged 5 commits into k2-fsa:master from zip_onnx_fp16 on Jun 27, 2024
Conversation

@manickavela29 (Contributor) commented on Jun 26, 2024

Hi @csukuangfj, @yaozengwei,
This PR adds support for exporting the zipformer ONNX model in FP16. The model is trained in mixed precision, so FP16 export shouldn't cause any accuracy loss.

Tested with data; the model accuracy is exactly the same as FP32.
cc: k2-fsa/sherpa-onnx#41, k2-fsa/sherpa-onnx#40
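For reference, here is a minimal sketch of what the FP16 conversion step could look like, assuming the onnxconverter-common package is applied to an already-exported fp32 encoder; the file names and the keep_io_types choice are illustrative, not necessarily what this PR does.

```python
# Sketch only: convert an exported fp32 zipformer encoder to fp16.
# Assumes onnx and onnxconverter-common are installed; paths are hypothetical.
import onnx
from onnxconverter_common import float16


def convert_to_fp16(fp32_path: str, fp16_path: str) -> None:
    model = onnx.load(fp32_path)
    # keep_io_types=True leaves graph inputs/outputs in float32 so that
    # existing callers can keep feeding float32 tensors.
    model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, fp16_path)


convert_to_fp16("encoder-epoch-99-avg-1.onnx",
                "encoder-epoch-99-avg-1.fp16.onnx")
```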

Signed-off-by: manickavela29 <manickavela1998@gmail.com>
@csukuangfj (Collaborator)

Could you describe how you tested it? Does it work on CPU?

Also, could you use fp16.onnx as the suffix when fp16 is used?

Signed-off-by: manickavela29 <manickavela1998@gmail.com>
@manickavela29 (Contributor, Author) commented on Jun 27, 2024

I have tested it with an A10 GPU, and it is holding up well in sherpa-onnx.

I haven't tested this on CPU, but AVX-512 definitely has FP16 support, and I think AVX2 will also handle it.

Generally, onnxruntime has a fallback mechanism: if FP16 is not natively supported, an implicit conversion to FP32 happens. I have not tested that, however, and it is also subject to op availability (which should not be an issue with zipformer).
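For what it's worth, a quick CPU smoke test with onnxruntime could confirm that; the sketch below is generic and just feeds random data into whatever inputs the model declares (the file name and the way dynamic dimensions are filled in are assumptions, not the real zipformer interface).

```python
# Generic CPU smoke test for an fp16 ONNX model (sketch; hypothetical file name).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "encoder-epoch-99-avg-1.fp16.onnx",
    providers=["CPUExecutionProvider"],
)

feeds = {}
for inp in sess.get_inputs():
    # Replace symbolic/dynamic dimensions with a small concrete value.
    shape = [d if isinstance(d, int) else 16 for d in inp.shape]
    if inp.type == "tensor(int64)":
        feeds[inp.name] = np.full(shape, 16, dtype=np.int64)
    else:
        dtype = np.float16 if inp.type == "tensor(float16)" else np.float32
        feeds[inp.name] = np.random.rand(*shape).astype(dtype)

outputs = sess.run(None, feeds)
print([o.shape for o in outputs])  # if this runs, CPU kernels (or fallbacks) exist
```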

This is an optional export flag, and I wanted to keep things simple by reusing the existing model filename; that's why I preferred not to create a separate model with an '-fp16' suffix. It is also backward compatible 🙂

But if it is still required, I will modify the filename.

FYI, I have only tested the zipformer encoder model, which is why I raised this patch. Later, if I find time to experiment with the decoder/joiner as well, I will extend the support.

@csukuangfj (Collaborator)

Could you test the fp16 model either in icefall with onnx_pretrained.py or in sherpa-onnx on CPU?

> that's why I preferred not to create a separate model with an '-fp16' suffix. It is also backward compatible

Currently, we use .onnx for fp32 and int8.onnx for int8. Using fp16.onnx makes it clearer to users that the model they are using is fp16.
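If it helps, here is a tiny hypothetical helper illustrating that naming convention (the actual export script may build the names differently):

```python
# Hypothetical helper for the .onnx / .int8.onnx / .fp16.onnx convention.
def onnx_filename(stem: str, precision: str = "fp32") -> str:
    suffix = {"fp32": ".onnx", "int8": ".int8.onnx", "fp16": ".fp16.onnx"}[precision]
    return stem + suffix


# onnx_filename("encoder-epoch-99-avg-1", "fp16")
# -> "encoder-epoch-99-avg-1.fp16.onnx"
```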

Signed-off-by: manickavela29 <manickavela1998@gmail.com>
@manickavela29 (Contributor, Author)

I ran it once with sherpa-onnx, and it was up and running. There are some minor warnings from the onnxruntime optimizer, but it is stable:
2024-06-27 05:57:13.449610676 [W:onnxruntime:, constant_folding.cc:269 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Exp node '/norm_15/Exp'
2024-06-27 05:57:13.449640846 [W:onnxruntime:, constant_folding.cc:269 ApplyImpl] Could not find a CPU kernel and hence can't constant fold Softmax node '/downsample_output/Softmax'

Signed-off-by: manickavela29 <manickavela1998@gmail.com>
@csukuangfj (Collaborator)

Thanks! Is it ready to merge?

@manickavela29 (Contributor, Author)

Yes, it is completed from my end!

@csukuangfj (Collaborator)

Thank you for your contribution!

@csukuangfj merged commit eaab2c8 into k2-fsa:master on Jun 27, 2024
250 of 253 checks passed
@manickavela29 deleted the zip_onnx_fp16 branch on June 27, 2024 at 08:13
yfyeung pushed a commit to yfyeung/icefall that referenced this pull request Aug 9, 2024
Signed-off-by: manickavela29 <manickavela1998@gmail.com>