Onnx Conversion scripts #4
Hello,

I would like to add a new model to this file. Can you provide more details on how you run the ONNX conversions and which package versions you used?

Hi! Here is an example of exporting an image model from the HF hub. You can adapt it to other types of models. I use PyTorch 1.13.1, ONNX 1.13.0, and Transformers 4.26.0. I noticed that you want to add a new text model. Please note that models other than the T5 family are not supported at the moment because they use different tokenizers. I plan to rework the tokenizer logic in the next month to support all kinds of models. Can you share which model you want to add?
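
Roughly, such an export might look like this (a minimal sketch; the checkpoint name, input shape, output names, and opset below are illustrative assumptions, not necessarily the exact script from this comment):

```python
import torch
from transformers import AutoModel

# Illustrative checkpoint; any HF image model exports along these lines.
# return_dict=False makes the model return a plain tuple, which traces cleanly.
model = AutoModel.from_pretrained("google/vit-base-patch16-224", return_dict=False)
model.eval()

# ViT-style models take pixel values shaped (batch, channels, height, width).
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "image-model.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"pixel_values": {0: "batch"}},
    opset_version=13,
)
```
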
Thanks for the quick reply! I'm interested in T5-family models. How do you define …?

For T5 models, I use fastT5.
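
A sketch of the fastT5 export step (the model name is illustrative; `generate_onnx_representation` is fastT5's documented export helper):

```python
from fastT5 import generate_onnx_representation

# Splits the T5 model into encoder/decoder graphs and exports them to ONNX.
onnx_model_paths = generate_onnx_representation("t5-small")
print(onnx_model_paths)  # paths to the exported ONNX files
```
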
Then you can quantize the model:
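
With fastT5 this is a one-liner, continuing from the paths above:

```python
from fastT5 import quantize

# Applies 8-bit quantization to the exported ONNX files.
quant_model_paths = quantize(onnx_model_paths)
```
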
Note that Web AI doesn't use a decoder with past key values, because keeping the huge tensors of past states in JS memory slows generation significantly.

Great, this solves the issue. Thanks so much! We wanted to add Flan-based models. How did you ensure that this tokenizer works for Flan-based models?

That's exciting! Do you have a …? Regarding the tokenizer, I just checked that the model produces reasonable output when using this tokenizer config. This one is from the Efficient T5 model. The Flan T5 config has some differences that are not compatible with the current implementation of the tokenizer.

The opposite: trying something with …

Do you observe degraded performance when running the non-quantized model? What metrics do you use to measure performance?

Qualitatively, we could not get the same output as the non-ONNX (PyTorch) version. Another question: is it possible to feed a batch of inputs? Let's say we want to run the seq2seq model on each provided sentence in parallel. Does the current implementation support that?

Did you use the same sampling strategy? Web AI, for now, supports only greedy sampling. No, the current implementation doesn't support batch processing.
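
To rule out sampling differences, you can force greedy decoding on the PyTorch side when comparing outputs (a sketch; the model name and prompt are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-small"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Translate to German: Hello, world!", return_tensors="pt")
# Greedy decoding: no sampling, no beam search, matching Web AI's strategy.
output_ids = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
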
@visheratin Out of curiosity, why do you use fastT5 (which seems to be abandoned and requires an old ORT version) instead of 🤗 Optimum to export T5 models to ONNX? I tried the latter, but I get an "encoderOutput is undefined" error when running the model, so I wonder if there's something I'm missing.

@GarciaLnk When I started working on this project, I had some issues with Optimum, while fastT5 worked with no problems. There should be no difference in which tool you use. Can you share the code for exporting and running the model in Web AI (preferably in a separate GitHub issue)?
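
For reference, an Optimum-based T5 export would look roughly like this (a sketch; `from_transformers=True` was the export flag in Optimum releases of that era, later replaced by `export=True`):

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_name = "t5-small"  # illustrative
# Converts the PyTorch checkpoint to ONNX on load.
model = ORTModelForSeq2SeqLM.from_pretrained(model_name, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model.save_pretrained("t5-onnx")  # writes the encoder/decoder ONNX files
tokenizer.save_pretrained("t5-onnx")
```
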
@volkancirik Web AI now fully supports HF tokenization, so you can use the original Flan T5 config.