
Onnx Conversion scripts #4

Open
volkancirik opened this issue Feb 15, 2023 · 12 comments

@volkancirik

Hello,

I would like to add a new model to this file. Can you provide more details on how you run the ONNX conversions and which package versions you used?

@visheratin
Owner

Hi! Here is an example of exporting an image model from the HF hub. You can adjust it for different types of models. I use PyTorch 1.13.1, ONNX 1.13.0, and Transformers 4.26.0.
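
For reference, a minimal sketch of what such an export can look like (the model name, input shape, and output names here are illustrative assumptions, not the exact contents of the notebook):

import torch
from transformers import AutoModel

# Illustrative image model; replace it with the model you want to export.
model = AutoModel.from_pretrained("google/vit-base-patch16-224", return_dict=False)
model.eval()

# Dummy input: a batch of one 3-channel 224x224 image.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, dummy_input, "model.onnx",
                  export_params=True,
                  opset_version=15,
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}})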

I noticed that you want to add a new text model. Please note that models other than the T5 family are not supported at the moment because they use different tokenizers. I plan to rework the tokenizer logic next month to support all kinds of models. Can you share what model you want to add?

@volkancirik
Author

Thanks for the quick reply!

I'm interested in T5-family models.

How do you define output_names and input_names for T5 models? Do you have a similar notebook for a T5 model?

torch.onnx.export(model, input_sample, onnx_path,
                  export_params=True,
                  opset_version=15,
                  input_names=["input"],
                  output_names=["logits", "boxes", "output"],
                  dynamic_axes={
                      "input": {0: "batch", 2: "width", 3: "height"},
                  })

@visheratin
Owner

For T5 models, I use fastT5.

from fastT5 import generate_onnx_representation

generate_onnx_representation(pretrained_version=<model path from HF hub>, model=<your model instance>, output_path=<output path>)
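
For instance, assuming the t5-small checkpoint from the HF hub, a concrete call might look like this (the model name and output path are just placeholders of mine):

from transformers import T5ForConditionalGeneration
from fastT5 import generate_onnx_representation

model = T5ForConditionalGeneration.from_pretrained("t5-small")
generate_onnx_representation(pretrained_version="t5-small",
                             model=model,
                             output_path="./t5-small-onnx/")

This should produce the encoder, init-decoder, and decoder-with-past ONNX files, which you can then quantize as shown below.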

Then you can quantize the model:

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic("/path/to/init-decoder.onnx", "/path/to/result/decoder-quant.onnx", weight_type=QuantType.QInt8, per_channel=True, reduce_range=True, optimize_model=False)
quantize_dynamic("/path/to/encoder.onnx", "/path/to/result/encoder-quant.onnx", weight_type=QuantType.QInt8, per_channel=True, reduce_range=True, optimize_model=False)

Note that Web AI doesn't use a decoder with past key/value states because managing the huge tensors of past states in JS memory slows the generation significantly.

@volkancirik
Author

volkancirik commented Feb 17, 2023

Great, this solves the issue. Thanks so much!

We wanted to add Flan-based models. How did you ensure that this tokenizer works for Flan-based models?

@visheratin
Owner

That's exciting! Do you have a small version of Flan or something even more distilled?

Regarding the tokenizer, I just checked that the model produces reasonable output when using this tokenizer config. This one is from the Efficient T5 model. The Flan T5 config has some differences that are not compatible with the current implementation of the tokenizer.

@volkancirik
Author

The opposite: we are trying something with Flan-base. We also observed a degradation in performance and are wondering whether it comes from quantization or tokenization.

@visheratin
Copy link
Owner

Do you observe the degradation in performance when running a non-quantized model? What metrics are you using to measure performance?

@volkancirik
Author

Qualitatively, we could not get the same output as with the non-ONNX (PyTorch) version.

Another question: is it possible to feed a batch of inputs?

Let's say we want to run the seq2seq model on each provided sentence in parallel. Does the current implementation support that?

@visheratin
Owner

Did you use the same sampling strategy? Web AI, for now, supports only greedy sampling.

No, the current implementation doesn't support batch processing.
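
For reference, greedy sampling with the exported encoder/decoder pair boils down to something like the rough Python sketch below (the input/output tensor names follow fastT5's export and are assumptions on my side; the actual Web AI code is in TypeScript):

import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("encoder-quant.onnx")
decoder = ort.InferenceSession("decoder-quant.onnx")

# Placeholder token ids produced by the tokenizer.
input_ids = np.array([[37, 250, 1]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

encoder_hidden = encoder.run(None, {"input_ids": input_ids,
                                    "attention_mask": attention_mask})[0]

decoder_ids = np.array([[0]], dtype=np.int64)  # T5 decoder start token
for _ in range(64):  # maximum number of new tokens
    logits = decoder.run(None, {"input_ids": decoder_ids,
                                "encoder_hidden_states": encoder_hidden,
                                "encoder_attention_mask": attention_mask})[0]
    next_id = int(np.argmax(logits[0, -1]))  # greedy: always pick the top logit
    decoder_ids = np.concatenate(
        [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1)
    if next_id == 1:  # T5 end-of-sequence token
        break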

@GarciaLnk
Contributor

@visheratin out of curiosity, why do you use fastT5 (which seems to be abandoned and requires an old ORT version) instead of 🤗 Optimum to export T5 models to ONNX? I tried the latter, but I get an "encoderOutput is undefined" error when running the model, so I wonder if there's something I'm missing.

@visheratin
Owner

@GarciaLnk When I started working on this project, I had some issues with Optimum, while fastT5 worked with no problems. There should be no difference in which tool you use. Can you share the code for exporting and running the model in Web AI (preferably in a separate GitHub issue)?

@visheratin
Owner

@volkancirik Web AI now fully supports HF tokenization. You can use the original Flan T5 config.
