
Onnx Conversion scripts #4

Open
volkancirik opened this issue Feb 15, 2023 · 12 comments

@volkancirik

Hello,

I would like to add a new model to this file. Can you provide more details on how you run the ONNX conversions and which package versions you used?

@visheratin
Owner

Hi! Here is an example of exporting an image model from the HF hub. You can adjust it for different types of models. I use PyTorch 1.13.1, ONNX 1.13.0, and Transformers 4.26.0.
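
For reference, a minimal sketch of what such an export can look like (the model name, input shape, and output names here are illustrative assumptions, not the exact contents of the notebook):

import torch
from transformers import AutoModel

# Illustrative image model; replace it with the model you want to export.
model = AutoModel.from_pretrained("google/vit-base-patch16-224", return_dict=False)
model.eval()

# Dummy input: a batch of one 3-channel 224x224 image.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, dummy_input, "model.onnx",
                  export_params=True,
                  opset_version=15,
                  input_names=["input"],
                  output_names=["output"],
                  dynamic_axes={"input": {0: "batch"}})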

I noticed that you want to add a new text model. Please note that models other than the T5 family are not supported at the moment because they use different tokenizers. I plan to rework the tokenizer logic next month to support all kinds of models. Can you share what model you want to add?

@volkancirik
Author

Thanks for the quick reply!

I'm interested in T5-family models.

How do you define output_names and input_names for T5 models? Do you have a similar notebook for a T5 model?

torch.onnx.export(model, input_sample, onnx_path,
                  export_params=True,
                  opset_version=15,
                  input_names=["input"],
                  output_names=["logits", "boxes", "output"],
                  dynamic_axes={
                      "input": {0: "batch", 2: "width", 3: "height"},
                  })

@visheratin
Owner

For T5 models, I use fastT5.

from fastT5 import generate_onnx_representation

generate_onnx_representation(pretrained_version=<model path from HF hub>, model=<your model instance>, output_path=<output path>)
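
For instance, assuming the t5-small checkpoint from the HF hub, a concrete call might look like this (the model name and output path are just placeholders of mine):

from transformers import T5ForConditionalGeneration
from fastT5 import generate_onnx_representation

model = T5ForConditionalGeneration.from_pretrained("t5-small")
generate_onnx_representation(pretrained_version="t5-small",
                             model=model,
                             output_path="./t5-small-onnx/")

This should produce the encoder, init-decoder, and decoder-with-past ONNX files, which you can then quantize as shown below.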

Then you can quantize the model:

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic("/path/to/init-decoder.onnx", "/path/to/result/decoder-quant.onnx", weight_type=QuantType.QInt8, per_channel=True, reduce_range=True, optimize_model=False)
quantize_dynamic("/path/to/encoder.onnx", "/path/to/result/encoder-quant.onnx", weight_type=QuantType.QInt8, per_channel=True, reduce_range=True, optimize_model=False)

Note that Web AI doesn't use a decoder with past key/value states because managing the huge tensors of past states in JS memory slows the generation significantly.

@volkancirik
Author

volkancirik commented Feb 17, 2023

Great, this solves the issue. Thanks so much!

We wanted to add Flan-based models. How did you ensure that this tokenizer works for Flan-based models?

@visheratin
Owner

That's exciting! Do you have a small version of Flan or something even more distilled?

Regarding the tokenizer, I just checked that the model produces reasonable output when using this tokenizer config. This one is from the Efficient T5 model. The Flan T5 config has some differences that are not compatible with the current implementation of the tokenizer.

@volkancirik
Author

The opposite: we are trying something with Flan-base. We also observed a degradation in performance and are wondering whether it comes from quantization or tokenization.

@visheratin
Copy link
Owner

Do you observe the degradation in performance when running a non-quantized model? What metrics are you using to measure performance?

@volkancirik
Author

Qualitatively, we could not get the same output as with the non-ONNX (PyTorch) version.

Another question: is it possible to feed a batch of inputs?

Let's say we want to run the seq2seq model on each provided sentence in parallel. Does the current implementation support that?

@visheratin
Owner

Did you use the same sampling strategy? Web AI, for now, supports only greedy sampling.

No, the current implementation doesn't support batch processing.
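
For reference, greedy sampling with the exported encoder/decoder pair boils down to something like the rough Python sketch below (the input/output tensor names follow fastT5's export and are assumptions on my side; the actual Web AI code is in TypeScript):

import numpy as np
import onnxruntime as ort

encoder = ort.InferenceSession("encoder-quant.onnx")
decoder = ort.InferenceSession("decoder-quant.onnx")

# Placeholder token ids produced by the tokenizer.
input_ids = np.array([[37, 250, 1]], dtype=np.int64)
attention_mask = np.ones_like(input_ids)

encoder_hidden = encoder.run(None, {"input_ids": input_ids,
                                    "attention_mask": attention_mask})[0]

decoder_ids = np.array([[0]], dtype=np.int64)  # T5 decoder start token
for _ in range(64):  # maximum number of new tokens
    logits = decoder.run(None, {"input_ids": decoder_ids,
                                "encoder_hidden_states": encoder_hidden,
                                "encoder_attention_mask": attention_mask})[0]
    next_id = int(np.argmax(logits[0, -1]))  # greedy: always pick the top logit
    decoder_ids = np.concatenate(
        [decoder_ids, np.array([[next_id]], dtype=np.int64)], axis=1)
    if next_id == 1:  # T5 end-of-sequence token
        break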

@GarciaLnk
Contributor

@visheratin out of curiosity, why do you use fastT5 (which seems to be abandoned and requires an old ORT version) instead of 🤗 Optimum to export T5 models to ONNX? I tried the latter, but I get an "encoderOutput is undefined" error when running the model, so I wonder if there's something I'm missing.

@visheratin
Owner

@GarciaLnk When I started working on this project, I had some issues with Optimum, while fastT5 worked with no problems. There should be no difference in which tool you use. Can you share the code for exporting and running the model in Web AI (preferably in a separate GitHub issue)?

@visheratin
Owner

@volkancirik Web AI now fully supports HF tokenization. You can use the original Flan T5 config.
