
Merging of ONNX decoder >2GB fails #894

Closed

fxmarty opened this issue Mar 17, 2023 · 5 comments · Fixed by #896 or #988
Labels: bug (Something isn't working), onnx (Related to the ONNX export)

Comments

@fxmarty (Contributor) commented Mar 17, 2023

System Info

optimum main

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

optimum-cli export onnx --model gpt2-large gpt2_onnx

Traceback:

(fx) felix@hf-dgx-01:~/optimum$ optimum-cli export onnx --model gpt2-large gpt2_onnx
Framework not specified. Using pt to export to ONNX.
Automatic task detection to causal-lm-with-past.
use_past = False is different than use_present_in_outputs = True, the value of use_present_in_outputs value will be used for the outputs.
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:794: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if batch_size <= 0:
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 111, in post_process_exported_models
    merge_decoders(
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 237, in merge_decoders
    raise e
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 232, in merge_decoders
    onnx.checker.check_model(merged_model)
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 218, in main
    models_and_onnx_configs, onnx_files_subpaths = onnx_config.post_process_exported_models(
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 117, in post_process_exported_models
    raise Exception(f"Unable to merge decoders. Detailed error: {e}")
Exception: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 255, in <module>
    main()
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 222, in main
    raise Exception(
Exception: The post-processing of the ONNX export failed. The export can still be performed by passing the option --no-post-process. Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.
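
As the final exception suggests, the export itself still completes if post-processing (the decoder merge) is skipped:

optimum-cli export onnx --no-post-process --model gpt2-large gpt2_onnx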

Expected behavior

no error
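
For context, the traceback shows onnx.checker.check_model being called on the in-memory merged ModelProto, whose initializers reference an external-data file (decoder_model_merged.onnx_data) that has not been written to disk yet. Below is a minimal sketch of a save-then-check pattern that sidesteps this; the paths are hypothetical and the load is only a stand-in for the proto merge_decoders builds in memory:

import onnx

# Stand-in for the merged ModelProto that merge_decoders builds in memory;
# the load here only makes the snippet self-contained (paths hypothetical).
merged_model = onnx.load("decoder_model_merged.onnx")

# Write the weights out as external data next to the model first...
onnx.save(
    merged_model,
    "decoder_model_merged.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="decoder_model_merged.onnx_data",
)

# ...then validate by *path*, so the checker can find the external-data file
# relative to the model and handle serialized protos larger than 2GB.
onnx.checker.check_model("decoder_model_merged.onnx")

Checking by path also avoids the hard 2GB protobuf limit that check_model hits when handed an in-memory proto.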

@fxmarty added the bug and onnx labels on Mar 17, 2023
@vilsonrodrigues

Hi. I'm using the cli command:

optimum-cli export onnx --model openai/whisper-medium model/

and getting the same error:

ValueError: This protobuf of onnx model is too large (>2GB). Call check_model with model path instead.

Environment:

  • Colab

Optimum versions tested:

  • 1.8.2
  • 1.8.3.dev0
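
For reference, a minimal illustration of the two call forms the error message distinguishes (paths hypothetical):

import onnx

model = onnx.load("model/decoder_model_merged.onnx")

# Raises once the serialized protobuf exceeds 2GB:
#   ValueError: This protobuf of onnx model is too large (>2GB).
#   Call check_model with model path instead.
# onnx.checker.check_model(model)

# Given a path instead, the checker reads the file itself and also resolves
# any external-data file relative to the model's directory:
onnx.checker.check_model("model/decoder_model_merged.onnx")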

@fxmarty (Contributor, Author) commented Apr 20, 2023

@vilsonrodrigues This is fixed on main, thanks for notifying!

@vilsonrodrigues

Thanks @fxmarty!!

@typicaldigital

Dear @fxmarty

I get a similar error using these versions:

optimum version: 1.8.7
transformers version: 4.29.2
Platform: Windows-10-10.0.22621-SP0
Python version: 3.11.4
Huggingface_hub version: 0.15.1
PyTorch version (GPU?): 2.1.0.dev20230611+cu121 (cuda available: True)
Tensorflow version (GPU?): not installed (cuda available: NA)

optimum-cli export onnx --model stabilityai/stablelm-tuned-alpha-7b stablelm-tuned-alpha-7b_onnx/

ERROR: Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: gpt_neox.embed_in.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.
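
If it helps with debugging, the checker is looking for the weights file next to the merged model; a quick existence check, with a hypothetical export directory:

from pathlib import Path

out_dir = Path("stablelm-tuned-alpha-7b_onnx")  # hypothetical export directory
print((out_dir / "decoder_model_merged.onnx").exists())
print((out_dir / "decoder_model_merged.onnx_data").exists())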

@fxmarty (Contributor, Author) commented Jun 20, 2023

Thanks, tracked in #1044
