
Merging of ONNX decoder >2GB fails #894

Closed

fxmarty opened this issue Mar 17, 2023 · 5 comments · Fixed by #896 or #988
Labels: bug (Something isn't working), onnx (Related to the ONNX export)

Comments

@fxmarty (Contributor) commented Mar 17, 2023

System Info

optimum main

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

optimum-cli export onnx --model gpt2-large gpt2_onnx

Traceback:

(fx) felix@hf-dgx-01:~/optimum$ optimum-cli export onnx --model gpt2-large gpt2_onnx
Framework not specified. Using pt to export to ONNX.
Automatic task detection to causal-lm-with-past.
use_past = False is different than use_present_in_outputs = True, the value of use_present_in_outputs value will be used for the outputs.
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:794: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if batch_size <= 0:
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Using framework PyTorch: 2.1.0.dev20230306+cu117
Overriding 2 configuration item(s)
        - use_cache -> True
        - pad_token_id -> 0
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
======= Diagnostic Run torch.onnx.export version 2.1.0.dev20230306+cu117 =======
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 111, in post_process_exported_models
    merge_decoders(
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 237, in merge_decoders
    raise e
  File "/home/felix/optimum/optimum/onnx/graph_transformations.py", line 232, in merge_decoders
    onnx.checker.check_model(merged_model)
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/site-packages/onnx/checker.py", line 106, in check_model
    C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 218, in main
    models_and_onnx_configs, onnx_files_subpaths = onnx_config.post_process_exported_models(
  File "/home/felix/optimum/optimum/exporters/onnx/config.py", line 117, in post_process_exported_models
    raise Exception(f"Unable to merge decoders. Detailed error: {e}")
Exception: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/felix/miniconda3/envs/fx/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 255, in <module>
    main()
  File "/home/felix/optimum/optimum/exporters/onnx/__main__.py", line 222, in main
    raise Exception(
Exception: The post-processing of the ONNX export failed. The export can still be performed by passing the option --no-post-process. Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: transformer.wte.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.
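
As the final exception suggests, the export itself still completes if post-processing (the decoder merge) is skipped:

optimum-cli export onnx --no-post-process --model gpt2-large gpt2_onnx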

Expected behavior

no error
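
For context, the traceback shows onnx.checker.check_model being called on the in-memory merged ModelProto, whose initializers reference an external-data file (decoder_model_merged.onnx_data) that has not been written to disk yet. Below is a minimal sketch of a save-then-check pattern that sidesteps this; the paths are hypothetical and the load is only a stand-in for the proto merge_decoders builds in memory:

import onnx

# Stand-in for the merged ModelProto that merge_decoders builds in memory;
# the load here only makes the snippet self-contained (paths hypothetical).
merged_model = onnx.load("decoder_model_merged.onnx")

# Write the weights out as external data next to the model first...
onnx.save(
    merged_model,
    "decoder_model_merged.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="decoder_model_merged.onnx_data",
)

# ...then validate by *path*, so the checker can find the external-data file
# relative to the model and handle serialized protos larger than 2GB.
onnx.checker.check_model("decoder_model_merged.onnx")

Checking by path also avoids the hard 2GB protobuf limit that check_model hits when handed an in-memory proto.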

@fxmarty added the bug and onnx labels on Mar 17, 2023
@vilsonrodrigues

Hi. I'm using the cli command:

optimum-cli export onnx --model openai/whisper-medium model/

and getting the same error:

ValueError: This protobuf of onnx model is too large (>2GB). Call check_model with model path instead.

Environment:

  • Colab

Optimum versions tested:

  • 1.8.2
  • 1.8.3.dev0
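
For reference, a minimal illustration of the two call forms the error message distinguishes (paths hypothetical):

import onnx

model = onnx.load("model/decoder_model_merged.onnx")

# Raises once the serialized protobuf exceeds 2GB:
#   ValueError: This protobuf of onnx model is too large (>2GB).
#   Call check_model with model path instead.
# onnx.checker.check_model(model)

# Given a path instead, the checker reads the file itself and also resolves
# any external-data file relative to the model's directory:
onnx.checker.check_model("model/decoder_model_merged.onnx")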

@fxmarty (Contributor, Author) commented Apr 20, 2023

@vilsonrodrigues This is fixed on main, thanks for notifying!

@vilsonrodrigues

Thanks @fxmarty!!

@typicaldigital

Dear @fxmarty

I get a similar error using these versions:

optimum version: 1.8.7
transformers version: 4.29.2
Platform: Windows-10-10.0.22621-SP0
Python version: 3.11.4
Huggingface_hub version: 0.15.1
PyTorch version (GPU?): 2.1.0.dev20230611+cu121 (cuda available: True)
Tensorflow version (GPU?): not installed (cuda available: NA)

optimum-cli export onnx --model stabilityai/stablelm-tuned-alpha-7b stablelm-tuned-alpha-7b_onnx/

ERROR: Detailed error: Unable to merge decoders. Detailed error: Data of TensorProto ( tensor name: gpt_neox.embed_in.weight_merged_0) should be stored in decoder_model_merged.onnx_data, but it doesn't exist or is not accessible.
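
If it helps with debugging, the checker is looking for the weights file next to the merged model; a quick existence check, with a hypothetical export directory:

from pathlib import Path

out_dir = Path("stablelm-tuned-alpha-7b_onnx")  # hypothetical export directory
print((out_dir / "decoder_model_merged.onnx").exists())
print((out_dir / "decoder_model_merged.onnx_data").exists())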

@fxmarty (Contributor, Author) commented Jun 20, 2023

Thanks, tracked in #1044
