(ONNXRuntimeError) LoadLibrary failed with error 126 #618

Closed
2 of 4 tasks
Eichhof opened this issue Dec 19, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@Eichhof

Eichhof commented Dec 19, 2022

System Info

Optimum: 1.5.1
Python: 3.10.4
Platform: Windows 10
Cuda: 11.6

Who can help?

@JingyaHuang @echarlaix

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I installed Optimum with pip install optimum[onnxruntime-gpu]. Then I ran python -m optimum.exporters.onnx --task causal-lm-with-past --model EleutherAI/gpt-j-6B gptj_onnx/ to export GPT-J to ONNX. To load the model, I used the following lines:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

gpt_eos = "<|endoftext|>"       # placeholder: GPT-J's EOS token; defined earlier in my script
gradient_checkpointing = False  # also defined earlier in my script (see traceback below)

tokenizer = AutoTokenizer.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    pad_token=gpt_eos,
    eos_token=gpt_eos,
    truncation_side="left",
)
model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="TensorrtExecutionProvider",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_cache=True,
    gradient_checkpointing=gradient_checkpointing,
)

When running these lines of code, I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 349, in <module>
    model = Model_init()
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 166, in Model_init
    model = Model(gradient_checkpointing=False, start_prompt=start_prompt)
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\../../chatbot\gpt_j\model.py", line 58, in __init__
    self.model = ORTModelForCausalLM.from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 269, in from_pretrained
    return super().from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\modeling_base.py", line 266, in from_pretrained
    return cls._from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 324, in _from_pretrained
    model = ORTModel.load_model(os.path.join(model_id, subfolder, model_file_name), **kwargs)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 216, in load_model
    return ort.InferenceSession(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1069 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_providers_tensorrt.dll"

I have installed CUDA 11.6 and cuDNN 8.7.0.

Expected behavior

The model should load correctly without an error.

@Eichhof Eichhof added the bug Something isn't working label Dec 19, 2022
@michaelbenayoun
Member

Hi @Eichhof,
Does it work with the CUDAExecutionProvider?
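For example, something along these lines (a minimal sketch, reusing the local export directory from your snippet):

from optimum.onnxruntime import ORTModelForCausalLM

# Same exported model, but with the CUDA execution provider instead of TensorRT
model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="CUDAExecutionProvider",
)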

@JingyaHuang
Collaborator

Hi @Eichhof,
Can you also check your TensorRT installation with the steps in our docs and tell us which version you are using? Thanks.
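As a quick sanity check on the onnxruntime-gpu side, you can also print the available providers (get_available_providers is a standard onnxruntime call; the tensorrt import only works if the TensorRT Python bindings are installed):

import onnxruntime as ort

print(ort.__version__)
# Should include 'TensorrtExecutionProvider' and 'CUDAExecutionProvider'
print(ort.get_available_providers())

import tensorrt
print(tensorrt.__version__)  # TensorRT version, if the Python bindings are installed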

@fxmarty
Contributor

fxmarty commented Dec 26, 2022

Hi @Eichhof , just saw this one in ONNX Runtime issues, I'm wondering if it could be related: microsoft/onnxruntime#14063

@Eichhof
Author

Eichhof commented Dec 26, 2022

Thank you very much for the hints. I will test it in the next two days and let you know.

@Eichhof
Author

Eichhof commented Dec 29, 2022

Hi @michaelbenayoun and @JingyaHuang
Thank you very much for your help with my problem. I tried CUDAExecutionProvider and the error no longer appears, so I need to check my TensorRT installation. However, when using CUDAExecutionProvider, I'm getting the out-of-memory error shown below. I think it is due to the model running in fp32. Is it possible to use fp16?

2022-12-29 13:37:33.3532630 [W:onnxruntime:, session_state.cc:1030 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-29 13:37:33.3593281 [W:onnxruntime:, session_state.cc:1032 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2022-12-29 13:38:39.0337837 [W:onnxruntime:, session_state.cc:1030 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-29 13:38:39.0417407 [W:onnxruntime:, session_state.cc:1032 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2022-12-29 13:38:39.5541455 [E:onnxruntime:, inference_session.cc:1500 onnxruntime::InferenceSession::Initialize::<lambda_d67cde18891e9d311739162a2b4aba6d>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\framework\bfc_arena.cc:342 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 67108864

Traceback (most recent call last):
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 242, in <module>
    model = Model_init()
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 48, in Model_init
    model = Model(gradient_checkpointing=False, start_prompt=start_prompt)
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\../../chatbot\gpt_j\model.py", line 61, in __init__
    self.model = ORTModelForCausalLM.from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 552, in from_pretrained
    return super().from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\modeling_base.py", line 325, in from_pretrained
    return from_pretrained_method(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 565, in _from_pretrained
    model = cls.load_model(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 449, in load_model
    decoder_with_past_session = onnxruntime.InferenceSession(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\_work\1\s\onnxruntime\core\framework\bfc_arena.cc:342 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 67108864

@Eichhof
Author

Eichhof commented Dec 29, 2022

Now TensorrtExecutionProvider works with the correct installation, but it fails when I try to pass provider_options=dict(trt_fp16_enable=1) to enable FP16. Why?

In addition, I'm getting the same out-of-memory error as above. FP16 would probably solve this problem.

Finally, I'm also getting the warning CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars, and tons of the following warnings:

2022-12-29 15:05:24.0568129 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-12-29 15:05:24.7410563 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped

@fxmarty
Contributor

fxmarty commented Dec 29, 2022

@Eichhof Sorry you're running into all these issues. I hope we can really improve TensorRT support in the coming days/weeks.

Do you get an error like

EP Error using ['TensorrtExecutionProvider', 'CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

when passing provider_options=dict(trt_fp16_enable=1) to from_pretrained()? I do at least, and I submitted a fix in #653.
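For reference, with that fix in place the intended usage looks roughly like this (a sketch, reusing your local path; trt_fp16_enable is the TensorRT execution provider option):

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="TensorrtExecutionProvider",
    provider_options=dict(trt_fp16_enable=1),  # build/run the TensorRT engine in FP16
)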

You can safely ignore the warnings:

2022-12-29 15:05:24.0568129 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-12-29 15:05:24.7410563 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped

I recommend reading issue #636 if you are using gpt2/gpt-j or similar; it's an issue in transformers that I'll fix ASAP as well.

@fxmarty
Contributor

fxmarty commented Dec 29, 2022

@Eichhof did you manage to solve the LoadLibrary failed with error 126 issue? If so, I would prefer to close this issue and open another one.

@Eichhof
Author

Eichhof commented Dec 29, 2022

@fxmarty Yes, I'm getting exactly this warning when passing provider_options=dict(trt_fp16_enable=1). When will the fix be incorporated into a new release?

In Transformers, I'm using low_cpu_mem_usage. Is this also available here?

Do you recommend CUDA lazy loading?

Yes, the error LoadLibrary failed with error 126 is solved. The problem was that TensorRT was not correctly installed.

@fxmarty
Contributor

fxmarty commented Dec 30, 2022

Hi, the PR is ready and should be merged into main soon.

Unfortunately low_cpu_mem_usage is not available when using Optimum/ONNX Runtime.

For CUDA lazy loading, I'm not sure. Given that you get the warning I mentioned above, it's likely that CUDAExecutionProvider is actually being used.
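If you do want to try it, lazy loading is controlled by the CUDA_MODULE_LOADING environment variable mentioned in the warning; roughly (it has to be set before CUDA is initialized, i.e. before the inference session is created):

import os

# Enable CUDA lazy loading (see the NVIDIA docs linked in the warning)
os.environ["CUDA_MODULE_LOADING"] = "LAZY"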

I'll close this issue for now, then; feel free to open a new one for the CUDA lazy loading warning message!

@fxmarty fxmarty closed this as completed Dec 30, 2022
@Eichhof
Author

Eichhof commented Jan 10, 2023

@fxmarty I'm still waiting for the PR to be merged. Do you have any update on when this will happen?

@fxmarty
Contributor

fxmarty commented Jan 10, 2023

Hi @Eichhof, it is merged (#653), and you should be able to pass provider_options=dict(trt_fp16_enable=1). But you will need to use the version from main for this to work; there hasn't been a release including it yet.
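Until then, installing from the repository should work, e.g. with the usual pip-from-git pattern: pip install --upgrade git+https://github.com/huggingface/optimum.git (not an official release).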

If you encounter any other problem, feel free to open an issue; it helps us improve the library and keep track!
