(ONNXRuntimeError) LoadLibrary failed with error 126 #618

Closed
2 of 4 tasks
Eichhof opened this issue Dec 19, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@Eichhof

Eichhof commented Dec 19, 2022

System Info

Optimum: 1.5.1
Python: 3.10.4
Platform: Windows 10
Cuda: 11.6

Who can help?

@JingyaHuang @echarlaix

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I installed Optimum with pip install optimum[onnxruntime-gpu]. Then I ran python -m optimum.exporters.onnx --task causal-lm-with-past --model EleutherAI/gpt-j-6B gptj_onnx/ to export GPT-J to ONNX. To load the model, I used the following lines:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

gpt_eos = "<|endoftext|>"       # placeholder: GPT-J's EOS token; defined earlier in my script
gradient_checkpointing = False  # also defined earlier in my script (see traceback below)

tokenizer = AutoTokenizer.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    pad_token=gpt_eos,
    eos_token=gpt_eos,
    truncation_side="left",
)
model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="TensorrtExecutionProvider",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_cache=True,
    gradient_checkpointing=gradient_checkpointing,
)

When running these lines of code, I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 349, in <module>
    model = Model_init()
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 166, in Model_init
    model = Model(gradient_checkpointing=False, start_prompt=start_prompt)
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\../../chatbot\gpt_j\model.py", line 58, in __init__
    self.model = ORTModelForCausalLM.from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 269, in from_pretrained
    return super().from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\modeling_base.py", line 266, in from_pretrained
    return cls._from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 324, in _from_pretrained
    model = ORTModel.load_model(os.path.join(model_id, subfolder, model_file_name), **kwargs)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 216, in load_model
    return ort.InferenceSession(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1069 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_providers_tensorrt.dll"

I have installed CUDA 11.6 and cuDNN 8.7.0.

Expected behavior

The model should load correctly without an error.

@Eichhof Eichhof added the bug Something isn't working label Dec 19, 2022
@michaelbenayoun
Member

Hi @Eichhof,
Does it work with the CUDAExecutionProvider?
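For example, something along these lines (a minimal sketch, reusing the local export directory from your snippet):

from optimum.onnxruntime import ORTModelForCausalLM

# Same exported model, but with the CUDA execution provider instead of TensorRT
model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="CUDAExecutionProvider",
)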

@JingyaHuang
Collaborator

Hi @Eichhof,
Can you also check your TensorRT installation with the steps in our docs and tell us which version you are using? Thanks.
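As a quick sanity check on the onnxruntime-gpu side, you can also print the available providers (get_available_providers is a standard onnxruntime call; the tensorrt import only works if the TensorRT Python bindings are installed):

import onnxruntime as ort

print(ort.__version__)
# Should include 'TensorrtExecutionProvider' and 'CUDAExecutionProvider'
print(ort.get_available_providers())

import tensorrt
print(tensorrt.__version__)  # TensorRT version, if the Python bindings are installed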

@fxmarty
Contributor

fxmarty commented Dec 26, 2022

Hi @Eichhof , just saw this one in ONNX Runtime issues, I'm wondering if it could be related: microsoft/onnxruntime#14063

@Eichhof
Author

Eichhof commented Dec 26, 2022

Thank you very much for the hints. I will test it in the next two days and let you know.

@Eichhof
Author

Eichhof commented Dec 29, 2022

Hi @michaelbenayoun and @JingyaHuang
Thank you very much for your help with my problem. I tried CUDAExecutionProvider and the error no longer appears, so I need to check my TensorRT installation. However, when using CUDAExecutionProvider, I'm getting the out-of-memory error shown below. I think it is due to the model running in fp32. Is it possible to use fp16?

2022-12-29 13:37:33.3532630 [W:onnxruntime:, session_state.cc:1030 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-29 13:37:33.3593281 [W:onnxruntime:, session_state.cc:1032 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2022-12-29 13:38:39.0337837 [W:onnxruntime:, session_state.cc:1030 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2022-12-29 13:38:39.0417407 [W:onnxruntime:, session_state.cc:1032 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2022-12-29 13:38:39.5541455 [E:onnxruntime:, inference_session.cc:1500 onnxruntime::InferenceSession::Initialize::<lambda_d67cde18891e9d311739162a2b4aba6d>::operator ()] Exception during initialization: D:\a\_work\1\s\onnxruntime\core\framework\bfc_arena.cc:342 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 67108864

Traceback (most recent call last):
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 242, in <module>
    model = Model_init()
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\server.py", line 48, in Model_init
    model = Model(gradient_checkpointing=False, start_prompt=start_prompt)
  File "C:\Users\myUsername\PycharmProjects\chatbot\server\../../chatbot\gpt_j\model.py", line 61, in __init__
    self.model = ORTModelForCausalLM.from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_ort.py", line 552, in from_pretrained
    return super().from_pretrained(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\modeling_base.py", line 325, in from_pretrained
    return from_pretrained_method(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 565, in _from_pretrained
    model = cls.load_model(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\optimum\onnxruntime\modeling_decoder.py", line 449, in load_model
    decoder_with_past_session = onnxruntime.InferenceSession(
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\myUsername\Anaconda3\envs\huggingface\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 395, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\_work\1\s\onnxruntime\core\framework\bfc_arena.cc:342 onnxruntime::BFCArena::AllocateRawInternal Failed to allocate memory for requested buffer of size 67108864

@Eichhof
Author

Eichhof commented Dec 29, 2022

Now TensorrtExecutionProvider works with the correct installation, but it fails when I try to pass provider_options=dict(trt_fp16_enable=1) to enable FP16. Why?

In addition, I'm getting the same out-of-memory error as above. FP16 would probably solve this problem.

Finally, I'm also getting the warning CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars, and tons of the following warnings:

2022-12-29 15:05:24.0568129 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-12-29 15:05:24.7410563 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped

@fxmarty
Contributor

fxmarty commented Dec 29, 2022

@Eichhof Sorry you're running into all these issues. I hope we can really improve TensorRT support in the coming days/weeks.

Do you get an error like

EP Error using ['TensorrtExecutionProvider', 'CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

when passing provider_options=dict(trt_fp16_enable=1) to from_pretrained()? I do at least, and I submitted a fix in #653.
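For reference, with that fix in place the intended usage looks roughly like this (a sketch, reusing your local path; trt_fp16_enable is the TensorRT execution provider option):

from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained(
    "C:/Users/myUsername/Desktop/gptj_onnx",
    provider="TensorrtExecutionProvider",
    provider_options=dict(trt_fp16_enable=1),  # build/run the TensorRT engine in FP16
)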

You can safely ignore the warnings:

2022-12-29 15:05:24.0568129 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2022-12-29 15:05:24.7410563 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-12-29 14:05:24 WARNING] external\onnx-tensorrt\onnx2trt_utils.cpp:395: One or more weights outside the range of INT32 was clamped

I recommend reading issue #636 if you are using gpt2/gpt-j or similar; it's an issue in transformers that I'll fix ASAP as well.

@fxmarty
Contributor

fxmarty commented Dec 29, 2022

@Eichhof did you manage to solve the LoadLibrary failed with error 126 issue? If so, I would prefer to close this issue and open another one.

@Eichhof
Author

Eichhof commented Dec 29, 2022

@fxmarty Yes, I'm getting exactly this warning when passing provider_options=dict(trt_fp16_enable=1). When will the fix be incorporated into a new release?

In Transformers, I'm using low_cpu_mem_usage. Is this also available here?

Do you recommend CUDA lazy loading?

Yes, the error LoadLibrary failed with error 126 is solved. The problem was that TensorRT was not correctly installed.

@fxmarty
Contributor

fxmarty commented Dec 30, 2022

Hi, the PR is ready and should be merged into main soon.

Unfortunately low_cpu_mem_usage is not available when using Optimum/ONNX Runtime.

For CUDA lazy loading, I'm not sure. Given that you get the warning I mentioned above, it's likely that CUDAExecutionProvider is actually being used.
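If you do want to try it, lazy loading is controlled by the CUDA_MODULE_LOADING environment variable mentioned in the warning; roughly (it has to be set before CUDA is initialized, i.e. before the inference session is created):

import os

# Enable CUDA lazy loading (see the NVIDIA docs linked in the warning)
os.environ["CUDA_MODULE_LOADING"] = "LAZY"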

I'll close this issue for now, then; feel free to open a new one for the CUDA lazy loading warning message!

@fxmarty fxmarty closed this as completed Dec 30, 2022
@Eichhof
Author

Eichhof commented Jan 10, 2023

@fxmarty I'm still waiting for the PR to be merged. Do you have any update on when this will happen?

@fxmarty
Contributor

fxmarty commented Jan 10, 2023

Hi @Eichhof, it is merged (#653), and you should be able to pass provider_options=dict(trt_fp16_enable=1). But you will need to use the version from main for this to work; there hasn't been a release including it yet.
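Until then, installing from the repository should work, e.g. with the usual pip-from-git pattern: pip install --upgrade git+https://github.com/huggingface/optimum.git (not an official release).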

If you encounter any other problem, feel free to open an issue; it helps us improve the library and keep track!
