fix: update/pin dependencies to get ONNX runtime working again #107

tjohnson31415 · 2024-07-31T21:09:57Z

Motivation

Internal regression tests are failing when using ONNX Runtime, with an error indicating a dependency mismatch between ONNX Runtime and cuDNN:

Shard 0: 2024-07-31 19:38:04.423164988 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory

I found that ORT 1.18.1 started building against cuDNN 9 (noted in the [release notes](https://github.com/Microsoft/onnxruntime/releases/tag/v1.18.1)). However, PyTorch does not use cuDNN 9 until 2.4.0, so I pinned ORT to 1.18.0. While updating poetry.lock, I let other dependencies update as well, but ran into further compatibility issues and had to pin transformers and optimum too to get the internal tests passing.
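The version reasoning above can be sketched as a small compatibility check. This is a minimal illustration that assumes only the boundaries stated in this PR: onnxruntime GPU packages from 1.18.1 onward expect cuDNN 9, and PyTorch before 2.4.0 ships cuDNN 8. The helper names are hypothetical and not part of onnxruntime or PyTorch, and the parser only handles plain numeric versions such as `2.3.1+cu121`, not pre-releases.

```python
# Hypothetical helpers encoding the cuDNN boundaries described in this PR:
#   - onnxruntime CUDA provider: cuDNN 9 from 1.18.1 onward, cuDNN 8 before
#   - PyTorch wheels: cuDNN 9 from 2.4.0 onward, cuDNN 8 before
# Assumes plain numeric versions (optionally with a "+local" suffix).


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.18.1' or '2.3.1+cu121' into a tuple of ints like (1, 18, 1)."""
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split("."))


def ort_cudnn_major(ort_version: str) -> int:
    """cuDNN major version the onnxruntime CUDA provider expects (assumed)."""
    return 9 if parse_version(ort_version) >= (1, 18, 1) else 8


def torch_cudnn_major(torch_version: str) -> int:
    """cuDNN major version bundled with the PyTorch wheel (assumed)."""
    return 9 if parse_version(torch_version) >= (2, 4, 0) else 8


def cudnn_compatible(ort_version: str, torch_version: str) -> bool:
    """True when both packages expect the same cuDNN major version."""
    return ort_cudnn_major(ort_version) == torch_cudnn_major(torch_version)
```

Under these assumptions, `cudnn_compatible("1.18.1", "2.3.1")` is `False`, which is consistent with the `libcudnn.so.9` load failure, while holding ORT at 1.18.0 makes the same check `True`.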

Modifications

- pin the onnxruntime version to 1.18.0
- pin transformers to 4.40.2 (and remove the separate `pip install` for it)
- pin optimum to 1.20
- run `poetry update` to update poetry.lock

Result

`DEPLOYMENT_FRAMEWORK=hf_optimum_ort` will start working again, and internal tests will pass.


tjohnson31415 changed the title ~~fix:~~ fix: onnxruntime usage has broken dependency on cudnn Jul 31, 2024

tjohnson31415 changed the title ~~fix: onnxruntime usage has broken dependency on cudnn~~ fix: onnxruntime is broken due to dependency on cudnn Jul 31, 2024

joerunde approved these changes Jul 31, 2024


tjohnson31415 added 6 commits August 1, 2024 14:00

fix: set onnxruntime version to 1.18.0

dac11c5

In 1.18.1, the runtime packages are built against cudnn 9. PyTorch does not use cudnn 9 until 2.4.0, so we hold back onnxruntime for now.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

deps: update poetry deps

4833f12

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

cleanup: remove transformers version override

3100c86

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

deps: hold back numpy from 2.0

d71d679

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

deps: pin transformers to prevent breakage from 4.41

5b8a283

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

deps: pin optimum too...

1c33e9d

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

tjohnson31415 force-pushed the set-onnx-version branch from 6ee954d to 1c33e9d August 1, 2024 20:00

tjohnson31415 changed the title ~~fix: onnxruntime is broken due to dependency on cudnn~~ fix: update/pin dependencies to get ONNX runtime working again Aug 1, 2024

tjohnson31415 requested a review from joerunde August 1, 2024 21:22

tjohnson31415 merged commit 015070b into main Aug 5, 2024
7 checks passed

tjohnson31415 deleted the set-onnx-version branch August 5, 2024 17:24
