fix: update/pin dependencies to get ONNX runtime working again #107
Merged
Conversation
tjohnson31415 changed the title from "fix:" to "fix: onnxruntime usage has broken dependency on cudnn" on Jul 31, 2024
tjohnson31415 changed the title from "fix: onnxruntime usage has broken dependency on cudnn" to "fix: onnxruntime is broken due to dependency on cudnn" on Jul 31, 2024
joerunde approved these changes on Jul 31, 2024
In 1.18.1, the runtime packages are built against cudnn 9. PyTorch does not use cudnn 9 until 2.4.0, so we hold back onnxruntime for now.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
tjohnson31415 force-pushed the set-onnx-version branch from 6ee954d to 1c33e9d on August 1, 2024 20:00
tjohnson31415 changed the title from "fix: onnxruntime is broken due to dependency on cudnn" to "fix: update/pin dependencies to get ONNX runtime working again" on Aug 1, 2024
dtrifiro pushed a commit to dtrifiro/text-generation-inference that referenced this pull request on Sep 13, 2024
Internal regression tests are failing when using the ONNX Runtime with an error indicating a dependency issue with ONNX Runtime and cuDNN:

```
Shard 0: 2024-07-31 19:38:04.423164988 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.9: cannot open shared object file: No such file or directory
```

I found that ORT 1.18.1 started to build against cudnn 9 (noted in the [release notes](https://github.com/Microsoft/onnxruntime/releases/tag/v1.18.1)). However, PyTorch does not use cudnn 9 until 2.4.0, so I pinned it to 1.18.0. In updating poetry.lock, I let other deps update as well, but found other compatibility issues and had to pin transformers and optimum as well to get internal tests passing.

- pin the onnxruntime version to 1.18.0
- pin transformers to 4.40.2 (and remove the separate `pip install` for it)
- pin optimum to 1.20
- run `poetry update` to update poetry.lock

`DEPLOYMENT_FRAMEWORK=hf_optimum_ort` will start working again and internal tests will be passing.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
dtrifiro pushed a commit to dtrifiro/text-generation-inference that referenced this pull request on Sep 13, 2024
dtrifiro pushed a commit to opendatahub-io/text-generation-inference that referenced this pull request on Sep 16, 2024
dtrifiro pushed a commit to opendatahub-io/text-generation-inference that referenced this pull request on Sep 16, 2024
dtrifiro pushed a commit to opendatahub-io/text-generation-inference that referenced this pull request on Sep 17, 2024
dtrifiro pushed a commit to opendatahub-io/text-generation-inference that referenced this pull request on Sep 17, 2024
Motivation

Internal regression tests are failing when using the ONNX Runtime with an error indicating a dependency issue between ONNX Runtime and cuDNN (`libcudnn.so.9: cannot open shared object file: No such file or directory`).

I found that ORT 1.18.1 started to build against cudnn 9 (included in the release notes). However, PyTorch does not use cudnn 9 until 2.4.0, so I pinned it to 1.18.0. In updating poetry.lock, I let other deps update as well, but found other compatibility issues and had to pin transformers and optimum as well to get internal tests passing.
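The version constraint behind the pin can be sketched as a simple check. This is a hypothetical illustration only (`ort_cudnn_compatible` and `parse_version` are not part of this PR): onnxruntime 1.18.1+ links against cuDNN 9, which PyTorch only ships starting with 2.4.0.

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '2.3.1' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".")[:3])

def ort_cudnn_compatible(torch_version: str, ort_version: str) -> bool:
    """Return True if the torch/onnxruntime pair can share a cuDNN.

    onnxruntime >= 1.18.1 is built against cuDNN 9; PyTorch bundles
    cuDNN 9 only from 2.4.0, so older torch with newer ORT fails to
    load libcudnn.so.9 at runtime.
    """
    if parse_version(ort_version) >= (1, 18, 1):
        return parse_version(torch_version) >= (2, 4, 0)
    return True

# The failing combination from this PR, and the pinned fix:
print(ort_cudnn_compatible("2.3.1", "1.18.1"))  # False: broken pair
print(ort_cudnn_compatible("2.3.1", "1.18.0"))  # True: ORT held back
```

This is why the PR pins onnxruntime to 1.18.0 rather than upgrading torch: holding back the newer dependency restores the working pair.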
Modifications

- pin the onnxruntime version to 1.18.0
- pin transformers to 4.40.2 (and remove the separate `pip install` for it)
- pin optimum to 1.20
- run `poetry update` to update poetry.lock

Result

`DEPLOYMENT_FRAMEWORK=hf_optimum_ort` will start working again and internal tests will be passing.
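In a Poetry project, pins like the ones listed under Modifications would look roughly like this in `pyproject.toml`. This is a hedged sketch: the package names and exact constraint syntax in the repository may differ (e.g. the project may use `onnxruntime-gpu` or caret/tilde constraints).

```toml
[tool.poetry.dependencies]
# Held back: 1.18.1+ is built against cuDNN 9, which PyTorch < 2.4.0 lacks.
onnxruntime = "1.18.0"
transformers = "4.40.2"
optimum = "1.20"
```

After editing the constraints, `poetry update` regenerates `poetry.lock` so the lock file matches the new pins.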