TensorFlow and PyTorch Compatibility Issue in Docker Environment #513

rbt-c · 2025-02-04T16:24:44Z

First of all, great materials and contributions to the community. Hats off to @rasbt!

Purpose of the discussion

Confirm that the issue I encountered is not specific to my setup
Find a permanent or better solution than my workaround

I tried the option of an isolated Docker environment.

Docker environment setup

The base image is specified in the Dockerfile (originally in the folder setup/03_optional-docker-environment/.devcontainer, later moved to the project folder) as FROM pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
The TensorFlow version is specified in requirements.txt as tensorflow >= 2.15.0 # ch05. This ends up installing the latest version, TensorFlow 2.18.

Issues after importing TensorFlow

TensorFlow can't find the GPU. Torch can access and use the GPU normally before TensorFlow is imported, but not afterwards.
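To make the symptom reproducible, a minimal sketch along these lines could be run inside the container. It tries a tiny GPU allocation with PyTorch before and after importing TensorFlow and lists the GPUs TensorFlow reports. The names `gpu_visibility_report` and `_cuda_works` are hypothetical helpers, and the import guards exist only so the script still runs when one of the frameworks is missing.

```python
def _cuda_works(torch):
    """Try a tiny allocation on the GPU; return True or the error text."""
    try:
        torch.zeros(1, device="cuda")
        return True
    except Exception as exc:  # e.g. RuntimeError: CUDA error: ...
        return f"{type(exc).__name__}: {exc}"


def gpu_visibility_report():
    """Check PyTorch GPU access before and after importing TensorFlow."""
    report = {"torch_before_tf": None, "tf_gpus": None, "torch_after_tf": None}

    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed; its entries stay None

    if torch is not None:
        report["torch_before_tf"] = _cuda_works(torch)

    try:
        import tensorflow as tf
        # Names of the GPUs TensorFlow can see (empty list if none).
        report["tf_gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    except ImportError:
        pass  # TensorFlow not installed; entry stays None

    if torch is not None:
        report["torch_after_tf"] = _cuda_works(torch)
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `torch_before_tf` is `True` while `torch_after_tf` holds an error string and `tf_gpus` is empty, that would match the behavior described above.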

More environment context

Windows 10, RTX 3060, WSL 2 (Ubuntu 20.04), Nvidia driver, CUDA, cuDNN, Nvidia container toolkit, Docker Desktop, etc. are all set up and tested.

GPU is accessible to TensorFlow in the native Windows environment
GPU is accessible to TensorFlow in WSL 2
GPU is accessible to TensorFlow in other Docker containers

Workaround

Select compatible PyTorch base image and TensorFlow version

TensorFlow 2.18 is tested with CUDA 12.5 + cuDNN 9.3

After searching the pytorch/pytorch image repository, I selected the following compatible combination:

# base image in Dockerfile
FROM pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime

# Tensorflow package in requirements.txt
tensorflow == 2.14.0  # ch05
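The selection step can be sketched as a small check that parses the CUDA and cuDNN versions out of a pytorch/pytorch image tag and compares them with the versions a TensorFlow release was tested against. This is only an illustration: `TESTED_BUILDS`, `parse_image_tag`, and `is_tested_combo` are hypothetical names, the 2.18 entry comes from the line above, and the 2.14 entry (CUDA 11.8, cuDNN 8) is my understanding of TensorFlow's tested build table, which should be verified before relying on it.

```python
import re

# TensorFlow minor version -> (tested CUDA version, tested cuDNN major version).
# Assumption: values should be checked against TensorFlow's tested-build table.
TESTED_BUILDS = {
    "2.14": ("11.8", "8"),
    "2.18": ("12.5", "9"),
}

# Matches tags such as "pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime".
TAG_PATTERN = re.compile(r":[\d.]+-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>\d+)")


def parse_image_tag(image):
    """Return (cuda_version, cudnn_major) parsed from the image tag."""
    match = TAG_PATTERN.search(image)
    if match is None:
        raise ValueError(f"unrecognized image tag: {image!r}")
    return match.group("cuda"), match.group("cudnn")


def is_tested_combo(image, tf_version):
    """True/False for an exact match with the tested build, None if unknown."""
    tf_key = ".".join(tf_version.split(".")[:2])
    if tf_key not in TESTED_BUILDS:
        return None
    return parse_image_tag(image) == TESTED_BUILDS[tf_key]


print(is_tested_combo("pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime", "2.14.0"))
print(is_tested_combo("pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime", "2.18.0"))
```

A `False` result only means the pair differs from the tested build; it does not prove the combination is broken.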

In addition, an LD_LIBRARY_PATH environment variable is added to the Dockerfile so that TensorFlow can find the existing CUDA and cuDNN libraries.

ENV LD_LIBRARY_PATH=/opt/conda/lib:/opt/conda/lib/python3.10/site-packages/torch/lib:${LD_LIBRARY_PATH}

With the updated configuration above, both Torch and TensorFlow can access the GPU properly.
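To see which directories on LD_LIBRARY_PATH actually contain CUDA or cuDNN libraries, a rough sketch like the following could be run in the container. The function name `find_cuda_libs` and the default file patterns are assumptions chosen for illustration, not something taken from the project.

```python
import os
from pathlib import Path


def find_cuda_libs(search_path,
                   patterns=("libcudart.so*", "libcudnn.so*", "libcublas.so*")):
    """Map each existing directory on a colon-separated path to the
    matching library file names found directly inside it."""
    found = {}
    for entry in search_path.split(":"):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue  # skip empty entries and paths that do not exist
        names = sorted(p.name for pattern in patterns
                       for p in directory.glob(pattern))
        if names:
            found[str(directory)] = names
    return found


if __name__ == "__main__":
    libs = find_cuda_libs(os.environ.get("LD_LIBRARY_PATH", ""))
    for directory, names in libs.items():
        print(directory, names)
```

An empty result would suggest that none of the listed directories holds the libraries TensorFlow is looking for.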

Install TensorFlow with its GPU support packages

Instead of installing TensorFlow through requirements.txt (comment out the tensorflow line), install it in the Dockerfile:

RUN pip install tensorflow[and-cuda]==2.14

It seems to work, but the resulting image is too large. Not worth it.

rasbt · 2025-02-04T19:40:39Z

Maintainer

Thanks for sharing this report in so much detail. That's really awesome, I wish everyone would do that when reporting issues :).

Regarding

TensorFlow can't find/access GPU

Hm, I don't have an explanation for that. But the good news is that you don't need to run TensorFlow on the GPU as we just use it to load the weights. Would knowing that you don't need to use TensorFlow on the GPU help with your issue? (Tbh I haven't really used TensorFlow for model training or inference since 2020 or so)

I think the solution you added at the bottom also sounds reasonable

RUN pip install tensorflow[and-cuda]==2.14
It seems to work, but the resulting image is too large. Not worth it.

Since we don't need to run it on the GPU, as mentioned above, we should add this to the Dockerfile only as a suggestion. That way it doesn't create these large images unnecessarily, but it lets people know what's going on, so they don't have to spend time trying out a bunch of things if they want to make it work on a GPU. What do you think?
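For the CPU-only case, one possible sketch is to hide the GPUs from TensorFlow before it initializes its devices, using `tf.config.set_visible_devices`. The wrapper `hide_gpus_from_tensorflow` is a hypothetical helper, and this only changes TensorFlow's own device list; whether it has any effect on the PyTorch behavior reported above is untested here.

```python
def hide_gpus_from_tensorflow():
    """Make TensorFlow CPU-only in this process.

    Returns True if the GPUs were hidden, False if TensorFlow is not
    installed or its devices were already initialized.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow is not installed
    try:
        # Must be called before TensorFlow initializes its devices.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError:
        return False  # too late: devices were already initialized
    return True


if __name__ == "__main__":
    print("TensorFlow restricted to CPU:", hide_gpus_from_tensorflow())
```

Calling this at the top of a notebook, before any other TensorFlow call, keeps the weight loading on the CPU.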


d-kleine Feb 5, 2025

Windows 10, RTX 3060, WSL 2 (Ubuntu 20.04), Nvidia driver, CUDA, cuDNN, Nvidia container toolkit, Docker Desktop, etc. are all set up and tested.

Please also make sure to update Docker to the newest version and add the Nvidia Container Toolkit to the Docker daemon config as described here:

{
  "builder": {
    "gc": {
      "defaultKeepStorage": "20GB",
      "enabled": true
    }
  },
  "experimental": false,
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
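A quick way to confirm that a daemon config contains the runtime entry shown above is to parse it and look for the "nvidia" key. This is a small sketch; `has_nvidia_runtime` is a hypothetical helper, and it takes the config text as input because the file location depends on the setup (for example, Docker Desktop settings versus a daemon.json file on Linux).

```python
import json


def has_nvidia_runtime(daemon_config_text):
    """Return True if the config registers nvidia-container-runtime
    under runtimes.nvidia.path."""
    config = json.loads(daemon_config_text)
    runtime = config.get("runtimes", {}).get("nvidia", {})
    return runtime.get("path") == "nvidia-container-runtime"


example = """
{
  "experimental": false,
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  }
}
"""
print(has_nvidia_runtime(example))
```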

rbt-c Feb 5, 2025
Author

I ran more tests. My tentative conclusion first: it seems to be an isolated case caused by the specific combination of software packages, libraries, and frameworks on my Windows 10 laptop.

I used another Windows desktop computer (Windows 11, RTX 3060, WSL 2 (Ubuntu 22.04) with the Nvidia CUDA toolkit installed directly on the distro - as @d-kleine mentioned in his message, ...). ch05.ipynb runs smoothly with the original image configuration (PyTorch 2.5 + CUDA 12.4 + cuDNN 9 and TensorFlow 2.18). I will try @d-kleine's suggestion to work directly in the WSL distro instead of the isolated Docker environment to save some resource overhead.

Just out of curiosity: pip uninstall/install torch torchvision torchaudio didn't seem to change the result. Anyway, we can move on from this issue; it shouldn't occur if everything is up to date.

Thank you for your time @rasbt and @d-kleine!

rasbt Feb 5, 2025
Maintainer

Awesome, glad to hear that this works for you now!

@d-kleine I would say let's not change the image to include TensorFlow CUDA support, because running TensorFlow on the GPU isn't needed, and the image would be so much larger in terms of storage requirements.

d-kleine Feb 6, 2025

Alright, thanks. Yeah, I think RuntimeError: CUDA error: named symbol not found is a CUDA-related issue. In the container, you could run

nvcc --version

to check which CUDA version is installed.
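The same check can be scripted. The sketch below runs `nvcc --version` when the binary is on the PATH and extracts the release number from its output; `parse_nvcc_release` and `cuda_toolkit_version` are hypothetical helper names. It returns None when nvcc is not found, which may be the case in runtime-only images that do not ship the compiler.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract e.g. '12.4' from a line like
    'Cuda compilation tools, release 12.4, V12.4.131'."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def cuda_toolkit_version():
    """Return the CUDA toolkit release reported by nvcc, or None."""
    if shutil.which("nvcc") is None:
        return None  # nvcc is not installed or not on the PATH
    result = subprocess.run(["nvcc", "--version"],
                            capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit release:", cuda_toolkit_version())
```

If nvcc is missing, `torch.version.cuda` reports the CUDA version PyTorch was built against, which can serve as a fallback.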

I have also updated the Docker readme with #518

rbt-c Feb 6, 2025
Author

Thanks again, gentlemen @rasbt and @d-kleine. I am going to close the discussion and mark it as resolved.
