Thanks for sharing this report in so much detail. That's really awesome, I wish everyone would do that when reporting issues :). Regarding the GPU no longer being found after importing TensorFlow:
Hm, I don't have an explanation for that. But the good news is that you don't need to run TensorFlow on the GPU, as we just use it to load the weights. Would knowing that help with your issue? (Tbh I haven't really used TensorFlow for model training or inference since 2020 or so.) I think the solution you added at the bottom also sounds reasonable.
Since we don't need to run it on the GPU, as mentioned above, we should add this to the Dockerfile, but only as a suggestion. That way it doesn't unnecessarily create these large image sizes, but it still lets people know what's going on, so they don't have to spend time trying out a bunch of things if they want to make it work on a GPU. What do you think?
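For anyone who only needs TensorFlow to load the weights, here is a minimal sketch of keeping it on the CPU. The helper name `hide_gpus_from_tensorflow` is made up for illustration, and whether this also avoids the Torch problem described in this thread is untested; it only stops TensorFlow from using the GPU.

```python
def hide_gpus_from_tensorflow():
    """Ask TensorFlow to ignore all GPUs (hypothetical helper, not from the repo).

    Returns True if GPUs were hidden, False if TensorFlow had already
    initialized its devices, and None if TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    try:
        # An empty device list means TensorFlow will only use the CPU.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError:
        # Visible devices can't be changed once the runtime is initialized.
        return False
    return True


if __name__ == "__main__":
    print("GPUs hidden from TensorFlow:", hide_gpus_from_tensorflow())
```

This needs to run before TensorFlow does any other work in the process; otherwise the call has no effect and the function returns `False`.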
First of all, great materials and contributions to the community. Hats off to @rasbt!
Purpose of the discussion
I tried the option of an isolated Docker environment.
Docker environment setup
`Dockerfile`: originally within the folder `setup/03_optional-docker-environment/.devcontainer`, then moved to the project folder, with `FROM pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime`.
`requirements.txt`: contains `tensorflow >= 2.15.0 # ch05`. It ends up installing the up-to-date version, `tensorflow 2.18`.
Issues after importing TensorFlow
TensorFlow can't find the GPU. Torch can access and use the GPU normally before importing TensorFlow, but not afterwards.
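A rough way to reproduce the check described above is sketched below. The function name `gpu_report` and the dictionary keys are my own choices for illustration. It should be run in a fresh interpreter, since the "before" value is only meaningful if TensorFlow has not been imported yet.

```python
def gpu_report():
    """Record GPU visibility for Torch before and after importing TensorFlow.

    A value of None means the corresponding library is not installed.
    """
    report = {"torch_before_tf": None, "tensorflow_gpus": None, "torch_after_tf": None}

    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        report["torch_before_tf"] = torch.cuda.is_available()

    try:
        import tensorflow as tf
    except ImportError:
        tf = None
    if tf is not None:
        # An empty list means TensorFlow sees no GPU.
        gpus = tf.config.list_physical_devices("GPU")
        report["tensorflow_gpus"] = [device.name for device in gpus]

    # Ask Torch again now that TensorFlow has been imported.
    if torch is not None:
        report["torch_after_tf"] = torch.cuda.is_available()

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```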
More environment context
Windows 10, RTX 3060, WSL 2 (Ubuntu 20.04), Nvidia driver, CUDA, cuDNN, Nvidia container toolkit, Docker Desktop, etc. are all set up and tested.
Workaround
TensorFlow 2.18 is tested with CUDA 12.5 + cuDNN 9.3
![image](https://private-user-images.githubusercontent.com/42779903/409635363-087729c6-0fd7-4a03-9400-091e8a366a89.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0MTQyNjUsIm5iZiI6MTczOTQxMzk2NSwicGF0aCI6Ii80Mjc3OTkwMy80MDk2MzUzNjMtMDg3NzI5YzYtMGZkNy00YTAzLTk0MDAtMDkxZThhMzY2YTg5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDAyMzI0NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWEyOWYzZjAyODJiNWY4MzhiNjI1ZjJkNmI3NWZkYmEzOTE1NDhlN2JlODg2MzhmMzM4YjdhNzY5ZTZiYjczYWYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.ejtw-JwJhT-OcDvjTJBI51V-gPuwLKlJrXrVAI9awYY)
After searching the PyTorch/PyTorch image repo, I selected a compatible combination. Moreover, an `LD_LIBRARY_PATH` environment variable is added to the `Dockerfile` so that TensorFlow can find the existing CUDA and cuDNN libraries.
With the above updated configuration, both Torch and TensorFlow can access the GPU properly.
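To double-check that the directories on `LD_LIBRARY_PATH` actually contain the CUDA and cuDNN shared libraries, something like the following sketch can be run inside the container. The function name `find_cuda_libs` and the file name patterns are assumptions for illustration; the exact library file names depend on the image.

```python
import glob
import os


def find_cuda_libs(search_path=None, patterns=("libcudart.so*", "libcudnn.so*")):
    """Map each file name pattern to the matching files on the search path.

    search_path uses the same format as LD_LIBRARY_PATH (directories joined
    by os.pathsep). If omitted, the LD_LIBRARY_PATH variable is read.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")

    found = {pattern: [] for pattern in patterns}
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for pattern in patterns:
            matches = sorted(glob.glob(os.path.join(directory, pattern)))
            found[pattern].extend(matches)
    return found


if __name__ == "__main__":
    for pattern, paths in find_cuda_libs().items():
        print(pattern, "->", paths if paths else "not found")
```

An empty list for a pattern suggests the corresponding directory is missing from `LD_LIBRARY_PATH`, or that the libraries live under a different name in the chosen image.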
Install TensorFlow in the `Dockerfile` rather than through `requirements.txt` (comment out the `tensorflow` line there).
It seems to work, but the resulting image is too big. Not worth it.