
Tensorflow GPU install #1697

Closed
ernimd opened this issue Jul 30, 2024 · 5 comments · Fixed by #1704
Labels
✨ enhancement Feature request

Comments

@ernimd

ernimd commented Jul 30, 2024

Problem description

Hi, the TensorFlow docs advise using:

python3 -m pip install tensorflow[and-cuda]

Any idea how to reproduce this with pixi?

@ernimd ernimd added the ✨ enhancement Feature request label Jul 30, 2024
@ruben-arts
Contributor

ruben-arts commented Jul 30, 2024

The exact equivalent is:

pixi init project
cd project
pixi add python
pixi add "tensorflow[and-cuda]" --pypi

You could use conda dependencies for TensorFlow as well.

Start a project:

pixi init project
cd project
pixi add python

Add a CUDA system requirement to the pixi.toml, e.g.:

[system-requirements]
cuda = "12.0"

Add tensorflow:

pixi add tensorflow

Because of the CUDA system requirement, pixi will find the GPU version of TensorFlow.
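
To check whether the GPU build actually resolved, a quick sanity check (assuming the NVIDIA driver is installed on the machine) is:

pixi run python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

An empty list means TensorFlow cannot see a GPU, either because the CPU build was installed or because the CUDA libraries/driver are missing.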

@ernimd
Author

ernimd commented Jul 30, 2024

This doesn't work in my case... There is a GPU on the machine and the drivers are installed. I can work around this by simply initializing the environment with pip in it and running python3 -m pip install tensorflow[and-cuda] manually. That works, and TensorFlow recognizes the GPU.
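
For reference, a minimal sketch of that workaround (the project name is just a placeholder; the extras spelling is the one from the TensorFlow docs):

pixi init project
cd project
pixi add python pip
pixi run python3 -m pip install 'tensorflow[and-cuda]'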

[something]$ pixi add "tensorflow[and-cuda]" --pypi
 WARN Defined custom mapping channel https://conda.anaconda.org/nvidia/ is missing from project channels
✔ Added tensorflow[and-cuda]
Added these as pypi-dependencies.

[something]$ cat pixi.toml 
[project]
name = "something"
version = "0.1.0"
description = "Add a short description here"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
python = "<3.12"

[pypi-dependencies]
tensorflow = ">=2.17.0, <3"

[something]$ pixi shell
 . "/tmp/pixi_env_0XQ.sh"
[something]$  . "/tmp/pixi_env_0XQ.sh"

(something) [something]$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-07-30 13:30:05.468202: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-30 13:30:05.522137: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-30 13:30:05.585031: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-30 13:30:05.644926: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-30 13:30:05.660807: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-30 13:30:05.751123: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-30 13:30:13.635456: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-07-30 13:30:19.458826: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2343] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
(something) [something]$ 

@ruben-arts
Contributor

I see what went wrong; you have found a bug. We don't parse the pypi dependencies correctly anymore, so the extras were not added.

This seems to work for me:

[project]
channels = ["conda-forge"]
name = "tensorflow-cuda"
platforms = ["linux-64"]

[dependencies]
python = "3.11.*"

[system-requirements]
cuda = "12.0"

[pypi-dependencies]
# Note the extras field
tensorflow = { version= ">=2.17.0, <3", extras = ["and-cuda"] }
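
For reference, this hand-written entry is what the earlier CLI command was meant to produce; once the extras parsing is fixed, the following should write the same thing:

pixi add "tensorflow[and-cuda]" --pypi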

But to avoid the PyPI dependency altogether, you can do:

[project]
channels = ["conda-forge"]
name = "tensorflow-cuda"
platforms = ["linux-64"]

[dependencies]
python = "3.11.*"
tensorflow = ">=2.17.0, <3"

[system-requirements]
# Because of this requirement, pixi will make sure the cuda version of tensorflow is installed.
cuda = "12.0"

Which is what I would advise.

P.S. this is the resulting command and its output:

❯ pixi run python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
2024-07-30 14:43:39.884398: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-30 14:43:39.894619: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-30 14:43:39.906955: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-30 14:43:39.911209: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-30 14:43:39.920241: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1722343421.099386  282735 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1722343421.130422  282735 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1722343421.130589  282735 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Not sure what all this output means, but it seems to find my physical_device:GPU:0, which is an NVIDIA GPU.

@ernimd
Author

ernimd commented Jul 30, 2024

We don't parse the pypi dependencies correctly anymore.

I could give it a shot and fix it?

@ruben-arts
Contributor

Ah sorry, I missed your proposal, @ernimd! Thanks anyway!

@ernimd ernimd closed this as completed Jul 31, 2024
ruben-arts added a commit that referenced this issue Aug 1, 2024
Fixes #1697 
We broke `pixi add --pypi pytest[dev]`: it no longer added the `[dev]` extra. Now that is fixed.

---------

Co-authored-by: Tim de Jager <tdejager89@gmail.com>
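
To illustrate the fix, a command like the one from the commit message should now keep the extra in the manifest, producing roughly this entry (the exact version constraint pixi writes depends on the resolved version):

pixi add --pypi "pytest[dev]"

[pypi-dependencies]
pytest = { version = "*", extras = ["dev"] }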