Initializing GPU device takes very long #149


Open
edwinRNDR opened this issue Nov 15, 2020 · 6 comments

@edwinRNDR


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 0.2.0 (2.3.1)
  • Python version: n/a
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: NVIDIA GTX 1050 (mobile)

You can collect some of this information using our environment capture script
You can also obtain the TensorFlow version with:
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior
Opening a GPU device takes very long (close to 10 minutes); I am guessing it is compiling CUDA kernels.

Describe the expected behavior
A shorter wait when opening the GPU device.

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

2020-11-15 19:12:00.155650: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-15 19:12:00.180012: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-11-15 19:12:00.242239: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 computeCapability: 6.1
coreClock: 1.493GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-11-15 19:12:00.242301: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-11-15 19:12:01.568631: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-11-15 19:12:01.643519: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-11-15 19:12:01.712294: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-11-15 19:12:02.439515: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-11-15 19:12:03.066608: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-11-15 19:12:03.722016: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-11-15 19:12:03.722852: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-15 19:19:37.815717: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-15 19:19:37.815971: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-11-15 19:19:37.815998: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-11-15 19:19:37.818334: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2975 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-11-15 19:19:41.439187: W external/org_tensorflow/tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2020-11-15 19:19:41.590266: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
@rnett
Contributor

rnett commented Jan 30, 2021

Running into this on Linux as well, for 0.2.0 and 0.3.0-SNAPSHOT.

@Craigacp
Collaborator

@rnett what device do you have?

@Craigacp
Collaborator

TF compiles kernels at startup for compute levels that it doesn't ship with, and we only compile for 3.5 and 7.0 because the build times out on GitHub Actions otherwise. You can change this line https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/build.sh#L27 to whatever compute levels you want, then do a full rebuild to get binaries for your specific use case.
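For anyone following along, a sketch of what that change looks like. The variable name here is an assumption based on TensorFlow's standard build configuration (`TF_CUDA_COMPUTE_CAPABILITIES`); verify against the linked line in build.sh before rebuilding:

```shell
# Sketch of the edit in tensorflow-core/tensorflow-core-api/build.sh
# (variable name assumed from TensorFlow's standard configure setup;
# check the linked line before relying on this).

# Shipped default: only Kepler (3.5) and Volta (7.0) kernels are built.
# export TF_CUDA_COMPUTE_CAPABILITIES="3.5,7.0"

# Add your card's compute level, e.g. 6.1 for Pascal cards like the
# GTX 1050/1070, so TF does not JIT-compile PTX when the device opens:
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,6.1,7.0"
```

After editing, a full rebuild of tensorflow-core is needed for the extra kernels to end up in the native binaries.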

@rnett
Contributor

rnett commented Jan 30, 2021

Yeah, I just figured that out (#200). If it's build times, I guess there's not much we can do.

I'm using a 1070.

@edwinRNDR
Author

I am just wondering if compute levels 3.5 and 7.0 are the most sensible choices then. I'd expect by now 6.1 is much more common than 3.5.

Also, building tensorflow/java yourself is far from the nicest user experience one can offer. Is it possible to break up the build into multiple jobs? GitHub Actions has a 6-hour limit per job and a 72-hour limit per workflow.
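To make the trade-off concrete, a quick plain-Python sketch of which cards in this thread hit the startup JIT path. The capability values come from NVIDIA's published CUDA specs; the lookup table and function names are just for illustration:

```python
# Compute capabilities for the GPUs mentioned in this thread, per
# NVIDIA's published CUDA specs (Pascal consumer cards are all 6.1).
COMPUTE_CAPABILITY = {
    "GTX 1050": "6.1",    # edwinRNDR's card (confirmed in the log above)
    "GTX 1070": "6.1",    # rnett's card
    "Tesla K40": "3.5",   # Kepler, one of the two shipped targets
    "Tesla V100": "7.0",  # Volta, the other shipped target
}

# The prebuilt tensorflow/java binaries only ship kernels for these targets.
SHIPPED = {"3.5", "7.0"}

def jits_at_startup(gpu: str) -> bool:
    """True if TF must JIT-compile PTX for this GPU when the device opens."""
    return COMPUTE_CAPABILITY[gpu] not in SHIPPED

print(jits_at_startup("GTX 1050"))    # True: 6.1 is not a shipped target
print(jits_at_startup("Tesla V100"))  # False: 7.0 kernels ship prebuilt
```

So every Pascal-era consumer card (the whole 10-series) pays the multi-minute JIT cost, while the two shipped targets do not.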

@Craigacp
Collaborator

We've been trying to get more build resources for over a year; once we have them, it will be simple to build for more GPU targets. At the moment the current targets give us coverage of the things TF Python supports, at the cost of slow initialization for many users.
