Low GPU utilization with tfjs-node-gpu #468
Comments
Hi Brannon, apologies for the delay - I was out on holiday. For some neural networks, the GPU can actually be slower than regular CPU usage. This happens because there is a cost to copy tensor data from local storage over to GPU memory. The baseball network is a simple 4-layer network of nothing more than relus and a sigmoid at the end. These types of networks are slower on GPU because of all the copying to GPU memory. If you want to take advantage of the GPU, use a network that has some level of pooling and/or convolutions. For example, in the tfjs-examples repo we have an MNIST example that runs entirely on Node: https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node This runs super fast on the GPU since convolutions are well optimized for CUDA. Try running that example while watching the nvidia-smi tool.
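For reference, here is a minimal sketch of the kind of convolution-plus-pooling model described above, assuming `@tensorflow/tfjs-node-gpu` is installed and CUDA/cuDNN are configured; the layer sizes and the random stand-in data are illustrative, not taken from the MNIST example itself:

```js
// Sketch of a conv/pool model that should show real GPU utilization.
// Assumes @tensorflow/tfjs-node-gpu is installed; sizes are illustrative.
const tf = require('@tensorflow/tfjs-node-gpu');

const model = tf.sequential();
model.add(tf.layers.conv2d({
  inputShape: [28, 28, 1], filters: 32, kernelSize: 3, activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy']});

// Random stand-in data just to drive the GPU; swap in real MNIST tensors.
const xs = tf.randomNormal([512, 28, 28, 1]);
const ys = tf.oneHot(tf.randomUniform([512], 0, 10, 'int32'), 10);

model.fit(xs, ys, {epochs: 5, batchSize: 128})
  .then(() => console.log('done'));
```

While this trains, GPU utilization should be visible in nvidia-smi, unlike with the dense-only baseball network.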
Ah, I see. If my memory serves me correctly, Python TensorFlow gives the option to specify the CPU/GPU device explicitly. Does no such functionality exist in tfjs-node-gpu?
Python + Keras uses graph-based execution, which can run faster on GPU. We use the eager-style API from TensorFlow, which does not actually have a graph of placeholders - it allocates new Tensors for op output. This is probably why you see Keras utilizing the GPU more. We do not have a device API yet - it's something we're considering down the road once we introduce TPU support. For now, we default to tensor placement using the default settings in TensorFlow eager (i.e. copy all non-int32 tensors).
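To make the "new Tensors for op output" point concrete, here is a small sketch of my own (not from this thread) showing how eagerly allocated intermediates show up in `tf.memory()` and can be reclaimed with `tf.tidy()`:

```js
// Sketch: each eager op materializes its output as a new Tensor.
// tf.memory() reports current allocations; tf.tidy() reclaims intermediates.
const tf = require('@tensorflow/tfjs-node-gpu');

const a = tf.randomNormal([1024, 1024]);
console.log('backend:', tf.getBackend());          // e.g. 'tensorflow'
console.log('before:', tf.memory().numTensors);

const b = a.matMul(a).relu();                      // two ops -> two new tensors
console.log('after ops:', tf.memory().numTensors);

const c = tf.tidy(() => a.matMul(a).relu().sum()); // intermediates disposed here
console.log('after tidy:', tf.memory().numTensors);

a.dispose(); b.dispose(); c.dispose();
```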
Gotcha, thanks for that clarification. I've revisited the char-rnn tfjs-node-gpu example I was telling you about, and it looks like it is indeed running on the GPU, since memory is allocated, but GPU utilization is ~1%. If I'm understanding you correctly, this is because tfjs-node-gpu is using TF eager mode. So I should expect the same type of model to run at ~1% GPU utilization if it were written in Python using TF eager mode as well, correct? Does tfjs-node-gpu intend to add support for graph-based execution at some point in the near future? Unless I'm missing something, this "eager mode only" behavior creates some significant performance hurdles, no? In general, how does tfjs-node-gpu compare in performance to similar implementations in Keras? I ask because I'm writing some documentation for my team and am beginning to consider a JavaScript-first approach to common high-level ML tasks. A year ago that would have seemed like a crazy idea, but with tfjs, maybe not so much. Basically, I'm curious whether tfjs-node-gpu will ever be comparable in performance to Keras and Python TensorFlow.
@nkreeger, any thoughts on a few of these last questions?
@brannondorsey My opinion on the last question: tfjs won't be as fast as tfpy.
@nkreeger Something is wrong with training using tfjs-node-gpu - it's still training on the CPU. I'm using a 2080 Ti. This is the MNIST example from the tfjs repository:
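A quick sanity check (a sketch of my own, not the original snippet) is to confirm which backend tfjs reports and force some work while watching nvidia-smi in another terminal; when the CUDA binding loads correctly, the TensorFlow C library typically also logs the detected GPU at require time:

```js
// Sketch: confirm the GPU binding loaded and which backend tfjs is using.
const tf = require('@tensorflow/tfjs-node-gpu');

console.log('tfjs backend:', tf.getBackend()); // expect 'tensorflow'

// Force some work so utilization shows up in nvidia-smi.
const a = tf.randomNormal([2048, 2048]);
a.matMul(a).dataSync();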
I'm seeing epoch cycles take over double the time on my GPU when compared to my CPU. Is there any way to improve this?
Sorry to piggyback on this issue, but I think I am having a similar problem. I'm trying to use tfjs-node-gpu. I see this in my terminal when I run my script (https://gist.github.com/jeffcrouse/750f26afdaedb4d6cd0a523ed591dccc):
And I see a spike in my GPU usage, but it's still SUPER SLOW. To be totally honest, I can't follow most of the discussion above, but I'm curious if anyone can explain to me why the browser WebGL performance would be 130x better than GPU-backed Node. I understand the idea of copying data to the GPU being slow, but why isn't this an issue for the browser? Thanks in advance!
We are actually experiencing the same thing. Running our model on CPU takes ~400 ms; running it on GPU takes ~3000 ms. This happens on a server with two NVIDIA GeForce RTX 3090s and CUDA 11.6 with cuDNN 8.3. Relevant logs:
I can confirm that CUDA is installed correctly, as I am able to use it with several other tools. This does not happen in the browser, though; running on WebGL is way faster than CPU inference. UPDATE: I have to admit that I was only testing this with a single inference instead of hundreds or thousands. I created test suites for larger numbers of inferences, and it's actually true that copying the model to GPU memory is what takes most of the time. After that's done, GPU inference is way faster than CPU inference. GPU info:
CPU info:
The following were the results of averaging 100 inferences on a hot GPU (the model is loaded into GPU memory and not disposed between inferences):
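As an illustration of the warm-up effect described above (a sketch of my own, not the commenter's actual test suite), one can discard the first inference and then average the rest; the model path, `loadGraphModel` call, and the `[1, 224, 224, 3]` input shape below are placeholders:

```js
// Sketch: discard the first (warm-up) inference, then average the rest.
// 'file://model/model.json' and the input shape are placeholders.
const tf = require('@tensorflow/tfjs-node-gpu');

async function benchmark(runs = 100) {
  const model = await tf.loadGraphModel('file://model/model.json');
  const input = tf.randomNormal([1, 224, 224, 3]);

  // Warm-up: the first call pays for copying weights to GPU memory and kernel setup.
  tf.dispose(await model.executeAsync(input));

  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    tf.dispose(await model.executeAsync(input));
  }
  console.log(`avg inference: ${((Date.now() - start) / runs).toFixed(1)} ms`);
  input.dispose();
}

benchmark();
```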
TensorFlow.js version
Browser version
N/A. Node v8.9.4. Ubuntu 16.04
Describe the problem or feature request
Using `tfjs-node-gpu`, I can't seem to get GPU utilization above ~0-3%. I have CUDA 9 and cuDNN 7.1 installed, am importing `@tensorflow/tfjs-node-gpu`, and am setting the "tensorflow" backend with `tf.setBackend('tensorflow')`. CPU usage is at 100% on one core, but GPU utilization is practically none. I've tried `tfjs-examples/baseball-node` (replacing `import '@tensorflow/tfjs-node'` with `import '@tensorflow/tfjs-node-gpu'`, of course) as well as my own custom LSTM code. Does `tfjs-node-gpu` actually run processes on the GPU?
Code to reproduce the bug / link to feature request
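For illustration, a minimal sketch of the import swap described above (not the exact repro script):

```js
// Sketch of the swap: use the GPU binding instead of the CPU binding
// in baseball-node (or any tfjs-node script).
// import '@tensorflow/tfjs-node';   // original CPU binding
import '@tensorflow/tfjs-node-gpu';  // GPU binding instead
import * as tf from '@tensorflow/tfjs';

tf.setBackend('tensorflow');         // backend name used by the node bindings
console.log(tf.getBackend());
```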
Now open another terminal and watch GPU usage. Note that if you are running the process on the same GPU as an X window server, GPU usage will likely be greater than 3% because of that process. I've tested this on a dedicated GPU running no other processes using the `CUDA_VISIBLE_DEVICES` env var.

```bash
# monitor GPU utilization
watch -n 0.1 nvidia-smi
```