Drastic drop in GPU speed after approximately 10 games #80
Hi! A speed drop after the first several games is normal: in the beginning,
no training is done, because we're still populating the replay buffer.
Training is the heaviest operation, so a 5-10x slowdown is OK.
But, of course, that doesn't explain why you're getting slower speeds than in
the book. There could be a wide variety of reasons for that: a slower card
(I used a GTX 1080 Ti), a wrong driver setup, or just overheating.
Thank you for your reply. It's been a struggle since I'm learning Python and RL at the same time (not to mention debugging GPU issues), but I've really enjoyed your book so far.
I'm using an NVIDIA TITAN RTX, so I would expect the performance to be at least as good as a GTX 1080 Ti, if not better. I've updated the drivers, but there is no change in performance.
If I use the --cuda option, my GPU usage never exceeds 3% and my CPU usage is approximately 60% for all games. However, the speed drops from ~400 f/s to ~80 f/s after (approximately) game 10 or 11.
If I do not use the --cuda option, my GPU usage is about 1% (the same as when browsing the internet) and my CPU goes to 100% after game 10. As before, the speed for games #1-#10 is ~400 f/s, but without the --cuda option the speed drops to ~10 f/s after game 10 or 11.
So, for the first 10 games, I can achieve performance close to the textbook's whether I use the --cuda option or not. In either case, the speed drops dramatically after game 10, and my GPU usage is never greater than 3% regardless of which option I choose. This behavior is almost identical to that reported here:
Maxim (or anyone else who might be reading this), when you have time, would you mind running Chapter07/01_dqn_basic.py --cuda from the 1st edition of the book and answering these questions:
I'm not convinced that it's a bad thing that my GPU usage is so low. It's possible that, even with maximum parallelization using the current code, I simply can't use more than 3% of my available GPU resources playing Pong. If that's the case, then there may be another reason that the speed is so slow during training. I don't think there's any problem with your code, but it may be an issue of old packages vs. new Windows/GPU specs, or maybe old packages doing something unwelcome in my environment (like unnecessarily copying tensors).
I'm going to clone the Git repository for your 2nd-edition book, create a Python environment with the new package requirements, and re-test the 2nd-edition version of the code there. That should at least let me know whether it's a package version issue. Thank you for your help.
Below are benchmarks on my hardware (1080 Ti, NVIDIA drivers 440.100, CUDA 10.2, Ubuntu) of the first-edition code:
During training, GPU utilisation is about 35%.
Without
I see your hardware is much better than mine (my CPU is an i5-6600K and the system has 32GB of RAM), so your numbers should be better. I'd start with general system/GPU troubleshooting by running standard deep learning benchmarks (like this one: https://github.com/ryujaehun/pytorch-gpu-benchmark) and comparing your numbers with theirs. You might also try the Chapter09 examples from the second edition. That chapter is devoted to applying GPU/PyTorch tricks to speed up the Pong game, so it has plenty of numbers to compare against.
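When comparing numbers between machines, it helps to measure frames per second the same way everywhere. Below is a minimal stdlib-only sketch of such a meter; `FpsMeter` and `slowdown` are hypothetical helper names, not part of the book's code, and the clock is injectable so the usage example can replay the approximate figures reported in this thread (~400 f/s dropping to ~80 f/s with --cuda, and to ~10 f/s without).

```python
import time


class FpsMeter:
    """Track frames per second between successive reports (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last_time = clock()
        self._last_frame = 0

    def report(self, frame_idx):
        """Return f/s since the previous report, then start a new window."""
        now = self._clock()
        elapsed = now - self._last_time
        frames = frame_idx - self._last_frame
        fps = frames / elapsed if elapsed > 0 else float("inf")
        self._last_time = now
        self._last_frame = frame_idx
        return fps


def slowdown(fps_before, fps_after):
    """How many times slower the second measurement is."""
    return fps_before / fps_after


# Usage with a fake clock: 800 frames in 2 s, then 400 more frames in 5 s.
ticks = iter([0.0, 2.0, 7.0])
meter = FpsMeter(clock=lambda: next(ticks))
fps_warmup = meter.report(800)     # window before training starts
fps_training = meter.report(1200)  # window after training starts

print(fps_warmup, fps_training, slowdown(fps_warmup, fps_training))
print(slowdown(400, 10))  # the no-CUDA figures from this thread
```

With the thread's figures, the --cuda run is about 5x slower once training starts, which sits inside the expected 5-10x range, while the CPU-only run is about 40x slower.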
I'm getting an unexpected, but reliable, drop in GPU speed after running 11-ish games of Chapter07/01_dqn_basic.py using the --cuda option. For the first 10 games, I get speeds comparable with the textbook, but speed halves in game 11, and then down to a third in games 12+. I get similar behavior when running Chapter06/02_dqn.pong.py on the GPU too. This happens every time I run the code.
I'm running Python 3.6.10 on Windows 10. I'm using all the textbook's required packages, with the exception of PyTorch. The textbook recommends pytorch==0.4.0, but I couldn't get it to run, so I installed pytorch==1.1.0 with cudatoolkits==10.0. I'm using an NVIDIA TITAN RTX, and my PC has 128GB of RAM.
I also get similar speeds (~400 f/s) when I run the code withOUT the --cuda option for the first 10 or so games but then it drops to around 10 f/s after that.
I have no idea what could be the cause of this. I'm relatively new to RL, CUDA and Python, so I'm not sure if it's a problem with the example code or something on my end.
Any ideas? Has anyone else reported this, or is it just me?