What hardware is everyone using? Any system bottlenecks? #291
RABITtranscription
started this conversation in
General
Replies: 1 comment
-
You can try cloud servers on Windows Server with different Tesla GPUs. It's not just the core count; Tesla GPUs are generally faster at CUDA tasks. In my tests, the best price-to-performance ratio came from the Tesla T4. Also note that other system specs, such as the CPU, RAM, and SSD, have an impact as well.
-
Hi!
I'm wondering what hardware people are running this program on and what their experiences have been with system bottlenecks. I'm trying to solve a specific issue I'm running into, so I'll give the background on my setup.
I'm using whisper-standalone to do legal transcription for myself and my coworkers. It's been a lifesaver: our workflow used to involve watching criminal trials and typing our own transcripts. I've written some simple Python code for a GUI that calls whisper-standalone from the command line to do the heavy lifting of the transcription, then takes the SRT output and turns it into a usable transcript with timestamps (derived from the filenames, which contain the date and start time of each video).
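For anyone curious what that kind of wrapper looks like, here's a minimal sketch. The executable name and the SRT-parsing details are my assumptions, not the actual code from the post; adjust them to your own setup:

```python
import re
import subprocess
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical executable name -- substitute your whisper-standalone binary.
WHISPER_EXE = "faster-whisper-xxl"

def transcribe(video: Path) -> Path:
    """Call whisper-standalone on the CLI; assumes it writes an SRT next to the video."""
    subprocess.run([WHISPER_EXE, str(video), "--output_format", "srt"], check=True)
    return video.with_suffix(".srt")

def srt_entries(srt_text: str):
    """Yield (start_offset, caption_text) pairs from SRT-formatted text."""
    pattern = re.compile(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> [\d:,]+\n(.*?)(?:\n\n|\Z)",
        re.S,
    )
    for h, m, s, ms, text in pattern.findall(srt_text):
        start = timedelta(hours=int(h), minutes=int(m),
                          seconds=int(s), milliseconds=int(ms))
        yield start, text.strip()

def absolute_timestamps(srt_text: str, video_start: datetime):
    """Shift SRT-relative offsets by the recording's start time (from the filename)."""
    return [(video_start + start, text) for start, text in srt_entries(srt_text)]
```

The key idea is that SRT times are relative to the start of the file, so adding them to the date/start-time encoded in the filename yields real-world timestamps for the transcript.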
When I discovered this project, I didn't have a machine with an NVIDIA GPU, so I had to build one. I ended up with an Intel i9-12900K CPU, 32 GB of DDR5, and a 4070 for the GPU. This is my home machine, but I also built one for the office from old parts. It started life with an AMD FX-8350 and a used mining GPU from eBay (a P102, which specs out similarly to a Titan Xp, I think). So I had a fast machine and a slow machine. I eventually upgraded the slow machine, first to a Xeon E5-2680 v4 on a Chinese motherboard.
I'm using a 13-hour trial record as my benchmark. The P102 machine (3,200 CUDA cores) takes 3 hours, 40 minutes to transcribe it, while the machine with the 4070 (5,888 CUDA cores) takes just 39 minutes.
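Those numbers are easier to compare as real-time factors (hours of audio processed per hour of compute). A quick sketch using only the figures above:

```python
from datetime import timedelta

# The 13-hour benchmark trial record from the post.
AUDIO = timedelta(hours=13)

runs = {
    "P102 (3,200 CUDA cores)": timedelta(hours=3, minutes=40),
    "4070 (5,888 CUDA cores)": timedelta(minutes=39),
}

for gpu, wall_time in runs.items():
    rtf = AUDIO / wall_time  # e.g. 20x = 20 minutes of audio per minute of compute
    print(f"{gpu}: {rtf:.1f}x real time")
```

That works out to roughly 3.5x real time for the P102 and 20x for the 4070, so the 4070 setup is about 5.6x faster end to end.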
I decided that I wanted the office machine to be faster, so I upgraded to a 4070 Super (7,168 CUDA cores). I expected the GPU upgrade to make a big difference, but it became clear that the old Xeon was a bottleneck. I dropped from 3 hours, 40 minutes to 2 hours, 23 minutes, nowhere close to the 39 minutes I was getting at home on my benchmark. CPU usage was maxed out on one or two cores, so I assumed the processor must be the bottleneck and upgraded to an AM5 chip, specifically the 7600X. It has fewer cores than the Intel i9-12900K, but I figured the single-core speeds might be similar enough. With the 7600X, the 4070 Super now does the benchmark transcript in 1 hour, 40 minutes. That's still a big improvement, but not as fast as the 4070 on the Intel chip, even though the 4070 has fewer CUDA cores. The 7600X maxes out at 100% usage on 3 cores during the benchmark. To get as much data as possible, I swapped the 4070 Super into the machine with the i9-12900K, and it did the benchmark in the same 39 minutes as the 4070.
(I also tried swapping the 16 GB of DDR5 at 5200 MT/s for the 32 GB of DDR5 at 6000 MT/s to see if it made any difference, and it didn't.)
So my current working theory is that the 7600X just isn't enough processor to keep the 4070 Super fed with data to crunch. Oddly, it seems the i9-12900K might also not be fast enough to give the 4070 Super enough data to beat the 4070. If that's the case, then a GPU with far more CUDA cores wouldn't actually give me any speed improvement without a processor fast enough to keep it fed. That seems wrong somehow; surely there must be a way to use all the CUDA cores of a 4090, for instance, with a consumer-grade CPU.
But here's the thing: I don't know. I'm not an expert on this, so I wanted to see what other people are using for hardware and what their experiences have been with system bottlenecks. I don't have unlimited funds to throw at this problem; otherwise I'd pop in a 9950X when it's released later this month to see if that makes the 4070 Super faster than the 4070.
I'm really interested in any thoughts that anyone has.
Thanks!