When selecting hardware components such as the CPU, GPU, memory (RAM), and storage for deep learning tasks, it's essential to balance performance and cost-effectiveness. Let's talk about what makes a powerful deep learning workstation. The important components of a workstation are:
Hardware Component | Description |
---|---|
CPU (Processor) |
The CPU is crucial for handling tasks such as data preprocessing, managing I/O operations, and coordinating between different hardware components like the GPU and memory. Although the GPU handles most of the heavy lifting in deep learning, the CPU's role should not be underestimated. Important aspects while choosing a CPU for deep learning are: ✦ `CPU cores`: More cores allow for better multitasking and parallel processing. Deep learning tasks like data loading, preprocessing, and batching run in parallel with training and scale with the number of CPU cores; 6 to 16 cores is a sensible minimum, and we are going for 64 cores. Simultaneous Multithreading (SMT): Technologies like Intel's Hyper-Threading or AMD's SMT allow each core to handle two threads, effectively doubling the number of threads that can run simultaneously. This is helpful when many lightweight tasks need to be handled in parallel. ✦ `Clock Speed`: The clock speed determines how fast the CPU cranks through computations on this data; a higher clock speed (measured in GHz) results in faster processing of individual tasks. Aim for a CPU with a clock speed of 3.0 GHz or higher, but for deep learning, prioritize core count over clock speed. ✦ `PCI Express`: PCIe is generally considered the highway between CPU RAM and GPU RAM. PCIe 3.0 transfers roughly 1000 MB/s per lane and PCIe 4.0 roughly 2000 MB/s per lane (about 16 GB/s vs. 32 GB/s across a x16 slot). PCIe 4.0 or PCIe 5.0 compatibility ensures faster data transfer rates between the CPU, GPU(s), and storage devices, reducing latency in data loading and model inference. ✦ `Cache`: Cache memory is important because it improves the efficiency of data retrieval. It stores program instructions and data that are used repeatedly in the operation of programs, or information that the CPU is likely to need next.
The higher the cache, the better: a larger cache allows faster access to frequently used data, which can improve performance during training. Look for CPUs with at least 12 MB of cache. ✦ `Architecture`: Modern architectures (e.g., AMD Zen 4, Intel Alder Lake) offer better power efficiency and more efficient processing per core, improving computation efficiency for deep learning; choosing the latest generation (e.g., Intel Core i7/i9, AMD Ryzen 7/9) can provide better optimization for AI workloads. ✦ `Vectorized Instructions`: CPUs that support SIMD (Single Instruction, Multiple Data) instructions, such as AVX2 or AVX-512 (for Intel) or AMD's equivalent, are better for deep learning. These instructions accelerate matrix operations and other vectorized calculations commonly used in neural networks. ✦ `Thermal Design Power (TDP)`: CPUs with high TDP ratings often deliver better performance but require more robust cooling solutions. Efficient power management becomes crucial if the system is intended for continuous operation in a workstation environment. Recommendations:
We will go for the Threadripper Pro 5000 series instead of the 3000 series. Check for yourself. New: Threadripper PRO 7000 WX-Series | AMD Ryzen™ 9 7950X3D | Intel® Core™ i9 processor 14900K. |
GPU |
GPUs provide thousands of additional cores (CUDA cores / Tensor cores) for fast computation and parallelization. NVIDIA is currently leading the GPU market with its commercial GPU series (GeForce) and professional GPU series (RTX), along with the CUDA and cuDNN deep learning ecosystem. GPUs follow a programming model called single-instruction-multiple-threads (SIMT), where the same instruction executes concurrently on different cores/threads, each on its own portion of data as dictated by its assigned thread ID. All cores run the threads synchronously in lock-step, which greatly simplifies the control flow and works great for domains like dense linear algebra, which neural network applications heavily rely on.
`GPU Memory`: VRAM, typically GDDR6 (a type of DRAM); GDDR stands for Graphics Double Data Rate SDRAM. The GPU board also carries capacitors that regulate the voltage to various components, and a PCIe bus connects it to the CPU. For those scaling up, multi-GPU setups are valuable: many NVIDIA GPUs support NVLink, which enhances data transfer speeds between GPUs. This feature is particularly useful for large-scale models and distributed deep learning.
`Commercial GPU (GeForce)`: NVIDIA GeForce RTX 4090: The RTX 4090 is NVIDIA's latest consumer-grade powerhouse based on the Ada Lovelace architecture. With 24 GB of GDDR6X memory, a massive CUDA core count, and high clock speeds, the RTX 4090 is well-suited for advanced deep learning, 3D rendering, and video processing. However, the RTX 4090 does not support NVLink: unlike previous models, NVIDIA removed NVLink support from the Ada Lovelace architecture, which means it cannot be linked with another GPU for direct memory sharing. This is a shift from previous generations like the RTX 3090, which did support NVLink. For users needing multi-GPU configurations with NVLink, the A100, H100, or certain Quadro models remain options. NVIDIA GeForce RTX 3090: Remember, the 3090 is the GPU, and a variety of graphics cards with the 3090 GPU come from different manufacturers. Here is a cooling efficiency chart of different graphics cards with the 3090 GPU (chart not shown): the Asus Strix Quiet 390W seems to be a good and quiet commercial graphics card with the 3090 GPU. [Update: NVIDIA GeForce 40-series GPUs, lambdalabs/gpu-benchmarks]. `Professional GPU (RTX)`: NVIDIA RTX A5500
New: GeForce RTX 4090, NVIDIA Blackwell. |
[ graphics-cards ] Professional AI NVIDIA Workstations.
|
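To put VRAM sizes like the 4090's 24 GB in perspective, here is a rough back-of-the-envelope sketch (plain Python; the fp32 and Adam-optimizer assumptions are ours, not from the text) of the memory needed just for a model's weights, gradients, and optimizer state during training:

```python
def training_vram_gb(n_params, bytes_per_value=4):
    """Rough estimate assuming fp32 weights + gradients + Adam state.

    Deliberately ignores activations, which depend on batch size
    and architecture and often dominate for large batches.
    """
    weights = n_params * bytes_per_value
    grads = n_params * bytes_per_value
    adam_state = 2 * n_params * bytes_per_value  # first and second moments
    return (weights + grads + adam_state) / 1024**3

# A hypothetical 1-billion-parameter model:
print(round(training_vram_gb(1_000_000_000), 1))  # ~14.9 GB before activations
```

Even under these optimistic assumptions, a 1B-parameter model nearly fills a 24 GB card once activations are added, which is why VRAM capacity (and multi-GPU memory sharing) matters so much.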
Memory (RAM) |
Considering 64 cores, it is wise to have 4 GB of memory per core, which takes us to 64 × 4 = 256 GB of RAM. If we go for 32 cores, then 32 × 4 = 128 GB of RAM. ECC memory will protect our system from a potential crash by correcting any errors in the data, while non-ECC memory doesn't correct such errors. We will go for DDR4 because the motherboard has DDR4 slots.
|
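The per-core sizing rule above is easy to script; a minimal sketch (the 4 GB/core rule of thumb comes from the text, the function name is ours):

```python
def ram_gb(cpu_cores, gb_per_core=4):
    """Rule of thumb from above: 4 GB of RAM per CPU core."""
    return cpu_cores * gb_per_core

print(ram_gb(64))  # 256
print(ram_gb(32))  # 128
```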
Storage | We need an SSD! Choice: M.2 SSD via NVMe. SSDs come in two form factors: 2.5″ SSDs use the SATA interface, while M.2 SSDs plug into an M.2 slot. M.2 NVMe SSDs are good storage devices.
External Storage: NAS RAID 5 calculation with four 16 TB drives: Total: 64 TB; Available: 48 TB; for protection (parity): 16 TB. In RAID 5, one drive's worth of capacity goes to parity, so usable space is (n − 1) × drive size. Extra information:
|
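The RAID 5 arithmetic above can be sketched as follows (plain Python; the four-drive layout is inferred from the 64/48/16 TB figures, since RAID 5 parity always costs exactly one drive):

```python
def raid5_capacity(n_drives, drive_tb):
    """RAID 5 reserves one drive's worth of capacity for parity."""
    total = n_drives * drive_tb
    usable = (n_drives - 1) * drive_tb
    parity = drive_tb
    return total, usable, parity

total, usable, parity = raid5_capacity(4, 16)
print(total, usable, parity)  # 64 48 16
```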
If you wish to build your own system from scratch, watch this video. We also have the option to order pre-built workstations or configure deep learning workstations from Lambda, Exxact, Puget, Bizon, Mifcom.de etc.
Extra Tips:
- Enable XMP profile in BIOS/UEFI to leverage memory's full speed.
- Good coolers: Air cooler - ID-COOLING SE ARGB & Liquid cooler - Hydro Series™ H150i PRO RGB 360mm. A good rule of thumb while installing an AIO: air pressure inside the case should equal air pressure outside. In air cooling, air flows from the face of the fan to its back. Place the fans so that the exhaust air passes over the GPU, and have at least 2 fans: one for intake and the other for exhaust.
- PSU cables: 24 pin ATX cable (power to the entire motherboard), EPS (power to CPU socket), PCIe (power to graphics card).
- How they work (video) : SSD, M.2 NVMe SSD, Computer Memory, GPU.
I am currently working with a 4090 D [2025 update].
Extra Tools: power supply calculator, windows 10 media tool, AMD drivers, NVIDIA drivers