-
I had created an issue about hardware requirements; maybe it will be possible to get answers there: #62. At the moment, since it has to load a ~300 GB model, it looks unlikely that it will work on CPU without changes to the code and the model. It is also stated that it requires …
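For context, the ~300 GB figure lines up with simple arithmetic on the announced parameter count. A minimal back-of-envelope sketch (the 314B parameter count is from the public announcement; the bytes-per-parameter values are standard dtype sizes, nothing measured from this repo):

```python
# Rough weight-only footprint of grok-1 at different precisions.
PARAMS = 314e9  # announced grok-1 parameter count

for dtype, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{dtype:>9}: ~{gib:,.0f} GiB for the weights alone")

# fp16/bf16: ~585 GiB, int8: ~292 GiB, int4: ~146 GiB
```

The ~292 GiB int8 figure matches the ~300 GB checkpoint size, and none of these fit on a single consumer GPU.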
-
I know this is obvious, but I tried this on my VRAM-modified 48 GB 3090, and it did not have enough VRAM.
-
Running a large language model (LLM) with the specifications you provided would require substantial hardware resources. Running an LLM of this scale is a significant undertaking and would likely require a dedicated high-performance computing (HPC) cluster or a cloud-based solution from a provider such as AWS, Google Cloud, or Microsoft Azure. Additionally, you might need techniques like model parallelism, tensor parallelism, or pipeline parallelism to distribute the workload across multiple devices effectively.
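To make the parallelism point concrete, here is a minimal tensor-parallelism sketch using JAX's public sharding API (illustrative only, not taken from the grok-1 code; the matrix shape is a toy size):

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange all visible devices along a single "model" axis.
n = jax.device_count()
mesh = Mesh(mesh_utils.create_device_mesh((n,)), axis_names=("model",))

# Column-shard a toy weight matrix across the "model" axis, so each
# device holds 1/n of the parameters -- the essence of tensor parallelism.
w = jax.device_put(jnp.zeros((1024, 4096)),
                   NamedSharding(mesh, P(None, "model")))
print(w.sharding)  # shows how the array is distributed across devices
```

Pipeline parallelism instead splits whole layers across devices, trading memory per device for inter-stage communication.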
-
grok-1 GPU memory requirements. I calculated these using the following parameters:

[Training: Metrics | Memory in Gigabyte]
[Inference: Metrics | Memory in Gigabyte]
[Finetuning Configuration: Metrics | Memory in Gigabyte]
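Estimates of this kind can be reproduced with a small rule-of-thumb calculator (a sketch; the multipliers are common heuristics, not numbers from the comment above):

```python
def est_memory_gib(params_b: float, bytes_per_param: float,
                   overhead: float = 1.0) -> float:
    """params_b: parameters in billions; overhead: multiplier over raw weights."""
    return params_b * 1e9 * bytes_per_param * overhead / 2**30

P_GROK = 314  # grok-1 parameter count, in billions

# Inference: bf16 weights plus a modest activation/KV-cache margin.
print(f"inference (bf16): ~{est_memory_gib(P_GROK, 2, 1.2):,.0f} GiB")

# Full finetuning with Adam in mixed precision: weights + gradients +
# two optimizer moments, roughly 16 bytes per parameter.
print(f"training (Adam):  ~{est_memory_gib(P_GROK, 16):,.0f} GiB")
```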
-
So even RTX 4090 owners cannot yet run Grok locally. I wonder how far optimization can go without a considerable loss of quality...
-
What are the GPU RAM requirements, please?
Can it be ported to CPU, maybe?
Congratulations to all of us. Cautiously curious what will come out of this...