Pretraining CUDA Out of Memory Issue #1932

@muniefht

Description

I have a machine with 4 NVIDIA L40 GPUs. I am trying to use the full_finetune_distributed recipe with llama3_1/8B_full. My dataset configuration in the config file is given below:
```yaml
dataset:
  _component_: torchtune.datasets.text_completion_dataset
  source: "text"
  column: "text"
  packed: false
  split: "train"
  data_files: "pretrain-data-batch1-quartered/*.txt"
```

The data is all .txt files. I initially planned to use 256M tokens to start the pretraining job, but I got a CUDA out-of-memory error. I have since reduced my data to a quarter of that and still get the same error, with both full_finetune_distributed and lora_finetune_distributed.
I have also reduced my batch size to 1, still with no success.
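For context, these are the memory-related settings I am planning to experiment with next. This is only a sketch; the exact key names and defaults may differ between torchtune versions, so please correct me if any of these are wrong:

```yaml
# Sketch of memory-saving overrides (key names may vary by torchtune version)
tokenizer:
  max_seq_len: 2048              # cap sequence length so a single very long sample can't blow up memory

dataset:
  packed: true                   # pack samples up to max_seq_len (requires max_seq_len to be set)

batch_size: 1
gradient_accumulation_steps: 8   # keep an effective batch size without the per-step memory cost
enable_activation_checkpointing: true
```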
I have the following questions:

  • Is this error caused by some issue with how I have set up the data? My data is now about 5k .txt files in a single folder, with the above config in the YAML file.
  • If the files are set up properly, is this happening because my resources are insufficient? How much memory/compute would I need to pretrain using either the full or the LoRA-based recipe?
