Description
I have a machine with 4 NVIDIA L40 GPUs. I am trying to use the full_finetune_distributed llama3_1/8B_full recipe. My dataset configuration in the config file is given below:
```yaml
dataset:
  _component_: torchtune.datasets.text_completion_dataset
  source: "text"
  column: "text"
  packed: false
  split: "train"
  data_files: "pretrain-data-batch1-quartered/*.txt"
```
The data is all .txt files. Initially I had planned to use 256M tokens to start the pretraining job, but I got a CUDA Out Of Memory error. I have since reduced the data to a quarter of that and still get the same error with both full_finetune_distributed and lora_finetune_distributed.
I have also reduced my batch size to 1, still with no success.
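For reference, these are the memory-related settings I am aware of in the distributed recipe configs (names as I understand them from the default torchtune configs — please correct me if any of these are wrong or have no effect here):

```yaml
# My understanding of the knobs that affect peak memory — names taken from
# the default torchtune distributed configs, not verified by me:
batch_size: 1                          # already reduced to 1
gradient_accumulation_steps: 1
enable_activation_checkpointing: True
fsdp_cpu_offload: True                 # offload sharded state to CPU
```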
I have the following questions:
- Is this error caused by some issue with how I have set up the data? The data is all .txt files — now about 5k of them in a single folder — with the config above in the YAML file.
- If the files are set up properly, is this because my resources are insufficient? How much GPU memory would I need to pretrain with either the full or the LoRA recipe?
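To rule out the data layout for question 1, I ran a quick sanity check — a minimal stdlib approximation of what I believe the Hugging Face `"text"` loader does (one record per non-empty line across the globbed files). The directory name below is from my config; here it is substituted with two tiny temp files so the snippet runs standalone:

```python
# Sanity check: glob the data_files pattern and count non-empty lines,
# approximating the Hugging Face "text" dataset loader (assumption on my
# part, not its exact behavior). Temp files stand in for my real folder.
import glob
import pathlib
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "a.txt").write_text("first sample line\n", encoding="utf-8")
(tmp / "b.txt").write_text("second sample line\nanother line\n", encoding="utf-8")

# In the real run the pattern is "pretrain-data-batch1-quartered/*.txt".
matched = sorted(glob.glob(str(tmp / "*.txt")))
total_lines = 0
for path in matched:
    text = pathlib.Path(path).read_text(encoding="utf-8")
    total_lines += sum(1 for line in text.splitlines() if line.strip())

print(f"{len(matched)} files, {total_lines} text records")
```

On my real folder this reports the expected ~5k files and a plausible record count, so the glob itself seems to match.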