Pretraining CUDA Out of Memory Issue #1932

@muniefht

Description

I have a machine with 4 NVIDIA L40 GPUs. I am trying to use the full_finetune_distributed recipe with llama3_1/8B_full. My dataset configuration in the config file is given below:
```yaml
dataset:
  _component_: torchtune.datasets.text_completion_dataset
  source: "text"
  column: "text"
  packed: false
  split: "train"
  data_files: "pretrain-data-batch1-quartered/*.txt"
```

The data is all .txt files. I initially planned to use 256M tokens to start the pretraining job, but I got a CUDA out-of-memory error. I have since reduced my data to a quarter of that and still get the same error, with both full_finetune_distributed and lora_finetune_distributed.
I have also reduced my batch size to 1, still with no success.
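For context, these are the memory-related settings I am planning to experiment with next. This is only a sketch; the exact key names and defaults may differ between torchtune versions, so please correct me if any of these are wrong:

```yaml
# Sketch of memory-saving overrides (key names may vary by torchtune version)
tokenizer:
  max_seq_len: 2048              # cap sequence length so a single very long sample can't blow up memory

dataset:
  packed: true                   # pack samples up to max_seq_len (requires max_seq_len to be set)

batch_size: 1
gradient_accumulation_steps: 8   # keep an effective batch size without the per-step memory cost
enable_activation_checkpointing: true
```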
I have the following questions:

  • Is this error caused by some issue with how I have set up the data? My data is now about 5k .txt files in a single folder, with the above config in the YAML file.
  • If the files are set up properly, is this happening because my resources are insufficient? How much memory/compute would I need to pretrain using either the full or the LoRA-based recipe?
