
[BUG] OOM errors during fine-tuning with Polyglot 12.8B models #330

Closed
yum-yeom opened this issue Aug 2, 2023 · 1 comment
Labels
type/bug Bug in code

Comments

@yum-yeom

yum-yeom commented Aug 2, 2023

🐛 Bug

Hello. I'm trying to fine-tune the Polyglot 12.8B model using H2O LLM Studio, but I'm getting an OOM error.

I'm using four NVIDIA A5000 24 GB GPUs, and the parameters I used are in the attached zip file.

Is there any solution?

@yum-yeom yum-yeom added the type/bug Bug in code label Aug 2, 2023
@pascal-pfeiffer
Collaborator

There are multiple ways to bring down memory consumption (a rough sketch of these settings follows the list below). Mainly:

  • int4 training
  • a smaller batch size
  • a lower token length for question and answer

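For reference, here is a minimal sketch of what int4 loading plus a short sequence length and a small batch look like with plain transformers/bitsandbytes outside of LLM Studio; the model id and the specific numbers are assumptions, not values taken from the attached config:

```python
# Sketch only: 4-bit (NF4) loading of the backbone with short sequences and a
# small per-device batch. Model id and numbers below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "EleutherAI/polyglot-ko-12.8b"   # assumed Hugging Face id for Polyglot 12.8B

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights instead of fp16/bf16
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # let accelerate place layers across the 4 GPUs
)

# Shorter prompts/answers and a per-device batch of 1 keep activation memory in
# budget; gradient accumulation can recover a larger effective batch size.
max_length = 512
per_device_batch_size = 1
```

In LLM Studio these roughly correspond to the backbone dtype, max length, and batch size settings exposed in the experiment configuration.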
There is also an ongoing effort to split the model weights across multiple GPUs during training in this PR: #288

@yum-yeom yum-yeom closed this as completed Aug 4, 2023
@yum-yeom yum-yeom reopened this Aug 4, 2023
@psinger psinger closed this as completed Aug 18, 2023