
larger model did not achieve better performance #14

Open
WalkerRusher opened this issue Jan 16, 2025 · 5 comments

Comments

@WalkerRusher

Thanks for your inspiring work.

I trained the basic model and the larger model (i.e., the medium model with over 400M parameters) using the following script:

accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 --mixed_precision bf16 \
    --vqgan_type ctx_vqgan \
    --pretrained_model_name_or_path {log directory of finetuned tokenizer}/unwrapped_model \
    --config_name configs/llama/config.json --load_internal_llm --action_conditioned --action_dim 4 \
    --pretrained_transformer_path pretrained_models/ivideogpt-oxe-64-act-free/transformer \
    --per_device_train_batch_size 16 --gradient_accumulation_steps 1 \
    --learning_rate 1e-4 --lr_scheduler_type cosine \
    --oxe_data_mixes_type bair --resolution 64 --dataloader_num_workers 16 \
    --video_stepsize 1 --segment_length 16 --context_length 1 \
    --use_eval_dataset --use_fvd --use_frame_metrics \
    --weight_decay 0.01 --llama_attn_drop 0.1 --embed_no_wd \
    --max_train_steps 100005

I modify the "pretrained_model_name_or_path" & "config_name" and "pretrained_transformer_path" to train models with different sizes.

However, the larger model did not achieve better performance.

The results are as follows:
For the basic model (100M), the evaluation results are:
{'eval/eval_loss': 1.9217126369476318, 'eval/perplexity': 6.832650303859652, 'eval/mse': 0.004142001271247864, 'eval/fvd': 58.050864571721064, 'eval/psnr': 24.546920776367188, 'eval/ssim': 0.9035875797271729, 'eval/lpips': 0.04944797605276108}

For the larger model (400M), the results are:
{'eval/eval_loss': 2.080933094024658, 'eval/perplexity': 8.011941322098217, 'eval/mse': 0.0048113251104950905, 'eval/fvd': 60.50564319177382, 'eval/psnr': 23.930469512939453, 'eval/ssim': 0.8932170867919922, 'eval/lpips': 0.0540812723338604}

Did I miss anything?

Any suggestions would be deeply appreciated!

@Manchery
Collaborator

We are very grateful for your effort in evaluating the larger model on downstream tasks, which we had previously under-explored.

Regarding your situation: in my experience, iVideoGPT is prone to overfitting on BAIR. Are the above evaluation results from the last checkpoint, or from the checkpoint with the smallest eval loss (i.e., with early stopping)?

By the way, did you observe whether the downstream training procedure benefits from the larger model, e.g., a lower starting training loss or a faster loss descent?
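As a rough illustration of the early-stopping suggestion above (a sketch only, assuming eval metrics are logged periodically during training so the plateau in eval loss is visible), one simple option is to cut training short once eval loss stops improving, rather than training to the full 100005 steps:

accelerate launch train_gpt.py \
    {all other arguments exactly as in the script above} \
    --max_train_steps 50000

The 50000 here is an arbitrary example; the right point depends on where eval loss bottoms out, and selecting the checkpoint with the smallest eval loss post hoc is the more precise version of the same idea.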

@WalkerRusher
Author

> Regarding your situation: in my experience, iVideoGPT is prone to overfitting on BAIR. Are the above evaluation results from the last checkpoint, or from the checkpoint with the smallest eval loss (i.e., with early stopping)?

Both are evaluated using the last checkpoint.

> Did you observe if the downstream training procedure benefits from larger models?

The training loss curves are shown in the figure below; the green line is the larger model.
[Figure: training loss curves for the 100M and 400M models; the green curve is the 400M model]

@WalkerRusher
Author

As shown in the figure above, some loss spikes occurred while training the larger model.

@Manchery
Collaborator

I see, it seems you ran into a numerical precision issue. I recommend trying to turn off mixed-precision training with --mixed_precision no. Note that this may require more GPU memory.
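Concretely, with the script from the top of the thread, that would just replace the precision flag and leave everything else untouched:

accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 --mixed_precision no \
    {all other arguments exactly as in the script above}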

@WalkerRusher
Author

Thanks for your suggestion. I will try it and report back as soon as possible.
