larger model did not achieve better performance #14
Comments
We are very grateful for your effort in evaluating the larger model on downstream tasks, which we had previously left under-explored. Regarding your situation: in my experience, iVideoGPT is prone to overfitting on BAIR. Are the evaluation results you reported from the last checkpoint, or from the checkpoint with the smallest eval loss (i.e., with early stopping)? Also, did you observe whether the downstream training procedure benefits from the larger model, e.g., a lower starting training loss or a faster rate of loss descent?
As shown in the attached figure, some loss spikes occurred while training the larger model.
I see, it seems you ran into a numerical precision issue. I recommend trying to turn off mixed-precision training.
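For concreteness, a minimal sketch of that change, assuming train_gpt.py follows the usual Hugging Face convention of accepting no/fp16/bf16 for --mixed_precision (if it does not, removing the flag entirely should likewise disable mixed precision):

accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 \
    --mixed_precision no   # was: --mixed_precision bf16; keep all other arguments as in the original command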
Thanks for your suggestion. I will try it and report back as soon as possible.
Thanks for your inspiring work.
I trained the base model and the larger model (i.e., the medium model with over 400M parameters) using the following script:
accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 --mixed_precision bf16 \
    --vqgan_type ctx_vqgan \
    --pretrained_model_name_or_path {log directory of finetuned tokenizer}/unwrapped_model \
    --config_name configs/llama/config.json --load_internal_llm --action_conditioned --action_dim 4 \
    --pretrained_transformer_path pretrained_models/ivideogpt-oxe-64-act-free/transformer \
    --per_device_train_batch_size 16 --gradient_accumulation_steps 1 \
    --learning_rate 1e-4 --lr_scheduler_type cosine \
    --oxe_data_mixes_type bair --resolution 64 --dataloader_num_workers 16 \
    --video_stepsize 1 --segment_length 16 --context_length 1 \
    --use_eval_dataset --use_fvd --use_frame_metrics \
    --weight_decay 0.01 --llama_attn_drop 0.1 --embed_no_wd \
    --max_train_steps 100005
I modified "pretrained_model_name_or_path", "config_name", and "pretrained_transformer_path" to train models of different sizes.
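For clarity, the size-dependent flags take the following form (the {...} paths are placeholders in the same style as the command above, not actual file names from the repository):

# Only these three arguments differ between the base and medium runs;
# everything else in the command above stays the same.
    --pretrained_model_name_or_path {log directory of the tokenizer finetuned for this model size}/unwrapped_model \
    --config_name {config file for the medium model} \
    --pretrained_transformer_path {pretrained medium transformer directory} \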
However, the larger model did not achieve better performance.
The results are as follows:
For the base model (100M), the evaluation results are:
{'eval/eval_loss': 1.9217126369476318, 'eval/perplexity': 6.832650303859652, 'eval/mse': 0.004142001271247864, 'eval/fvd': 58.050864571721064, 'eval/psnr': 24.546920776367188, 'eval/ssim': 0.9035875797271729, 'eval/lpips': 0.04944797605276108}
For the larger model (400M), the evaluation results are:
{'eval/eval_loss': 2.080933094024658, 'eval/perplexity': 8.011941322098217, 'eval/mse': 0.0048113251104950905, 'eval/fvd': 60.50564319177382, 'eval/psnr': 23.930469512939453, 'eval/ssim': 0.8932170867919922, 'eval/lpips': 0.0540812723338604}
Did I miss anything?
Any suggestions would be deeply appreciated!