
larger model did not achieve better performance #14

Open
WalkerRusher opened this issue Jan 16, 2025 · 5 comments

Comments

@WalkerRusher

Thanks for your inspiring work.

I trained the basic model and the larger model (i.e., the medium model with over 400M parameters) using the following script:

accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 --mixed_precision bf16 \
    --vqgan_type ctx_vqgan \
    --pretrained_model_name_or_path {log directory of finetuned tokenizer}/unwrapped_model \
    --config_name configs/llama/config.json --load_internal_llm --action_conditioned --action_dim 4 \
    --pretrained_transformer_path pretrained_models/ivideogpt-oxe-64-act-free/transformer \
    --per_device_train_batch_size 16 --gradient_accumulation_steps 1 \
    --learning_rate 1e-4 --lr_scheduler_type cosine \
    --oxe_data_mixes_type bair --resolution 64 --dataloader_num_workers 16 \
    --video_stepsize 1 --segment_length 16 --context_length 1 \
    --use_eval_dataset --use_fvd --use_frame_metrics \
    --weight_decay 0.01 --llama_attn_drop 0.1 --embed_no_wd \
    --max_train_steps 100005

I modify the "pretrained_model_name_or_path" & "config_name" and "pretrained_transformer_path" to train models with different sizes.

However, the larger model did not achieve better performance.

The results are as follows:
For the basic model (100M), the evaluation results are:
{'eval/eval_loss': 1.9217126369476318, 'eval/perplexity': 6.832650303859652, 'eval/mse': 0.004142001271247864, 'eval/fvd': 58.050864571721064, 'eval/psnr': 24.546920776367188, 'eval/ssim': 0.9035875797271729, 'eval/lpips': 0.04944797605276108}

For the larger model (400M), the results are:
{'eval/eval_loss': 2.080933094024658, 'eval/perplexity': 8.011941322098217, 'eval/mse': 0.0048113251104950905, 'eval/fvd': 60.50564319177382, 'eval/psnr': 23.930469512939453, 'eval/ssim': 0.8932170867919922, 'eval/lpips': 0.0540812723338604}

Did I miss anything?

Any suggestions would be deeply appreciated!

@Manchery
Collaborator

We are very grateful for your effort in evaluating the larger model on downstream tasks, which we had previously under-explored.

Regarding your situation: in my experience, iVideoGPT is prone to overfitting on BAIR. Are the above evaluation results from the last checkpoint, or from the checkpoint with the smallest eval loss (i.e., with early stopping)?

By the way, did you observe whether the downstream training procedure benefits from the larger model, e.g., a lower starting training loss or a faster loss descent?
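As a rough illustration of the early-stopping suggestion above (a sketch only, assuming eval metrics are logged periodically during training so the plateau in eval loss is visible), one simple option is to cut training short once eval loss stops improving, rather than training to the full 100005 steps:

accelerate launch train_gpt.py \
    {all other arguments exactly as in the script above} \
    --max_train_steps 50000

The 50000 here is an arbitrary example; the right point depends on where eval loss bottoms out, and selecting the checkpoint with the smallest eval loss post hoc is the more precise version of the same idea.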

@WalkerRusher
Author

> Regarding your situation: in my experience, iVideoGPT is prone to overfitting on BAIR. Are the above evaluation results from the last checkpoint, or from the checkpoint with the smallest eval loss (i.e., with early stopping)?

Both are evaluated using the last checkpoint.

> Did you observe if the downstream training procedure benefits from larger models?

The training loss curves are shown in the figure below; the green line is the larger model.
[Figure: training loss curves for the 100M and 400M models; the green curve is the 400M model]

@WalkerRusher
Author

As shown in the figure above, some loss spikes occurred while training the larger model.

@Manchery
Collaborator

I see, it seems you ran into a numerical precision issue. I recommend trying to turn off mixed-precision training with --mixed_precision no. Note that this may require more GPU memory.
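Concretely, with the script from the top of the thread, that would just replace the precision flag and leave everything else untouched:

accelerate launch train_gpt.py \
    --exp_name bair_llama_ft --output_dir log_trm --seed 0 --mixed_precision no \
    {all other arguments exactly as in the script above}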

@WalkerRusher
Author

Thanks for your suggestion. I will try it and report back as soon as possible.
