Description:
I'm fine-tuning a VideoMAE model for binary classification on home camera footage to distinguish between two actions. Here’s a summary of my setup and the issues I’m facing:
Dataset & Variations:
I have two primary datasets:
Small Dataset: ~120 clips for quicker iteration.
Full Dataset: ~3k clips.
All videos are 6 seconds long, though I've also tested with 3-second clips.
I've also created variations with blurred or blacked-out backgrounds to help with recognition.
Model & Configuration:
The model classifies actions using 16 uniformly sampled frames per video.
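For reference, uniform sampling of 16 frames can be sketched as follows (a minimal illustration assuming the clip is described by its total frame count; the notebook's actual sampler may differ):

```python
def uniform_frame_indices(total_frames: int, num_samples: int = 16) -> list[int]:
    """Pick num_samples frame indices spread evenly across a clip."""
    if total_frames <= 0:
        raise ValueError("clip has no frames")
    step = total_frames / num_samples
    # Take the midpoint of each of num_samples equal segments,
    # clamped to the last valid index for short clips.
    return [min(int(step * i + step / 2), total_frames - 1) for i in range(num_samples)]

# A 6-second clip at 30 fps has 180 frames.
print(uniform_frame_indices(180))
```

For a 3-second clip the same call simply spaces the 16 indices more densely, so the model sees the same temporal coverage either way.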
I’ve tried several VideoMAE checkpoints (small, base, and large), including variants fine-tuned on Something-Something V2 (SSV2) and Kinetics.
Hyperparameters tested:
Batch sizes of 2, 4, and 8.
Epochs ranging from 4 to 16.
Learning rate set to 5e-5.
I removed the RandomCrop transformation since it often crops the person entirely out of the frame.
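With RandomCrop ruled out, subject-preserving augmentations such as horizontal flips and mild brightness jitter are a common substitute. A minimal NumPy sketch (illustrative only; clips are assumed to be uint8 arrays of shape (T, H, W, C), and the function name is not from the notebook):

```python
import numpy as np

def augment_clip(clip: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply subject-preserving augmentations to a (T, H, W, C) uint8 clip."""
    out = clip
    # Horizontal flip with probability 0.5 (keeps the person fully in frame).
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]
    # Mild brightness jitter: scale all pixels by a factor in [0.8, 1.2].
    scale = rng.uniform(0.8, 1.2)
    out = np.clip(out.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    return out

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(16, 224, 224, 3), dtype=np.uint8)
aug = augment_clip(clip, rng)
print(aug.shape)  # same shape as the input
```

Both transforms are applied identically across all frames of a clip, which preserves temporal consistency; per-frame randomization would add flicker the model never sees at inference time.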
I'm using the Hugging Face Video Classification Colab Notebook as a starting point: Training Notebook.
Problem: Despite these variations, the model overfits immediately. I’ve also tested using the UCF101 dataset to rule out dataset-specific issues and got similar results to the Hugging Face VideoMAE colab, so the code seems fine.
Request: Any advice on addressing this overfitting issue would be greatly appreciated. Specifically, I'm looking for guidance on:
Additional hyperparameter adjustments.
Potential model architecture changes (if applicable).
Dataset augmentation techniques that might improve generalization.
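One low-cost lever I can try alongside augmentation is early stopping on validation loss, since with ~120 clips the model can memorize the training set within a few epochs. A minimal tracker sketch in pure Python (the Hugging Face Trainer also provides an `EarlyStoppingCallback` for the same purpose; the class below is just an illustration):

```python
class EarlyStopper:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.75]):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")
        break
```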
Thank you for any help or insights you can provide!