A Question on UCF101 Accuracy #1

Open
hubaak opened this issue Oct 30, 2024 · 2 comments

Comments

@hubaak

hubaak commented Oct 30, 2024

In the paper, an ImageNet-pretrained ResNet18 model achieves a score of 77.2 using only the RGB modality. However, the repo contains no code for UCF101. I tried to train a ResNet18 following the settings in the paper, and its accuracy is only 0.43 with (batch_size, lr, epoch) = (32, 1e-3, 800). I'm confused by such a large performance gap. Could you provide some implementation details or the code for UCF101?
BTW, a 3D ResNet18 with a lot of tricks scores 74.1 in https://arxiv.org/pdf/2103.05905v2, so it seems a little weird that a ResNet18 using only the RGB modality reaches that performance so easily.

@echo0409
Collaborator

echo0409 commented Nov 6, 2024

Thank you for the question.

Here are our settings: batch size = 64, lr = 1e-4, scheduler = step_LR (step = 40, decay_ratio = 0.1), optimizer = SGD, weight_decay = 1e-4.
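As a rough illustration of how these hyperparameters interact, here is a minimal pure-Python sketch of the step_LR schedule. It assumes that `step = 40` counts epochs (the reply does not say whether it means epochs or iterations) and that no momentum is used, since none is given; `TrainConfig` and `step_lr` are hypothetical names. In PyTorch the same setup would roughly correspond to `torch.optim.SGD(params, lr=1e-4, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the reply (step unit assumed to be epochs)."""
    batch_size: int = 64
    base_lr: float = 1e-4
    step_size: int = 40         # assumption: epochs between two decays
    decay_ratio: float = 0.1
    weight_decay: float = 1e-4  # passed to SGD; momentum was not specified


def step_lr(cfg: TrainConfig, epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch.

    The rate is multiplied by decay_ratio once every step_size epochs,
    which is what a StepLR-style scheduler does.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return cfg.base_lr * cfg.decay_ratio ** (epoch // cfg.step_size)


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (0, 39, 40, 80, 120):
        print(f"epoch {epoch:3d}: lr = {step_lr(cfg, epoch):.0e}")
```

Under this reading, the learning rate drops by 10x every 40 epochs, so after about 120 epochs it is already around 1e-7; the reply does not state the total number of epochs.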

We use an ImageNet-pretrained ResNet18 as the backbone.
For the RGB modality, we evenly pick 3 frames from each sample.
For the optical flow modality, we stack the horizontal component u and the vertical component v as [u, v, u] to form a three-channel frame, and select 3 such frames in total.
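The frame selection and flow stacking described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: "evenly pick" is interpreted here as taking the center frame of each of 3 equal-length segments (a TSN-style choice; evenly spaced endpoints would also fit the description), flow components are plain nested lists of shape H x W, and all function names are hypothetical. The same index helper would apply to RGB frames.

```python
from typing import List, Sequence, Tuple

Field = List[List[float]]  # one H x W flow component (u or v)


def even_frame_indices(num_frames: int, k: int = 3) -> List[int]:
    """Pick k frame indices spread evenly over a clip of num_frames frames.

    The clip is split into k equal segments and the center of each is taken.
    """
    if num_frames <= 0 or k <= 0:
        raise ValueError("num_frames and k must be positive")
    return [min(num_frames - 1, int((i + 0.5) * num_frames / k)) for i in range(k)]


def flow_to_three_channels(u: Field, v: Field) -> List[Field]:
    """Stack horizontal (u) and vertical (v) flow as [u, v, u] -> 3 x H x W."""
    if len(u) != len(v) or any(len(ru) != len(rv) for ru, rv in zip(u, v)):
        raise ValueError("u and v must have the same H x W shape")
    return [u, v, u]


def build_flow_input(flows: Sequence[Tuple[Field, Field]], k: int = 3) -> List[List[Field]]:
    """Select k evenly spaced (u, v) pairs and turn each into a 3-channel frame."""
    return [flow_to_three_channels(*flows[i]) for i in even_frame_indices(len(flows), k)]


if __name__ == "__main__":
    # Toy clip: 10 flow frames of size 2 x 2, where u = frame index and v = -index.
    clip = [
        ([[float(t)] * 2 for _ in range(2)], [[-float(t)] * 2 for _ in range(2)])
        for t in range(10)
    ]
    print(even_frame_indices(len(clip)))  # [1, 5, 8]
    frames = build_flow_input(clip)
    print(len(frames), len(frames[0]))    # 3 frames, 3 channels each
```

Repeating u in the third channel simply lets a 2-channel flow field match the 3-channel input expected by an ImageNet-pretrained backbone without changing its first convolution.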

@hubaak
Author

hubaak commented Nov 6, 2024

Thanks a lot for providing your settings! I'll try again with them.
