Thank you for your excellent work and contributions to the community!
I was wondering if you have any plans to release the code or provide guidance on how to use OpenVCLIP to extract features and combine them with AWT for zero-shot video recognition?
Best regards
Thank you for your interest in our work! AWT comprises three key components: augmentation, weighting, and transportation. The only difference between zero-shot image classification and zero-shot video classification lies in the augmentation step. For videos, the augmented views include frames sampled at different timestamps of the video, in addition to randomly cropped and flipped images.
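To make the video augmentation step concrete, here is a minimal sketch of how the set of views for one video could be described: a few frames spread across the video, each with some random crops and flips. This is only an illustration and not the code from the AWT repository. The names `ViewSpec`, `sample_frame_indices`, and `make_view_specs` are hypothetical, and the even temporal sampling, the crop scale range, and the number of views are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewSpec:
    """Describes one augmented view of a video (hypothetical helper type)."""
    frame_index: int  # which frame of the video to read
    left: int         # left edge of the square crop, in pixels
    top: int          # top edge of the square crop, in pixels
    size: int         # side length of the square crop, in pixels
    flip: bool        # whether to flip the crop horizontally


def sample_frame_indices(num_frames, num_samples):
    """Pick frame indices spread evenly over the video.

    Assumption: the video is split into num_samples equal segments and the
    frame nearest each segment midpoint is taken. Other schemes are possible.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / num_samples))
        for i in range(num_samples)
    ]


def make_view_specs(num_frames, width, height, num_samples=8,
                    crops_per_frame=2, min_scale=0.5, seed=0):
    """Combine temporal sampling with random crops and random flips."""
    rng = random.Random(seed)  # seeded so the same views are reproducible
    short_side = min(width, height)
    specs = []
    for frame_index in sample_frame_indices(num_frames, num_samples):
        for _ in range(crops_per_frame):
            # Random square crop covering min_scale..1.0 of the short side.
            size = max(1, int(short_side * rng.uniform(min_scale, 1.0)))
            left = rng.randint(0, width - size)
            top = rng.randint(0, height - size)
            flip = rng.random() < 0.5
            specs.append(ViewSpec(frame_index, left, top, size, flip))
    return specs


if __name__ == "__main__":
    for spec in make_view_specs(num_frames=100, width=320, height=240,
                                num_samples=4, crops_per_frame=2):
        print(spec)
```

Each `ViewSpec` would then be applied to the decoded frame, and the resulting crop passed through the image encoder to obtain one feature per view.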
You can download the Open-VCLIP pre-trained checkpoint and directly perform inference. The only manual effort required is organizing the image features of each video in the specified format, as outlined here. Once organized, you can use AWT_zero_shot/evaluate.py for AWT inference.