
Misclassification of Out-of-Domain Videos in TC-CLIP #7

Open
ooza opened this issue Dec 4, 2024 · 1 comment
ooza commented Dec 4, 2024

Hi,
Thanks again for this interesting model. I tested the demo notebook on a small custom dataset.
(tc_clip_model_path = "pretrained/zero_shot_k400_llm_tc_clip.pth" # pretrained model path)
I'm encountering an issue where TC-CLIP misclassifies videos that do not belong to any of the defined action classes. For example, I added a neutral video of a puppy (completely unrelated to the action classes) to my dataset, which consists of the following classes:

  • stealing
  • robbery
  • violence

Despite the video's irrelevance, the model assigns it the label "stealing" based on the highest logit value:
{'stealing': 24.42, 'robbery': 22.50, 'violence': 23.47}
This behavior is problematic because it suggests that the model always outputs one of the predefined classes, even when the input does not fit any of them.

What I Tried

  • Rejection Threshold: I implemented a threshold to reject predictions where the highest logit is below a certain value. However, this approach did not generalize well and led to poor performance when legitimate action videos had logits close to the threshold.

  • Neutral Class: I added an "other" class, but this approach was not effective either:

[Screenshot: classification results with the added "other" class]
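For reference, the rejection-threshold idea can also be applied to the softmax confidence rather than the raw logit, so the near-tied logits from the puppy example fall below the cut-off. This is only an illustrative sketch: `classify_with_rejection` and the threshold `tau` are hypothetical names, and `tau` would need tuning on held-out in-domain and out-of-domain clips.

```python
import math

def classify_with_rejection(logits, tau=0.7):
    """Reject a prediction when the top softmax probability is below tau.

    `logits` maps class name -> raw logit (as printed above); `tau` is a
    hypothetical confidence threshold that must be tuned per dataset.
    """
    # Numerically stable softmax over the raw logits.
    m = max(logits.values())
    exps = {c: math.exp(v - m) for c, v in logits.items()}
    total = sum(exps.values())
    probs = {c: e / total for c, e in exps.items()}
    best = max(probs, key=probs.get)
    return ("unknown" if probs[best] < tau else best), probs

# The puppy video's logits from the issue: the classes are nearly tied,
# so the top softmax probability (~0.65) is below tau and the clip is rejected.
label, probs = classify_with_rejection(
    {"stealing": 24.42, "robbery": 22.50, "violence": 23.47})
# label == "unknown"
```

Thresholding the normalized probability instead of the raw logit makes the criterion invariant to an overall logit offset, though it still inherits the original problem that borderline in-domain clips can fall near the cut-off.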

Expected Behavior
The model should ideally:

  • Provide an "unknown" or "no action" output for videos that do not belong to any defined class.
  • Avoid forcing a prediction into one of the predefined classes when the input is irrelevant.

Could you please provide guidance or suggest strategies to handle out-of-distribution inputs effectively in TC-CLIP?


byminji commented Dec 9, 2024

Hi @ooza, Thank you so much for your interest in our work! 🤗

This is expected behavior, as our model wasn't explicitly trained to reject OOD samples as an <unknown> class. Simply adding an "other" class might not help, since it pulls the video embeddings toward the text embedding of the word "other" (literally, its semantics).

The most straightforward way to achieve OOD detection is to fine-tune our model on a task like anomaly detection, teaching it to reject OOD samples. If you want to achieve this without fine-tuning, you can try using an "easy and familiar" bag of words seen during K400 training as a criterion for the neutral classes.
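A minimal sketch of the bag-of-familiar-words idea: score the target classes together with neutral K400-style class names, and report "unknown" whenever a distractor wins. The function name, the distractor prompts, and the similarity values below are all hypothetical.

```python
def classify_with_distractors(sims, target_classes):
    """Reject when a familiar neutral class scores highest.

    `sims` maps class name -> similarity logit, covering both the target
    classes and a bag of neutral class names drawn from the training
    vocabulary (hypothetical K400-style examples below). If the winner
    is not a target class, report "unknown" instead of forcing a label.
    """
    best = max(sims, key=sims.get)
    return best if best in target_classes else "unknown"

targets = {"stealing", "robbery", "violence"}
# Hypothetical similarities: for a puppy video, neutral prompts such as
# "petting animal" should outrank the crime classes seen at test time.
puppy_sims = {"stealing": 24.4, "robbery": 22.5, "violence": 23.5,
              "petting animal": 27.1, "walking the dog": 25.8}
```

The neutral prompts act as calibrated competitors: because they were seen during training, the model scores them reliably, and an irrelevant clip is more likely to match one of them than any of the three target classes.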

Another minor suggestion is to change the weight-averaging ratio $wise_ft. If you think your data distribution is fairly similar to the training source distribution (in our case, Kinetics-400), increasing $wise_ft might help. For more details, refer to Appendix E in our paper.
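The WiSE-FT-style weight averaging referred to above can be sketched as a linear interpolation between zero-shot and fine-tuned weights. This is a hand-rolled illustration with plain lists standing in for parameter tensors; `wise_ft_merge` and `alpha` are hypothetical names, and which side of the interpolation the $wise_ft ratio weights in TC-CLIP's implementation should be checked against the repository.

```python
def wise_ft_merge(theta_zs, theta_ft, alpha):
    """Interpolate zero-shot and fine-tuned weights per parameter.

    theta_zs / theta_ft map parameter name -> flat list of values
    (stand-ins for tensors). alpha = 0 keeps the zero-shot weights,
    alpha = 1 keeps the fully fine-tuned weights; intermediate values
    trade zero-shot robustness against in-distribution accuracy.
    """
    return {name: [(1 - alpha) * z + alpha * f
                   for z, f in zip(zs, theta_ft[name])]
            for name, zs in theta_zs.items()}

# Midpoint average of two toy "checkpoints".
merged = wise_ft_merge({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)
# merged["w"] == [0.5, 3.0]
```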
