Hi
Thanks again for this interesting model. I tested the demo notebook file on a small custom dataset, using the following checkpoint:
tc_clip_model_path = "pretrained/zero_shot_k400_llm_tc_clip.pth" # pretrained model path
I'm encountering an issue where TC-CLIP misclassifies videos that do not belong to any of the defined action classes. For example, I added a neutral video of a puppy (completely unrelated to the action classes) to my dataset, which consists of the following classes:
stealing
robbery
violence
Despite the video's irrelevance, the model assigns it a label (stealing) based on the highest logit value:
{'stealing': 24.42, 'robbery': 22.50, 'violence': 23.47}
This behavior is problematic because it suggests that the model always outputs one of the predefined classes, even when the input does not fit any of them.
What I Tried
Rejection Threshold: I implemented a threshold to reject predictions where the highest logit is below a certain value. However, this approach did not generalize well and led to poor performance when legitimate action videos had logits close to the threshold.
Neutral Class: I added an "other" class, but this approach was not effective either.
Expected Behavior
The model should ideally:
Provide an "unknown" or "no action" output for videos that do not belong to any defined class.
Avoid forcing a prediction into one of the predefined classes when the input is irrelevant.
Could you please provide guidance or suggest strategies to handle out-of-distribution inputs effectively in TC-CLIP?
Hi @ooza, thank you so much for your interest in our work! 🤗
This is expected behavior, as our model wasn't explicitly trained to reject OOD samples as an <unknown> class. Simply adding an "other" class might not help, because the model will match video embeddings against the text embedding of the literal word "other" and its semantics, not against a general "none of the above" concept.
The most straightforward way to achieve OOD detection is to fine-tune our model on tasks like anomaly detection and teach it to reject OOD samples. If you want to achieve this without fine-tuning, you can try using a bag of "easy and familiar" words seen during K400 training as the neutral classes.
Another minor suggestion is changing the weight averaging ratio $wise_ft. If you think your data distribution is fairly similar to the training source distribution (in our case, Kinetics-400), increasing $wise_ft might help. For more details, refer to Appendix E in our paper.
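As a rough illustration of the neutral-class idea, here is a minimal sketch, not code from the TC-CLIP repository. The function name, the neutral labels, and the neutral logits are all hypothetical and made up for the example; only the three target logits come from the issue above. It assumes you have already computed one logit per class prompt, target and neutral alike.

```python
def classify_with_neutral_classes(logits, target_classes, unknown_label="unknown"):
    """Return the top-scoring target class, or `unknown_label` if a neutral class wins.

    `logits` maps every class name (target and neutral) to its video-text logit.
    Hypothetical helper for illustration; not part of TC-CLIP.
    """
    best = max(logits, key=logits.get)
    return best if best in target_classes else unknown_label


target_classes = ["stealing", "robbery", "violence"]

# Target logits are the ones reported in the issue. The neutral labels and
# their logits are invented here purely to show the mechanism.
puppy_logits = {
    "stealing": 24.42,
    "robbery": 22.50,
    "violence": 23.47,
    "walking the dog": 26.30,   # assumed neutral label, made-up logit
    "petting animal": 27.10,    # assumed neutral label, made-up logit
}

prediction = classify_with_neutral_classes(puppy_logits, target_classes)
print(prediction)
```

Whether this helps in practice depends on how well the chosen neutral labels cover the irrelevant videos you expect, so the neutral set would need to be validated on your own data.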
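For readers unfamiliar with weight averaging, the general idea can be sketched as a linear interpolation between two sets of weights. This is a generic sketch, not the TC-CLIP implementation: the function name is hypothetical, and the convention that a larger ratio puts more weight on the fine-tuned model is an assumption that should be checked against the repository configuration and Appendix E.

```python
def interpolate_weights(zero_shot, fine_tuned, ratio):
    """Blend two weight dictionaries parameter by parameter.

    ratio = 0.0 returns the zero-shot weights, ratio = 1.0 the fine-tuned ones.
    This direction is an assumed convention, not confirmed by the TC-CLIP code.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: (1.0 - ratio) * zero_shot[name] + ratio * fine_tuned[name]
        for name in zero_shot
    }


# Toy example with scalar "weights"; the same expression applies to tensors
# when the dictionaries are framework state dicts with matching keys.
zero_shot = {"w": 0.0, "b": 1.0}
fine_tuned = {"w": 2.0, "b": 3.0}
blended = interpolate_weights(zero_shot, fine_tuned, ratio=0.5)
print(blended)
```

Under that assumed convention, raising the ratio moves the blended model toward the weights fine-tuned on the source dataset, which matches the suggestion that a higher value suits data resembling the training distribution.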