Why is Cosine Similarity Scaled for Zero-Shot Image Classification? #763
Hi All, I have a simple question about the zero-shot image classification example in the README. Why is the cosine similarity multiplied by 100? Is it to reverse the normalization done in the previous steps? The line I am referring to is pasted below. Thank you very much in advance!
`text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)`
Replies: 1 comment 2 replies
Hi @rsorbello. The 100 comes from the learnable logit scaling parameter used in the original paper, which they clip at 100. In general, many models have a `logit_scale` parameter, which is learned during training and used to scale the logits.
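
To make the effect of the scale concrete, here is a minimal sketch in plain Python (no PyTorch needed). The similarity values and the helper names `softmax` and `effective_scale` are made up for illustration and are not part of the library. It also assumes the scale is stored in log space and exponentiated before use, with an initial value of log(1/0.07); treat that as an assumption rather than a description of any particular checkpoint.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def effective_scale(log_scale, max_scale=100.0):
    """Hypothetical helper: exponentiate a log-space scale and clip it.

    Assumption: the learned parameter is kept in log space, so the
    multiplier applied to the cosine similarities is exp(log_scale),
    capped at max_scale (100 in this thread).
    """
    return min(math.exp(log_scale), max_scale)

# Hypothetical cosine similarities between one image and three text prompts.
# Because the features are L2-normalized, these always lie in [-1, 1].
cosine_sims = [0.31, 0.24, 0.22]

# Without scaling, the logits are so close together that softmax is nearly uniform.
unscaled = softmax(cosine_sims)

# With the factor of 100, small differences in similarity become large
# differences in logits, so softmax gives a confident prediction.
scaled = softmax([100.0 * s for s in cosine_sims])

print("unscaled:", [round(p, 3) for p in unscaled])
print("scaled:  ", [round(p, 3) for p in scaled])

# Assumed initial value (about 14.3) versus a value that has hit the clip.
print(effective_scale(math.log(1 / 0.07)))
print(effective_scale(math.log(250.0)))
```

So the factor is not undoing the normalization. The features stay unit-length; the multiplier only acts as an inverse temperature on the softmax, turning similarities that differ by a few hundredths into clearly separated probabilities.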