Why is Cosine Similarity Scaled for Zero-Shot Image Classification? #763

Answered by gabrielilharco
rsorbello asked this question in Q&A

Hi @rsorbello. The 100 comes from the learnable logit scale parameter used in the original paper, which is clipped at 100. More generally, many models have a `logit_scale` parameter that is learned during training and used to scale the logits.
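To illustrate the answer above, here is a minimal Python sketch, assuming toy embeddings (the vectors and class labels below are hypothetical, not outputs of any real model). Cosine similarities lie in [-1, 1], so passing them directly to a softmax gives a nearly flat distribution over classes. Multiplying them by a scale such as 100 acts as an inverse temperature and sharpens the distribution. The hypothetical `effective_scale` helper stores the scale in log space and caps it at 100; treat that as an assumption about how a clipped learnable scale is typically handled, not as any particular library's code.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def effective_scale(log_scale, max_scale=100.0):
    """Hypothetical helper: the learned parameter lives in log space,
    is exponentiated, and is capped so logits are never scaled past max_scale."""
    return min(math.exp(log_scale), max_scale)

def zero_shot_probs(image_feat, text_feats, logit_scale):
    """Class probabilities from cosine similarities multiplied by logit_scale."""
    img = normalize(image_feat)
    cos = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_feats]
    return softmax([logit_scale * c for c in cos])

# Hypothetical toy embeddings (assumed, not from any real model).
image = [0.9, 0.3, 0.1]
texts = [
    [1.0, 0.2, 0.0],  # e.g. "a photo of a cat"
    [0.7, 0.7, 0.1],  # e.g. "a photo of a dog"
    [0.1, 0.2, 1.0],  # e.g. "a photo of a car"
]

unscaled = zero_shot_probs(image, texts, logit_scale=1.0)
scaled = zero_shot_probs(image, texts, logit_scale=effective_scale(math.log(100.0)))

print("unscaled:", [round(p, 3) for p in unscaled])
print("scaled:  ", [round(p, 3) for p in scaled])
```

Both runs rank the classes identically, because scaling by a positive constant preserves order; only the confidence changes. With a scale of 1 the top class gets well under half of the probability mass, while with a scale of 100 it gets nearly all of it.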

Answer selected by rsorbello
3 participants