Missing documentation for encode for image embedding models #3118

Open

KennethEnevoldsen opened this issue Dec 4, 2024 · 1 comment
KennethEnevoldsen commented Dec 4, 2024

I can't seem to find the documentation for encode when encoding images:

from sentence_transformers import SentenceTransformer
from PIL import Image
model = SentenceTransformer('clip-ViT-B-32')  # Load CLIP model
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))  # no documentation for this step

I am asking because we want to build a compatible interface for image embeddings in mteb.

We are also working on the multimodal interface (e.g. for models like https://huggingface.co/TIGER-Lab/VLM2Vec-Full).

tomaarsen (Collaborator) commented

Hello!

Indeed, this is not documented very nicely because I'm considering deprecating the current CLIPModel module in favor of making the much more common Transformer module multimodal.

I did some experiments with this today, and I think there's potential. We would move towards AutoProcessor instead of AutoTokenizer. We can then feed whatever inputs the user has to the tokenizer/processor/feature extractor, etc., and pass the result directly into the model.
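
For illustration, a minimal sketch of what that flow could look like with a CLIP checkpoint (the checkpoint name and the get_*_features calls are just one concrete example, not the planned design):

from PIL import Image
from transformers import AutoModel, AutoProcessor

# Example checkpoint only; any processor-backed multimodal model would do
processor = AutoProcessor.from_pretrained('openai/clip-vit-base-patch32')
model = AutoModel.from_pretrained('openai/clip-vit-base-patch32')

# The processor routes text to its tokenizer...
text_inputs = processor(text=['two dogs in the snow'], return_tensors='pt', padding=True)
# ...and images to its image processor
image_inputs = processor(images=Image.open('two_dogs_in_snow.jpg'), return_tensors='pt')

text_emb = model.get_text_features(**text_inputs)     # [1, 512]
image_emb = model.get_image_features(**image_inputs)  # [1, 512]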

We do then have to be careful about what the model returns. For text-based models, we always grab the last_hidden_state and then do Pooling in a separate pooler module, but with multimodal systems (CLIP, CLAP) it seems more common to rely on the model's own pooling. This certainly simplifies things, as we otherwise have to feed multiple token/patch embeddings to the pooler, sometimes even with different dimensionalities.
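
To make the contrast concrete, here's a rough sketch of the text-model path, where pooling happens as a separate step over last_hidden_state (bert-base-uncased is just an example checkpoint):

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
text_model = AutoModel.from_pretrained('bert-base-uncased')

encoded = tokenizer(['two dogs in the snow'], return_tensors='pt', padding=True)
token_embeddings = text_model(**encoded).last_hidden_state  # [1, seq_len, 768]

# Mean pooling over non-padding tokens, done outside the model
mask = encoded['attention_mask'].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(1) / mask.sum(1)

# A CLIP-style model skips this step: get_text_features / get_image_features
# (see the sketch above) already return one pooled vector per input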

I have to be quite wary here, as I rely fully on transformers.

Either way, the interface will always remain the same regardless of how it's implemented behind the scenes, and your snippet is correct: you can pass PIL.Image instances to model.encode.
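
For example, extending your snippet to compare the image against text captions (the caption strings are made up; util.cos_sim comes from sentence_transformers):

from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('clip-ViT-B-32')

# Images and text are embedded into the same vector space via encode
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))
text_emb = model.encode(['Two dogs in the snow', 'A cat on a table'])

# Cosine similarity between the image and each caption
print(util.cos_sim(img_emb, text_emb))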

  • Tom Aarsen
