Add support for exporting SigLIP models #1897

Open
aliencaocao opened this issue Jun 6, 2024 · 8 comments · May be fixed by #1926
Labels
feature-request New feature or request

Comments

@aliencaocao

Feature request

Add support for exporting SigLIP models

Motivation

SigLIP is used by many SOTA VLMs and is gaining traction; supporting it can be the first step toward supporting many VLMs.

Your contribution

Not at the moment

@aliencaocao
Author

Hi @xenova, I see that you have already done it in https://huggingface.co/Xenova/siglip-large-patch16-384. May I know how you exported it, since it is not supported in Optimum yet?

@xenova
Contributor

xenova commented Jun 6, 2024

Here are my custom configs: https://github.com/xenova/transformers.js/blob/main/scripts/extra/siglip.py. Hope that helps!
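For reference, a minimal manual-export sketch, separate from the linked siglip.py: it exports the two SigLIP towers with plain torch.onnx.export. The checkpoint name, output file names, output names, and opset are assumptions for illustration, not the actual configs used for Xenova/siglip-large-patch16-384.

```python
# Minimal sketch of exporting SigLIP's text and vision towers to ONNX without
# Optimum support. Requires a transformers version that includes SigLIP (>= 4.37).
import torch
from transformers import AutoTokenizer, SiglipTextModel, SiglipVisionModel

model_id = "google/siglip-base-patch16-224"  # example checkpoint, not the one above

# Text tower: the SigLIP text model only needs input_ids.
tokenizer = AutoTokenizer.from_pretrained(model_id)
text_model = SiglipTextModel.from_pretrained(model_id).eval()
text_model.config.return_dict = False  # return plain tuples so ONNX tracing is simple

text_inputs = tokenizer(["a photo of 2 cats"], padding="max_length", return_tensors="pt")
torch.onnx.export(
    text_model,
    (text_inputs["input_ids"],),
    "siglip_text_model.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"input_ids": {0: "batch_size", 1: "sequence_length"}},
    opset_version=14,
)

# Vision tower: the SigLIP vision model takes pixel_values.
vision_model = SiglipVisionModel.from_pretrained(model_id).eval()
vision_model.config.return_dict = False

pixel_values = torch.randn(1, 3, 224, 224)  # dummy input matching this checkpoint's image size
torch.onnx.export(
    vision_model,
    (pixel_values,),
    "siglip_vision_model.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"pixel_values": {0: "batch_size"}},
    opset_version=14,
)
```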

@aliencaocao
Author

aliencaocao commented Jun 7, 2024

Thanks. Do you know if this can be used with the HF pipeline?

@xenova
Contributor

xenova commented Jun 7, 2024

The Python library? Not too sure. It does work with Transformers.js though. See the model card:
Example: Zero-shot image classification w/ Xenova/siglip-large-patch16-384:

import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-large-patch16-384');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
console.log(output);
// [
//   { score: 0.4783420264720917, label: '2 cats' },
//   { score: 0.00022271279885899276, label: '2 dogs' }
// ]

Example: Compute text embeddings with SiglipTextModel.

import { AutoTokenizer, SiglipTextModel } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/siglip-large-patch16-384');
const text_model = await SiglipTextModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Run tokenization
const texts = ['a photo of 2 cats', 'a photo of 2 dogs'];
const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });

// Compute embeddings
const { pooler_output } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ ... ],
//   size: 1536
// }

Example: Compute vision embeddings with SiglipVisionModel.

import { AutoProcessor, SiglipVisionModel, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/siglip-large-patch16-384');
const vision_model = await SiglipVisionModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { pooler_output } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 768 ],
//   type: 'float32',
//   data: Float32Array(768) [ ... ],
//   size: 768
// }

@aliencaocao
Author

Alright, thank you

I will still keep this issue open so you or someone else may make a PR to add the config into the repo.

@bhavika

bhavika commented Jun 27, 2024

I'd like to take this if no one has picked it up yet @aliencaocao!

@aliencaocao
Author

Sure. Actually, I do have a working SigLIP to TensorRT conversion and inference script using torch2trt; I don't know how much it overlaps with what Optimum has/uses. Maybe a maintainer can chip in so we don't reinvent the wheel?
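For context, a minimal sketch of what a torch2trt conversion of the SigLIP vision tower might look like. This is not the script referenced above; the checkpoint, input shape, and fp16 flag are assumptions, and whether every attention op converts cleanly depends on torch2trt's converter coverage.

```python
# Rough torch2trt sketch; assumes torch2trt is installed and a CUDA GPU is available.
import torch
from torch2trt import torch2trt
from transformers import SiglipVisionModel

model = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-224").eval().cuda()
model.config.return_dict = False  # return plain tuples so tracing is straightforward

# Dummy pixel_values matching this checkpoint's expected input size.
dummy = torch.randn(1, 3, 224, 224).cuda()

# Build a TensorRT engine from the traced module (fp16 is optional).
model_trt = torch2trt(model, [dummy], fp16_mode=True)

# Inference with the TensorRT module mirrors the PyTorch call.
with torch.no_grad():
    last_hidden_state, pooler_output = model_trt(dummy)
```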

@bhavika

bhavika commented Jun 27, 2024

@aliencaocao Optimum only does the conversion of models to ONNX afaik, not TensorRT. So the work we do for this PR would stop just short of the ONNX to TensorRT conversion. That said, I will let another maintainer chime in!
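Once an ONNX config for SigLIP is merged into Optimum, the export would presumably go through the standard exporter entry point. The snippet below is hypothetical until that PR lands; the task name used for SigLIP is an assumption.

```python
# Hypothetical usage after SigLIP support is added to Optimum's ONNX exporters;
# this will not work today, and the task name below is an assumption.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="google/siglip-base-patch16-224",
    output="siglip_onnx",
    task="zero-shot-image-classification",
)
# The resulting ONNX files in siglip_onnx/ could then be converted to TensorRT
# as a separate step (outside the scope of the Optimum PR).
```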
