Add support for exporting SigLIP models #1897

Open
aliencaocao opened this issue Jun 6, 2024 · 8 comments · May be fixed by #1926
Labels
feature-request New feature or request

Comments

@aliencaocao

Feature request

Add support for exporting SigLIP models

Motivation

SigLIP is used by many SOTA VLMs and is gaining traction; supporting it can be the first step toward supporting many VLMs.

Your contribution

Not at the moment

@aliencaocao
Author

Hi @xenova, I see that you have already done it in https://huggingface.co/Xenova/siglip-large-patch16-384. May I know how you exported it, since it is not supported in Optimum yet?

@xenova
Contributor

xenova commented Jun 6, 2024

Here are my custom configs: https://github.com/xenova/transformers.js/blob/main/scripts/extra/siglip.py. Hope that helps!
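For reference, a minimal manual-export sketch, separate from the linked siglip.py: it exports the two SigLIP towers with plain torch.onnx.export. The checkpoint name, output file names, output names, and opset are assumptions for illustration, not the actual configs used for Xenova/siglip-large-patch16-384.

```python
# Minimal sketch of exporting SigLIP's text and vision towers to ONNX without
# Optimum support. Requires a transformers version that includes SigLIP (>= 4.37).
import torch
from transformers import AutoTokenizer, SiglipTextModel, SiglipVisionModel

model_id = "google/siglip-base-patch16-224"  # example checkpoint, not the one above

# Text tower: the SigLIP text model only needs input_ids.
tokenizer = AutoTokenizer.from_pretrained(model_id)
text_model = SiglipTextModel.from_pretrained(model_id).eval()
text_model.config.return_dict = False  # return plain tuples so ONNX tracing is simple

text_inputs = tokenizer(["a photo of 2 cats"], padding="max_length", return_tensors="pt")
torch.onnx.export(
    text_model,
    (text_inputs["input_ids"],),
    "siglip_text_model.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"input_ids": {0: "batch_size", 1: "sequence_length"}},
    opset_version=14,
)

# Vision tower: the SigLIP vision model takes pixel_values.
vision_model = SiglipVisionModel.from_pretrained(model_id).eval()
vision_model.config.return_dict = False

pixel_values = torch.randn(1, 3, 224, 224)  # dummy input matching this checkpoint's image size
torch.onnx.export(
    vision_model,
    (pixel_values,),
    "siglip_vision_model.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"pixel_values": {0: "batch_size"}},
    opset_version=14,
)
```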

@aliencaocao
Author

aliencaocao commented Jun 7, 2024

Thanks. Do you know if this can be used with the HF pipeline?

@xenova
Contributor

xenova commented Jun 7, 2024

The Python library? Not too sure. It does work with Transformers.js though. See the model card:
Example: Zero-shot image classification w/ Xenova/siglip-large-patch16-384:

import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-large-patch16-384');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
    hypothesis_template: 'a photo of {}',
});
console.log(output);
// [
//   { score: 0.4783420264720917, label: '2 cats' },
//   { score: 0.00022271279885899276, label: '2 dogs' }
// ]

Example: Compute text embeddings with SiglipTextModel.

import { AutoTokenizer, SiglipTextModel } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/siglip-large-patch16-384');
const text_model = await SiglipTextModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Run tokenization
const texts = ['a photo of 2 cats', 'a photo of 2 dogs'];
const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });

// Compute embeddings
const { pooler_output } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ ... ],
//   size: 1536
// }

Example: Compute vision embeddings with SiglipVisionModel.

import { AutoProcessor, SiglipVisionModel, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/siglip-large-patch16-384');
const vision_model = await SiglipVisionModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { pooler_output } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 768 ],
//   type: 'float32',
//   data: Float32Array(768) [ ... ],
//   size: 768
// }

@aliencaocao
Author

Alright, thank you

I will still keep this issue open so you or someone else may make a PR to add the config into the repo.

@bhavika

bhavika commented Jun 27, 2024

I'd like to take this if no one has picked it up yet @aliencaocao!

@aliencaocao
Author

Sure. Actually, I do have a working SigLIP to TensorRT conversion and inference script using torch2trt; I don't know how much it overlaps with what Optimum has/uses. Maybe a maintainer can chip in so we don't reinvent the wheel?
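For context, a minimal sketch of what a torch2trt conversion of the SigLIP vision tower might look like. This is not the script referenced above; the checkpoint, input shape, and fp16 flag are assumptions, and whether every attention op converts cleanly depends on torch2trt's converter coverage.

```python
# Rough torch2trt sketch; assumes torch2trt is installed and a CUDA GPU is available.
import torch
from torch2trt import torch2trt
from transformers import SiglipVisionModel

model = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-224").eval().cuda()
model.config.return_dict = False  # return plain tuples so tracing is straightforward

# Dummy pixel_values matching this checkpoint's expected input size.
dummy = torch.randn(1, 3, 224, 224).cuda()

# Build a TensorRT engine from the traced module (fp16 is optional).
model_trt = torch2trt(model, [dummy], fp16_mode=True)

# Inference with the TensorRT module mirrors the PyTorch call.
with torch.no_grad():
    last_hidden_state, pooler_output = model_trt(dummy)
```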

@bhavika

bhavika commented Jun 27, 2024

@aliencaocao Optimum only does the conversion of models to ONNX afaik, not TensorRT. So the work we do for this PR would stop just short of the ONNX to TensorRT conversion. That said, I will let another maintainer chime in!
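Once an ONNX config for SigLIP is merged into Optimum, the export would presumably go through the standard exporter entry point. The snippet below is hypothetical until that PR lands; the task name used for SigLIP is an assumption.

```python
# Hypothetical usage after SigLIP support is added to Optimum's ONNX exporters;
# this will not work today, and the task name below is an assumption.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="google/siglip-base-patch16-224",
    output="siglip_onnx",
    task="zero-shot-image-classification",
)
# The resulting ONNX files in siglip_onnx/ could then be converted to TensorRT
# as a separate step (outside the scope of the Optimum PR).
```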
