Add support for exporting SigLIP models #1897
Comments
Hi @xenova, I see that you have already done it in https://huggingface.co/Xenova/siglip-large-patch16-384. May I know how you exported it, since it is not supported in Optimum yet?
Here are my custom configs: https://github.com/xenova/transformers.js/blob/main/scripts/extra/siglip.py. Hope that helps!
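For reference, below is a minimal sketch of one generic way to get equivalent ONNX files, exporting the SigLIP text and vision towers separately with plain torch.onnx.export. This is not what the linked siglip.py does (that script defines custom Optimum ONNX configs); the checkpoint id, output file names, and opset version here are assumptions.

```python
# A rough sketch only (not the linked siglip.py): export the SigLIP text and
# vision towers separately with plain torch.onnx.export.
# Checkpoint id, file names, and opset version are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoTokenizer, SiglipTextModel, SiglipVisionModel

model_id = "google/siglip-large-patch16-384"  # assumed source checkpoint

# --- Text tower ---
tokenizer = AutoTokenizer.from_pretrained(model_id)
text_model = SiglipTextModel.from_pretrained(model_id).eval()
text_model.config.return_dict = False  # return plain tuples so the exporter can name outputs
text_inputs = tokenizer(["a photo of 2 cats"], padding="max_length", return_tensors="pt")
torch.onnx.export(
    text_model,
    (text_inputs["input_ids"],),
    "siglip_text_model.onnx",
    input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"input_ids": {0: "batch_size", 1: "sequence_length"}},
    opset_version=14,
)

# --- Vision tower ---
processor = AutoProcessor.from_pretrained(model_id)
vision_model = SiglipVisionModel.from_pretrained(model_id).eval()
vision_model.config.return_dict = False
image = Image.new("RGB", (384, 384))  # dummy image, only used to trace the graph
vision_inputs = processor(images=image, return_tensors="pt")
torch.onnx.export(
    vision_model,
    (vision_inputs["pixel_values"],),
    "siglip_vision_model.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={"pixel_values": {0: "batch_size"}},
    opset_version=14,
)
```

Note that this only produces the raw graphs; the files in the Xenova/siglip-large-patch16-384 repo are presumably produced by the Transformers.js conversion script on top of those custom Optimum configs (plus quantized variants).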
Thanks. Do you know if this can be used with the HF pipeline?
The Python library? Not too sure. It does work with Transformers.js though. See the model card:

```js
import { pipeline } from '@xenova/transformers';

const classifier = await pipeline('zero-shot-image-classification', 'Xenova/siglip-large-patch16-384');
const url = 'http://images.cocodataset.org/val2017/000000039769.jpg';
const output = await classifier(url, ['2 cats', '2 dogs'], {
  hypothesis_template: 'a photo of {}',
});
console.log(output);
// [
//   { score: 0.4783420264720917, label: '2 cats' },
//   { score: 0.00022271279885899276, label: '2 dogs' }
// ]
```

Example: Compute text embeddings with `SiglipTextModel`:

```js
import { AutoTokenizer, SiglipTextModel } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/siglip-large-patch16-384');
const text_model = await SiglipTextModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Run tokenization
const texts = ['a photo of 2 cats', 'a photo of 2 dogs'];
const text_inputs = tokenizer(texts, { padding: 'max_length', truncation: true });

// Compute embeddings
const { pooler_output } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 768 ],
//   type: 'float32',
//   data: Float32Array(1536) [ ... ],
//   size: 1536
// }
```

Example: Compute vision embeddings with `SiglipVisionModel`:

```js
import { AutoProcessor, SiglipVisionModel, RawImage } from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/siglip-large-patch16-384');
const vision_model = await SiglipVisionModel.from_pretrained('Xenova/siglip-large-patch16-384');

// Read image and run processor
const image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
const image_inputs = await processor(image);

// Compute embeddings
const { pooler_output } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 768 ],
//   type: 'float32',
//   data: Float32Array(768) [ ... ],
//   size: 768
// }
```
Alright, thank you. I will still keep this issue open so that you or someone else can make a PR to add the config to the repo.
I'd like to take this if no one has picked it up yet @aliencaocao! |
Sure, actually I do have a working SigLIP-to-TensorRT conversion and inference script using
@aliencaocao Optimum only does the conversion of models to ONNX afaik, not TensorRT. So the work we do for this PR would stop just short of the ONNX-to-TensorRT conversion. That said, I will let another maintainer chime in!
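For context on that last step, here is a minimal sketch of what an ONNX-to-TensorRT conversion for the exported text tower could look like, assuming TensorRT 8.x-style Python bindings. The file names, input name, and shape ranges are placeholders, not taken from @aliencaocao's script.

```python
# A rough sketch only: build a TensorRT engine from an exported SigLIP text-tower ONNX file.
# Assumes TensorRT 8.x-style Python bindings; file names and shapes are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX graph produced earlier (e.g. siglip_text_model.onnx)
with open("siglip_text_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX file")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB workspace

# Dynamic batch dimension needs an optimization profile (SigLIP pads text to length 64)
profile = builder.create_optimization_profile()
profile.set_shape("input_ids", (1, 64), (8, 64), (32, 64))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("siglip_text_model.plan", "wb") as f:
    f.write(engine_bytes)
```

A similar profile over pixel_values would be needed for the vision tower.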
Feature request
Add support for exporting SigLIP models
Motivation
Used by many SOTA VLMs, SigLIP is gaining traction, and supporting it can be step one toward supporting many VLMs.
Your contribution
Not at the moment