Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JavaScript & Swift SDK #81

Merged
merged 48 commits into from
Apr 25, 2024
Merged

JavaScript & Swift SDK #81

merged 48 commits into from
Apr 25, 2024

Conversation

ashvardanian
Copy link
Contributor

@ashvardanian ashvardanian commented Apr 19, 2024

How many AI models can run on-device out of the box? UForm multimodal embeddings can 🥳

Model Parameters Languages Architecture
uform3-image-text-english-large 🆕 365M 1 6 text layers, ViT-L/14, 6 multimodal layers
uform3-image-text-english-base 143M 1 2 text layers, ViT-B/16, 2 multimodal layers
uform3-image-text-english-small 🆕 79M 1 2 text layers, ViT-S/16, 2 multimodal layers
uform3-image-text-multilingual-base 206M 21 8 text layers, ViT-B/16, 4 multimodal layers

JavaScript

Load the models and preprocessors for different modalities:

import { getModel, Modality } from 'uform';
import { TextProcessor, TextEncoder, ImageEncoder, ImageProcessor } from 'uform';

const { configPath, modalityPaths, tokenizerPath } = await getModel({
    modelId: 'unum-cloud/uform3-image-text-english-small',
    modalities: [Modality.TextEncoder, Modality.ImageEncoder],
});

Embed images:

const imageProcessor = new ImageProcessor(configPath);
await imageProcessor.init();
const processedImages = await imageProcessor.process("path/to/image.png");

const imageEncoder = new ImageEncoder(modalityPaths.image_encoder, imageProcessor);
await imageEncoder.init();
const imageOutput = await imageEncoder.encode(processedImages);
assert(imageOutput.embeddings.dims.length === 2, "Output should be 2D");

Embed queries:

const textProcessor = new TextProcessor(configPath, tokenizerPath);
await textProcessor.init();
const processedTexts = await textProcessor.process("a small red panda in a zoo");

const textEncoder = new TextEncoder(modalityPaths.text_encoder, textProcessor);
await textEncoder.init();
const textOutput = await textEncoder.encode(processedTexts);
assert(textOutput.embeddings.dims.length === 2, "Output should be 2D");
await textEncoder.dispose();

Swift

Embed images:

let imageModel = try await ImageEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let imageURL = "https://github.com/ashvardanian/ashvardanian/blob/master/demos/bbq-on-beach.jpg?raw=true"
guard let url = URL(string: imageURL),
    let imageSource = CGImageSourceCreateWithURL(url as CFURL, nil),
    let cgImage = CGImageSourceCreateImageAtIndex(imageSource, 0, nil) {
    throw Exception("Could not load image from URL: \(imageURL)")
}

var imageEmbedding: Embedding = try imageModel.encode(cgImage)
var imageVector: [Float32] = embedding.asFloats()

Embed queries:

let textModel = try await TextEncoder(modelName: "unum-cloud/uform3-image-text-english-small")
let text = "A group of friends enjoy a barbecue on a sandy beach, with one person grilling over a large black grill, while the other sits nearby, laughing and enjoying the camaraderie."
let textEmbedding: Embedding = try textModel.encode(text)
let textVector: [Float32] = textEmbedding.asFloats()

Python

Load model:

from uform import get_model, Modality

model_name = 'unum-cloud/uform3-image-text-english-small'
modalities = [Modality.TEXT_ENCODER, Modality.IMAGE_ENCODER]
processors, models = get_model(model_name, modalities=modalities)

Embed images:

import requests
from io import BytesIO
from PIL import Image

image_url = 'https://media-cdn.tripadvisor.com/media/photo-s/1b/28/6b/53/lovely-armenia.jpg'
image = Image.open(BytesIO(requests.get(image_url).content))

processor_image = processors[Modality.IMAGE_ENCODER]
model_image = models[Modality.IMAGE_ENCODER]
image_data = processor_image(image)
image_features, image_embedding = model_image.encode(image_data, return_features=True)

Embed queries:

text = 'a cityscape bathed in the warm glow of the sun, with varied architecture and a towering, snow-capped mountain rising majestically in the background'

model_text = models[Modality.TEXT_ENCODER]
processor_text = processors[Modality.TEXT_ENCODER]

text_data = processor_text(text)
text_features, text_embedding = model_text.encode(text_data, return_features=True)

@ashvardanian ashvardanian changed the title JavaScript SDK JavaScript & Swift SDK Apr 24, 2024
Copy link
Contributor

@xenova xenova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! 😄 Just some minor suggestions/improvements/comments.

javascript/encoders.mjs Outdated Show resolved Hide resolved
javascript/encoders_test.js Outdated Show resolved Hide resolved
javascript/encoders_test.js Show resolved Hide resolved
javascript/encoders_test.js Outdated Show resolved Hide resolved
@ashvardanian ashvardanian merged commit 641b8c0 into unum-cloud:main-dev Apr 25, 2024
1 check passed
@ashvardanian
Copy link
Contributor Author

🎉 This PR is included in version 3.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants