[Feature]: MultiModal LLM with vector API #6604

Closed
qZhang88 opened this issue Jul 20, 2024 · 1 comment · Fixed by #6613

@qZhang88

🚀 The feature, motivation and pitch

Consider a scenario where a large model is deployed in the cloud, and the application is deployed on a computationally limited embedded device.

If we want to support multimodal dialogue with vision and language, each request would have to send an image (and, given the dialogue history, potentially many images). With network bandwidth and other constraints, this causes significant latency.

Therefore, if the VLM's image encoder and projector are deployed on the embedded device, and if we could send the encoded vector instead during requests, the data transmission volume would be much smaller. This would reduce latency and improve the user experience.
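The on-device flow can be sketched in plain NumPy. This is a hedged illustration, not an existing API: `encode_image_stub`, `serialize_vector`, and the (576, 1024) embedding shape are all assumptions, with the real encoder and projector stood in by random data.

```python
# Hypothetical client-side flow: the embedded device runs the VLM's
# image encoder + projector locally and ships only the resulting
# embedding to the cloud, instead of the image itself.
import numpy as np

def encode_image_stub(image: np.ndarray) -> np.ndarray:
    # Stand-in for the on-device encoder + projector: maps an image
    # to a (num_patches, hidden_dim) embedding. Shapes are assumed.
    rng = np.random.default_rng(0)
    return rng.standard_normal((576, 1024)).astype(np.float16)

def serialize_vector(vector: np.ndarray) -> tuple[bytes, dict]:
    # Raw bytes plus the metadata the server needs to reconstruct it.
    meta = {"dtype": str(vector.dtype), "shape": vector.shape}
    return vector.tobytes(), meta

image = np.zeros((336, 336, 3), dtype=np.uint8)  # placeholder image
vector = encode_image_stub(image)
payload, meta = serialize_vector(vector)
print(meta["shape"], len(payload))  # (576, 1024) 1179648
```

The request body would then carry `payload` and `meta` alongside the text prompt, rather than the encoded image.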

Alternatives

The suggested usage is as follows:

# Proposed API sketch: the "vector" multi_modal_data key and the
# "<vector>" prompt placeholder are hypothetical, not yet in vLLM.
import numpy as np
from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf")  # example model

# Refer to the HuggingFace repo for the correct format to use
prompt = "USER: <vector>\nWhat is the content of this image?\nASSISTANT:"

# Image embedding computed on-device by the encoder and projector
vector = np.array([x, x, x, x])  # placeholder values

# Single prompt inference
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {"vector": vector},
})

With this usage, deploying a single language-only LLM could serve multimodal requests, and the input modality would not be limited to images.
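On the server side, the only extra machinery needed is reconstructing the array from the request payload before handing it to the model. A minimal sketch, where `deserialize_vector` is a hypothetical helper and the same path would work for an audio or any other embedding:

```python
import numpy as np

def deserialize_vector(payload: bytes, meta: dict) -> np.ndarray:
    # Rebuild the embedding from raw bytes plus dtype/shape metadata
    # sent alongside the text prompt.
    return np.frombuffer(payload, dtype=meta["dtype"]).reshape(meta["shape"])

# Round trip: what the device sends is exactly what the server restores.
vec = np.arange(8, dtype=np.float16)
payload, meta = vec.tobytes(), {"dtype": "float16", "shape": (8,)}
restored = deserialize_vector(payload, meta)
assert np.array_equal(restored, vec)
```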

Additional context

No response

@ywang96 (Member) commented Jul 20, 2024

Hey @qZhang88, thanks for the issue! Supporting image embeddings as inputs is indeed on our Q3 roadmap, which you can track in #4194.
