Support for multimodal model #344

Open
spoonbobo opened this issue Feb 21, 2024 · 3 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

@spoonbobo

Does tensorrtllm_backend support multimodal LLMs such as LLaVA, like those listed in https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal?

@symphonylyh
Collaborator

Hi @spoonbobo, not yet. We're currently working on a general backend for structures like encoder-decoder and multimodal models. Encoder-decoder work is in progress, and multimodal follows it. The progress is tracked in NVIDIA/TensorRT-LLM#800.

Meanwhile, if you're referring to a Triton Python backend, do you think it's OK for users to implement a multimodal workflow based on the gpt example?
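
For illustration, such a workflow could look roughly like the sketch below: a Python-backend model that uses BLS (business logic scripting) to call a vision encoder first and then the LLM. Everything here is a hypothetical sketch, not a supported interface: the vision_encoder and tensorrt_llm model names and the tensor names (IMAGE, PROMPT, VISUAL_FEATURES, PROMPT_EMBEDDING_TABLE, OUTPUT_TEXT) are placeholders that would have to match the config.pbtxt files of the models actually deployed.

```python
# model.py -- hypothetical Triton Python backend that chains a vision encoder
# and a TensorRT-LLM model via BLS. Model and tensor names are illustrative.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Inputs assumed to be declared in this model's config.pbtxt.
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE")
            prompt = pb_utils.get_input_tensor_by_name(request, "PROMPT")

            # 1) Encode the image with a hypothetical "vision_encoder" model.
            enc_request = pb_utils.InferenceRequest(
                model_name="vision_encoder",
                requested_output_names=["VISUAL_FEATURES"],
                inputs=[image],
            )
            enc_response = enc_request.exec()
            if enc_response.has_error():
                raise pb_utils.TritonModelException(
                    enc_response.error().message())
            features = pb_utils.get_output_tensor_by_name(
                enc_response, "VISUAL_FEATURES")

            # 2) Pass the visual features plus the text prompt to the LLM.
            #    "PROMPT_EMBEDDING_TABLE" is a placeholder; a real workflow
            #    must match the tensorrt_llm model's config.pbtxt.
            llm_request = pb_utils.InferenceRequest(
                model_name="tensorrt_llm",
                requested_output_names=["OUTPUT_TEXT"],
                inputs=[
                    prompt,
                    pb_utils.Tensor("PROMPT_EMBEDDING_TABLE",
                                    features.as_numpy()),
                ],
            )
            llm_response = llm_request.exec()
            if llm_response.has_error():
                raise pb_utils.TritonModelException(
                    llm_response.error().message())

            output = pb_utils.get_output_tensor_by_name(
                llm_response, "OUTPUT_TEXT")
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output]))
        return responses
```

The open question with this approach is whether the LLM model from the gpt example can consume precomputed visual embeddings, which is part of what the work tracked in NVIDIA/TensorRT-LLM#800 aims to address.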

@spoonbobo
Author

Hi @symphonylyh, I appreciate the effort you've put into providing general encoder-decoder support. I haven't tried implementing a workflow based on this example yet, but I think it's definitely worth a try.

@byshiue added the triaged label on Feb 27, 2024
@FernandoDorado

I'm also very interested in these capabilities and looking forward to trying the provided examples. Is there a reference tutorial or guide?
