Added a Gradio UI for multi-modal inference using Llama 3.2 Vision #718
Conversation
…ty, image processing and text generation
Thanks for the super fast PR! I left some requests
recipes/quickstart/inference/local_inference/multi_modal_infer_Gradio_UI.py
…ssing Hugging Face token from the arguments
…ature and top_p sliders there.
@init27 I made the changes you asked for; please check and let me know. I'll be happy to improve it further.
Modified the README for the new code that passes the token via an argument
Used lowercase "g" in gradio
@init27 I added the changes you asked for; please check.
@init27 please let me know if anything else is required.
@init27 can you check the PR? I renamed the file too.
Looks great, thanks for the PR!
@himanshushukla12 sorry for missing this; looks great! Can you please make sure the CI/CD is green and merge?
@init27 please check; everything is working fine.
What does this PR do?
This PR introduces multi-modal inference using a Gradio UI for Llama 3.2 Vision models. The Gradio UI lets users upload images and generate descriptive text from a prompt, with adjustable parameters such as top-k, max-tokens, temperature, and top-p for fine-tuning text generation, all in a chatbot-like interface.
Additionally, this PR:
Integrates the transformers and accelerate libraries for efficient model loading and inference.
Implements memory management for releasing GPU resources after inference.
Adds support for Hugging Face tokens to authenticate and access Llama models.
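As a rough illustration of how these pieces fit together, the sketch below shows the argument handling (passing the Hugging Face token via `--hf_token` rather than hard-coding it), the chat-style message structure Llama 3.2 Vision processors expect, and a slider-value clamp. Function names, the model ID, and the overall structure are assumptions for illustration, not the PR's actual code; the Gradio UI, model loading, and GPU memory release are only indicated in comments.

```python
# Hypothetical sketch of the script's CLI and message construction;
# names and structure are illustrative, not the PR's actual code.
import argparse


def parse_args(argv=None):
    # The PR passes the Hugging Face token as an argument instead of
    # embedding it in the source.
    parser = argparse.ArgumentParser(
        description="Multi-modal inference with a Gradio UI")
    parser.add_argument("--hf_token", required=True,
                        help="Hugging Face access token")
    parser.add_argument("--model_name",
                        default="meta-llama/Llama-3.2-11B-Vision-Instruct")
    return parser.parse_args(argv)


def build_messages(prompt):
    # Chat-style message layout used with Llama 3.2 Vision processors:
    # an image placeholder followed by the user's text prompt.
    return [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": prompt}]}]


def clamp(value, low, high):
    # Keep UI slider values (temperature, top_p, top_k, max tokens) in range.
    return max(low, min(high, value))


if __name__ == "__main__":
    args = parse_args()
    # Here the script would authenticate with args.hf_token, load the model
    # via transformers/accelerate, wrap inference in a Gradio interface with
    # image upload plus parameter sliders, and release GPU memory after each
    # request (e.g. del outputs; torch.cuda.empty_cache()).
```

Launching the UI and running the model require `gradio`, `transformers`, `accelerate`, and a GPU, which is why those steps are left as comments here.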