-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
[Model] Add support for LightOnOCR #26916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
- Updated `supported_models.md` to include `LightOnOCRForConditionalGeneration`. - Implemented `run_lightonocr` function in `vision_language.py` for handling LightOnOCR model requests. - Registered LightOnOCR in the model registry for example usage. Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
Documentation preview: https://vllm--26916.org.readthedocs.build/en/26916/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds support for the LightOnOCR model. The implementation is well-structured and follows the existing patterns for multimodal models within the vLLM project. I have identified one high-severity issue in the example script related to an unused function parameter, which could be misleading. My review includes a code suggestion to address this by adhering to a common Python convention for unused variables, which will improve code clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds support for the LightOnOCR model. The changes include updating documentation, adding an example, and implementing the model logic. The implementation is based on the Mistral3/Pixtral architecture. My review identified a critical bug in the input parsing logic that could lead to a crash, and a high-severity issue in token processing that might result in incorrect behavior. I have provided suggestions to address these issues. The other changes appear to be correct.
|
Apologies if this obvious, but I can't seem to find the model on Huggingface. Could you provide a link to where the model is hosted? |
|
That's normal as the model is not released yet! This PR aims to add support for it so it's supported from the launch day. I can also provide more info if needed or provide a dummy model for testing the code. |
|
Thanks for adding your implementation to vLLM! Made some initial comments. |
…rted_models.md Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
…mples Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
… registry Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
|
Thanks for the quick feedback! |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for the LightOnOCR model, a 1B parameter OCR Vision Language Model. The changes span documentation, examples, and the core model implementation, which appears to be well-integrated and follows the existing patterns for similar multimodal models in the repository. I've identified one critical issue in the new model implementation that could lead to a server crash under certain conditions and have provided a code suggestion to address it.
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick turnaround, have you verified that the model works correctly? If so then we can merge it.
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
|
I have verified by launching the model locally and it still works as before 🚀 |
DarkLight1337
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM then, thanks for adding this!
|
I have one final hf_to_vllm_mapper issue to fix to ensure the Transformers impl also can load the weights from the same repo! |
|
Sure, tell me when that's done |
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com>
|
it should be good now, thanks for waiting! |
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Purpose
This PR adds support for LightOnOCR: a SOTA 1B OCR VLM built on top of Mistral3 ViT and Qwen3 LM decoder.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.