Fix missing video inputs for PerceptionLM. #39971
LGTM for the fix, cc @zucchini-nlp who made the initial change!
The non-standard image inputs are OK too, but it would be better to add a test that goes with them
Okay, thanks! I think we need to standardize the output shapes from the image processor so they are consistent, though
Maybe we can always return 5D pixels, or already-flattened 4D pixels? Either way looks good; we have models doing both
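To make the two options concrete, here is a minimal sketch of how the shapes relate. It is pure shape arithmetic, not the actual image processor code: the helper names `to_5d_shape` and `flatten_tiles_shape` are hypothetical, and it assumes the 5D layout is (batch, tiles, C, H, W) and that a 4D input means (batch, C, H, W) with no tiles dim.

```python
from typing import Tuple


def to_5d_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Promote a pixel shape to the assumed 5D layout (batch, tiles, C, H, W)."""
    shape = tuple(shape)
    if len(shape) == 3:
        # (C, H, W): a single vanilla image, no batch dim and no tiles dim.
        return (1, 1) + shape
    if len(shape) == 4:
        # Assumption: (batch, C, H, W), so insert a tiles dim of size 1.
        return (shape[0], 1) + shape[1:]
    if len(shape) == 5:
        # Already (batch, tiles, C, H, W).
        return shape
    raise ValueError(f"Expected a 3D, 4D or 5D pixel shape, got {shape}")


def flatten_tiles_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Collapse (batch, tiles, C, H, W) into the flattened 4D form (batch * tiles, C, H, W)."""
    batch, tiles, channels, height, width = to_5d_shape(shape)
    return (batch * tiles, channels, height, width)
```

With either convention, training and inference inputs would go through the same normalization step, so the model only ever sees one layout.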
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@zucchini-nlp The reason shape unification is done in the model rather than in image_processing is that I noticed the model sees a different input shape during training than in eval/inference.
@zucchini-nlp Shall I split the PR so that the more urgent fix can be merged first?
This reverts commit 181d87b.
[For maintainers] Suggested jobs to run (before merge): run-slow: perception_lm
Yeah, let's merge this one first and add it in the next patch release
@zucchini-nlp My bad. I just realized this comes from the collate_fn in my training script (I added one dimension there). Let me open another PR with this simple fix for image_preprocessor and update the corresponding training script in the model card.
* Fix missing video inputs for PerceptionLM.
* Minor fix for vanilla input image (only C,H,W, no tiles dim).
* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)." This reverts commit 181d87b.
Critical: Fixes missing video input for PerceptionLM (accidentally removed in PR)
Minor: Adds support for vanilla images that have only C,H,W dims and no tiles dim.
This is a non-default image shape in PLM, but it is useful in demos and on low-resource devices,
e.g., in the just-added "PLM Simple Fine-tuning Example" under
https://huggingface.co/facebook/Perception-LM-1B#plm-usage