Add YOLOS #16848
Thanks for adding this new model! There are a few badly named variables left and some docstrings to fix, but overall it's in great shape!
Addressed most comments. The remaining comments are about badly formatted docstrings; however, these are all copied from DETR (so I can't change them due to […]). Also pinging @Narsil, as the pipeline test for YOLOS is failing. This is because YOLOS doesn't take […]
I'd advocate making the docstring changes in DETR in this PR so that they propagate to YOLOS, just to make sure we don't forget.
Then the feature_extractor should not output them. The image pipelines are pretty simple; roughly, they just do:
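Assuming the pattern is "preprocess, then pass every key the feature extractor returns to the model as keyword arguments", a minimal sketch shows why an extra output breaks things. Everything here is a hypothetical plain-Python stand-in (no `transformers` import), and it assumes the extra key is DETR's `pixel_mask`:

```python
from typing import Callable, Dict, List


def detr_like_feature_extractor(images: List[List[float]]) -> Dict[str, list]:
    # Hypothetical stand-in for a DETR-style extractor: pixel values plus a padding mask.
    return {"pixel_values": images, "pixel_mask": [[1] * len(img) for img in images]}


def yolos_like_forward(pixel_values: List[List[float]], labels=None) -> Dict[str, int]:
    # Hypothetical stand-in for a forward() that has no pixel_mask parameter.
    return {"num_images": len(pixel_values)}


def run_pipeline(feature_extractor: Callable, forward: Callable, images: List[List[float]]):
    # Roughly the assumed pipeline flow: preprocess, then splat every key into the model.
    inputs = feature_extractor(images)
    return forward(**inputs)


error_message = ""
try:
    run_pipeline(detr_like_feature_extractor, yolos_like_forward, [[0.1, 0.2]])
except TypeError as err:
    # Python rejects the keyword the forward() signature does not declare.
    error_message = str(err)
print(error_message)
```

Because the pipeline forwards keys blindly, the fix has to happen on the feature-extractor side (stop returning the key) rather than in the pipeline.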
Yeah, the problem is that YOLOS uses the same feature extractor as DETR, which outputs both […]. I think the easiest fix here is to add […]
We're not adding an argument that will be ignored all the time; that's just confusing to users, especially if they end up passing one and don't understand why it's not used. If the feature extractor should not return […]
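One way to follow this suggestion, sketched with hypothetical stand-ins rather than the real `transformers` classes: a YOLOS-specific extractor reuses the DETR-style preprocessing and simply drops the output the model never reads. This assumes the unused output is DETR's `pixel_mask`; the class and function names below are illustrative only.

```python
from typing import Dict, List

Image = List[float]  # hypothetical stand-in for an image: a flat list of pixel values


class DetrLikeFeatureExtractor:
    """Hypothetical stand-in for a DETR-style extractor: pixel values plus a padding mask."""

    def __call__(self, images: List[Image]) -> Dict[str, list]:
        return {
            "pixel_values": images,
            # All ones: this stand-in never pads (1 = real pixel, 0 = padding).
            "pixel_mask": [[1] * len(image) for image in images],
        }


class YolosLikeFeatureExtractor(DetrLikeFeatureExtractor):
    """Hypothetical subclass: same preprocessing, but no mask in the returned dict."""

    def __call__(self, images: List[Image]) -> Dict[str, list]:
        encoding = super().__call__(images)
        encoding.pop("pixel_mask", None)  # the model's forward() never reads it
        return encoding


def yolos_like_forward(pixel_values: List[Image], labels=None) -> Dict[str, int]:
    """Hypothetical forward() with no pixel_mask parameter."""
    return {"num_images": len(pixel_values)}


images = [[0.1, 0.2], [0.3, 0.4]]
inputs = YolosLikeFeatureExtractor()(images)
outputs = yolos_like_forward(**inputs)  # no TypeError: only pixel_values is passed
print(sorted(inputs), outputs)
```

Subclassing keeps the shared preprocessing in one place while giving each model an output dict that matches its own `forward()` signature, so no ignored argument has to be added to the model.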
OK, so I created a new YolosFeatureExtractor; however, the pipeline test is still failing:
@Narsil, could you help me debug this? It's weird because […]
I checked, and the reason is that the feature extractor being tested is actually a DETR one, not a YOLOS one: the pipeline tests rely on ModelTester to create the base objects.
The documentation is not available anymore as the PR was closed or merged.
The failing test is unrelated; merging.
* First draft
* Add YolosForObjectDetection
* Make forward pass work
* Add mid position embeddings
* Add interpolation of position encodings
* Add expected values
* Add YOLOS to tests
* Add integration test
* Support tiny model as well
* Support all models in conversion script
* Remove mid_pe_size attribute
* Make more tests pass
* Add model to README and fix config
* Add copied from statements
* Rename base_model_prefix to vit
* Add missing YOLOS_PRETRAINED_CONFIG_ARCHIVE_MAP
* Apply suggestions from code review
* Apply more suggestions from code review
* Convert remaining checkpoints
* Improve docstrings
* Add YolosFeatureExtractor
* Add feature extractor to docs
* Add corresponding tests
* Fix style
* Fix docs
* Apply suggestion from code review
* Fix bad rebase
* Fix some more bad rebase
* Fix missing character
* Improve docs and variable names

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
What does this PR do?
This PR adds YOLOS, an awesome and simple object detector.
YOLOS is just a single Transformer encoder (ViT), trained using DETR's objective.
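DETR's objective starts by finding a one-to-one (bipartite) matching between the model's predictions and the ground-truth objects that minimizes a total matching cost; the classification and box losses are then computed on the matched pairs, and unmatched predictions are trained to predict "no object". The sketch below shows only the matching step. It swaps the Hungarian algorithm (DETR's matcher uses `scipy.optimize.linear_sum_assignment`) for a brute-force search over assignments, which is only viable for tiny inputs, and the cost values are made up for illustration rather than computed from class probabilities and box distances.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def match_predictions(cost: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Brute-force bipartite matching.

    cost[i][j] is the cost of assigning prediction i to ground-truth object j.
    Returns (assignment, total) where assignment[j] is the prediction matched
    to object j, chosen so that the summed cost is minimal. Assumes there are
    at least as many predictions as objects, as with DETR-style models.
    """
    num_preds = len(cost)
    num_targets = len(cost[0]) if cost else 0
    best_assignment: List[int] = []
    best_cost = float("inf")
    # Each permutation picks a distinct prediction for every target, in target order.
    for preds in permutations(range(num_preds), num_targets):
        total = sum(cost[p][t] for t, p in enumerate(preds))
        if total < best_cost:
            best_assignment, best_cost = list(preds), total
    return best_assignment, best_cost


# Illustrative costs: 3 predictions (rows) against 2 ground-truth objects (columns).
cost = [
    [0.9, 0.2],  # prediction 0
    [0.1, 0.8],  # prediction 1
    [0.5, 0.5],  # prediction 2: left unmatched, so it would be trained as "no object"
]
assignment, total = match_predictions(cost)
print(assignment, total)
```

Object 0 goes to prediction 1 and object 1 to prediction 0, the cheapest combination; the leftover prediction gets the "no object" target. Because the matching is one-to-one, duplicate predictions of the same object are penalized, which is what lets DETR-style detectors such as YOLOS skip non-maximum suppression.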
For now, I've used `"vit"` as `base_model_prefix`, in order to easily load weights from ViT and ViTMAE checkpoints on the hub.
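The reason the prefix matters: checkpoint parameter names carry the attribute name of the backbone (for example `vit.` when the backbone is stored as `self.vit`), so matching prefixes let the backbone weights line up by name. The helper below is a hypothetical, simplified illustration of that idea, not the actual loading logic in `transformers`; the parameter names are illustrative and `detection_head.weight` is made up.

```python
from typing import Dict, List


def align_checkpoint_keys(checkpoint_keys: List[str], model_keys: List[str], prefix: str) -> Dict[str, str]:
    """Hypothetical helper: map each checkpoint key to the model key it should load into.

    Keys that match exactly are kept; otherwise `prefix + "."` is added (base-model
    checkpoint into a model with a head) or stripped (checkpoint with a head into a
    bare base model). Keys with no counterpart, such as a classification head, are skipped.
    """
    model_key_set = set(model_keys)
    dotted = prefix + "."
    mapping: Dict[str, str] = {}
    for key in checkpoint_keys:
        if key in model_key_set:
            mapping[key] = key
        elif dotted + key in model_key_set:
            mapping[key] = dotted + key
        elif key.startswith(dotted) and key[len(dotted):] in model_key_set:
            mapping[key] = key[len(dotted):]
    return mapping


# Illustrative (simplified) parameter names for a detection model whose backbone is `self.vit`.
yolos_keys = [
    "vit.embeddings.cls_token",
    "vit.encoder.layer.0.attention.attention.query.weight",
    "detection_head.weight",  # hypothetical head parameter, never present in a ViT checkpoint
]

# A ViT classification checkpoint: backbone keys match, the classifier is skipped.
print(align_checkpoint_keys(["vit.embeddings.cls_token", "classifier.weight"], yolos_keys, "vit"))
# A bare ViT base-model checkpoint: the prefix is added so the keys still line up.
print(align_checkpoint_keys(["embeddings.cls_token"], yolos_keys, "vit"))
```

Any detection-specific parameters that find no match in the checkpoint would stay at their fresh initialization, which is the expected outcome when starting from a pretrained ViT backbone.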