Add ViTMatte #25051
ae1e7af to ed1ac40
- sections:
  - local: tasks/sequence_classification
    title: Text classification
  - local: tasks/token_classification
    title: Token classification
  - local: tasks/question_answering
    title: Question answering
  - local: tasks/language_modeling
    title: Causal language modeling
  - local: tasks/masked_language_modeling
    title: Masked language modeling
  - local: tasks/translation
    title: Translation
  - local: tasks/summarization
    title: Summarization
  - local: tasks/multiple_choice
    title: Multiple choice
Not sure what happened here. My ruff version is 0.0.259 and my black version is 23.1.0.
Neither ruff nor black will touch a YAML file. Not sure what happened either, but it needs to be reverted.
Thanks for adding this. There seems to be a bit of work left in the model initializations and some of the docstrings.
@@ -106,6 +106,7 @@
("vit_hybrid", "ViTHybridImageProcessor"),
("vit_mae", "ViTImageProcessor"),
("vit_msn", "ViTImageProcessor"),
("vitmatte", "VitMatteImageProcessor"),
Would be nice to have a default for vitdet as well (even if it doesn't have its own image processor).
VitDet has so far only been added for VitDetBackbone. An image processor can theoretically be added once Mask R-CNN is added (#25348), which uses VitDet as its backbone (and would then use MaskRCNNImageProcessor as the default image processor).
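To make the point about a missing default concrete, here is a minimal sketch of how a lookup against such a mapping behaves. It only reuses the four entries visible in the diff hunk above; the names `IMAGE_PROCESSOR_NAMES` and `image_processor_name_for` are hypothetical and are not the library's actual implementation.

```python
from collections import OrderedDict
from typing import Optional

# Entries copied from the diff hunk in this thread; the real mapping is longer.
IMAGE_PROCESSOR_NAMES = OrderedDict(
    [
        ("vit_hybrid", "ViTHybridImageProcessor"),
        ("vit_mae", "ViTImageProcessor"),
        ("vit_msn", "ViTImageProcessor"),
        ("vitmatte", "VitMatteImageProcessor"),
    ]
)


def image_processor_name_for(model_type: str) -> Optional[str]:
    """Hypothetical helper: return the registered class name, or None if there is no entry."""
    return IMAGE_PROCESSOR_NAMES.get(model_type)


print(image_processor_name_for("vitmatte"))  # VitMatteImageProcessor
print(image_processor_name_for("vitdet"))    # None, since no default is registered
```

In this sketch a model type without an entry simply resolves to nothing, which is the situation described for vitdet until a suitable image processor exists.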
5692dca to 48f8863
@sgugger apart from the toctree issue which I'm still investigating, all comments are addressed.
Before I start reviewing - could you separate out the addition of VitDet and VitMatte? They should have their own respective PRs.
What does this PR do?
This PR adds the ViTMatte model, an elegant approach to image matting that relies entirely on a Vision Transformer backbone to do the heavy work, with a lightweight head on top.
Here's a Colab notebook showcasing inference: https://colab.research.google.com/drive/1pWTn3Iur-NR2xUIyDE31dBgng_hXjSsn?usp=sharing.
The model uses VitDet as its backbone, hence this PR adds VitDet as a standalone model as well. It then uses the AutoBackbone class to load this model as the backbone for image matting.
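For readers unfamiliar with the backbone-plus-head pattern, the following is a toy, dependency-free sketch of the idea: a registry picks a backbone class from a config, and a small head sits on top of its features. Every name here (`ToyBackbone`, `BACKBONE_REGISTRY`, `load_backbone`, `ToyMattingModel`, the `toy_vitdet` key) is an assumption made for illustration. None of it is the transformers API or the code in this PR.

```python
from typing import Dict, List, Type


class ToyBackbone:
    """Pretend backbone: turns a flat list of pixel values into one feature map."""

    def __init__(self, config: dict) -> None:
        self.scale = config.get("scale", 1.0)

    def __call__(self, pixels: List[float]) -> List[List[float]]:
        # Return a list of feature maps; this toy produces exactly one.
        return [[p * self.scale for p in pixels]]


# Hypothetical registry keyed by model type, standing in for an Auto-style class.
BACKBONE_REGISTRY: Dict[str, Type[ToyBackbone]] = {"toy_vitdet": ToyBackbone}


def load_backbone(config: dict) -> ToyBackbone:
    """Pick the backbone class from config["model_type"] and instantiate it."""
    return BACKBONE_REGISTRY[config["model_type"]](config)


class ToyMattingModel:
    """The backbone does the heavy work; the head is a cheap clamp to [0, 1]."""

    def __init__(self, backbone_config: dict) -> None:
        self.backbone = load_backbone(backbone_config)

    def __call__(self, pixels: List[float]) -> List[float]:
        features = self.backbone(pixels)[-1]  # use the last feature map
        return [min(1.0, max(0.0, f)) for f in features]


model = ToyMattingModel({"model_type": "toy_vitdet", "scale": 0.5})
alphas = model([0.0, 1.0, 4.0])
print(alphas)  # [0.0, 0.5, 1.0]
```

The point of the indirection is that the matting model never names a concrete backbone class, so a different backbone could be swapped in through the config alone.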
Fixes #25040.