Add ViTMatte #25051

Closed
wants to merge 49 commits into from

Contributor

@NielsRogge NielsRogge commented Jul 24, 2023

What does this PR do?

This PR adds the ViTMatte model, an elegant approach to image matting that relies on a Vision Transformer backbone to do the heavy lifting, with a lightweight head on top.

Here's a Colab notebook showcasing inference: https://colab.research.google.com/drive/1pWTn3Iur-NR2xUIyDE31dBgng_hXjSsn?usp=sharing.

The model uses VitDet as its backbone, so this PR also adds VitDet as a standalone model. ViTMatte then loads it through the AutoBackbone class to use it as the backbone for image matting.
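
The backbone-plus-head design described above can be illustrated with a toy, framework-free sketch. Everything in it is hypothetical (`ToyBackboneConfig`, `ToyVitDetBackbone`, `BACKBONE_REGISTRY`, `load_backbone`, `ToyMattingModel`); it only mimics the idea of resolving a backbone from a config's model type and putting a small head on top, and it is not the code added in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyBackboneConfig:
    # Hypothetical config: the model type selects which backbone class is built.
    model_type: str = "vitdet"
    hidden_size: int = 4


class ToyVitDetBackbone:
    """Stand-in for a ViT-style backbone: maps each pixel to a feature vector."""

    def __init__(self, config: ToyBackboneConfig) -> None:
        self.hidden_size = config.hidden_size

    def __call__(self, pixels: List[float]) -> List[List[float]]:
        # Fake "features": repeat each pixel value hidden_size times.
        return [[p] * self.hidden_size for p in pixels]


# Hypothetical registry playing the role of an auto-class mapping.
BACKBONE_REGISTRY: Dict[str, Callable[[ToyBackboneConfig], ToyVitDetBackbone]] = {
    "vitdet": ToyVitDetBackbone,
}


def load_backbone(config: ToyBackboneConfig) -> ToyVitDetBackbone:
    """Resolve the backbone class from the config's model type and build it."""
    try:
        backbone_cls = BACKBONE_REGISTRY[config.model_type]
    except KeyError:
        raise ValueError(f"Unknown backbone type: {config.model_type}") from None
    return backbone_cls(config)


class ToyMattingModel:
    """Backbone does the heavy work; a tiny head turns features into alpha values."""

    def __init__(self, backbone_config: ToyBackboneConfig) -> None:
        self.backbone = load_backbone(backbone_config)

    def __call__(self, pixels: List[float]) -> List[float]:
        features = self.backbone(pixels)
        # Lightweight head: average each feature vector and clamp to [0, 1].
        return [min(1.0, max(0.0, sum(f) / len(f))) for f in features]
```

The matting model never names a concrete backbone class; it only passes a config along, which is what makes the backbone swappable.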

Fixes #25040.

@NielsRogge NielsRogge force-pushed the add_vit_matte branch 2 times, most recently from ae1e7af to ed1ac40 August 7, 2023 08:52
@NielsRogge NielsRogge requested a review from sgugger August 7, 2023 09:43
Comment on lines -28 to -46
- sections:
  - local: tasks/sequence_classification
    title: Text classification
  - local: tasks/token_classification
    title: Token classification
  - local: tasks/question_answering
    title: Question answering
  - local: tasks/language_modeling
    title: Causal language modeling
  - local: tasks/masked_language_modeling
    title: Masked language modeling
  - local: tasks/translation
    title: Translation
  - local: tasks/summarization
    title: Summarization
  - local: tasks/multiple_choice
    title: Multiple choice
Contributor Author

Not sure what happened here.

My ruff version is 0.0.259 and my black version is 23.1.0.

Collaborator

Neither ruff nor black will touch a YAML file. Not sure what happened either, but it needs to be reverted.

Collaborator

@sgugger sgugger left a comment

Thanks for adding this. There seems to be a bit of work left in the model initializations and some of the docstrings.

@@ -106,6 +106,7 @@
("vit_hybrid", "ViTHybridImageProcessor"),
("vit_mae", "ViTImageProcessor"),
("vit_msn", "ViTImageProcessor"),
("vitmatte", "VitMatteImageProcessor"),
Collaborator

It would be nice to have a default for vitdet as well (even if it doesn't have its own image processor).

Contributor Author

VitDet is currently only added as VitDetBackbone. An image processor could be added once Mask R-CNN is added (#25348), which uses VitDet as its backbone; MaskRCNNImageProcessor could then serve as the default image processor.
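
The mapping under discussion pairs a model type with the name of its image processor class. Below is a minimal sketch of how such a lookup behaves, using a plain dict rather than the library's actual mapping object. The names `IMAGE_PROCESSOR_NAMES` and `image_processor_name_for` are hypothetical, and the entries are only the ones visible in the diff hunk.

```python
from typing import Optional

# Illustrative subset only: the four entries shown in the diff hunk.
IMAGE_PROCESSOR_NAMES = {
    "vit_hybrid": "ViTHybridImageProcessor",
    "vit_mae": "ViTImageProcessor",
    "vit_msn": "ViTImageProcessor",
    "vitmatte": "VitMatteImageProcessor",
}


def image_processor_name_for(model_type: str) -> Optional[str]:
    """Return the image processor class name for a model type, or None if unmapped."""
    return IMAGE_PROCESSOR_NAMES.get(model_type)
```

In this sketch `"vitdet"` has no entry, so the lookup returns `None`. That is the gap the reviewer points at, and adding a default would amount to one more key in the mapping.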

src/transformers/models/vitdet/__init__.py (outdated, resolved)
src/transformers/models/vitdet/configuration_vitdet.py (outdated, resolved)
src/transformers/models/vitdet/test.py (outdated, resolved)
src/transformers/models/vitmatte/modeling_vitmatte.py (outdated, resolved)
src/transformers/models/vitmatte/modeling_vitmatte.py (outdated, resolved)
src/transformers/models/vitmatte/test.py (outdated, resolved)
tests/models/vitdet/test_modeling_vitdet.py (outdated, resolved)
tests/models/vitmatte/test_modeling_vitmatte.py (outdated, resolved)
@NielsRogge
Contributor Author

@sgugger apart from the toctree issue, which I'm still investigating, all comments are addressed.

@NielsRogge NielsRogge requested a review from amyeroberts August 15, 2023 10:41
@amyeroberts
Collaborator

Before I start reviewing - could you separate out the addition of VitDet and VitMatte? They should have their own respective PRs.

@NielsRogge NielsRogge mentioned this pull request Aug 15, 2023
@NielsRogge NielsRogge mentioned this pull request Aug 29, 2023
2 tasks
@NielsRogge NielsRogge closed this Aug 29, 2023
Development

Successfully merging this pull request may close these issues.

Add ViTMatte model
3 participants