Add ViTMatte #25051

NielsRogge · 2023-07-24T14:51:36Z

What does this PR do?

This PR adds the ViTMatte model, an elegant approach to image matting that relies entirely on a Vision Transformer backbone to do the heavy lifting, with a lightweight head on top.

Here's a Colab notebook showcasing inference: https://colab.research.google.com/drive/1pWTn3Iur-NR2xUIyDE31dBgng_hXjSsn?usp=sharing.

The model uses VitDet as its backbone, hence this PR adds VitDet as a standalone model as well. ViTMatte then loads it through the AutoBackbone class to use it as the backbone for image matting.

Fixes #25040.

src/transformers/image_utils.py

NielsRogge · 2023-08-07T10:02:07Z

docs/source/en/_toctree.yml

-  - sections:
-      - local: tasks/sequence_classification
-        title: Text classification
-      - local: tasks/token_classification
-        title: Token classification
-      - local: tasks/question_answering
-        title: Question answering
-      - local: tasks/language_modeling
-        title: Causal language modeling
-      - local: tasks/masked_language_modeling
-        title: Masked language modeling
-      - local: tasks/translation
-        title: Translation
-      - local: tasks/summarization
-        title: Summarization
-      - local: tasks/multiple_choice
-        title: Multiple choice


Not sure what happened here.

My ruff version is 0.0.259, black version is 23.1.0

sgugger

Thanks for adding this. There seems to be a bit of work left in the model initializations and some of the docstrings.

sgugger · 2023-08-07T14:46:40Z

docs/source/en/_toctree.yml

Neither ruff nor black will touch a YAML file. Not sure what happened either, but it needs to be reverted.

sgugger · 2023-08-07T14:48:14Z

src/transformers/models/auto/image_processing_auto.py

@@ -106,6 +106,7 @@
        ("vit_hybrid", "ViTHybridImageProcessor"),
        ("vit_mae", "ViTImageProcessor"),
        ("vit_msn", "ViTImageProcessor"),
+        ("vitmatte", "VitMatteImageProcessor"),


Would be nice to have a default for VitDet as well (even if it doesn't have its own image processor).

VitDet is currently only added as VitDetBackbone. An image processor could in theory be added once Mask R-CNN is added (#25348), which uses VitDet as its backbone; MaskRCNNImageProcessor could then serve as the default image processor.
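For readers following the thread, here is a minimal sketch of what "having a default" means for the mapping in the diff above. It is a toy excerpt, not the library's lookup code: `IMAGE_PROCESSOR_NAMES` and `default_image_processor_name` are hypothetical names introduced for illustration, and only the four entries visible in the hunk are reproduced.

```python
from collections import OrderedDict
from typing import Optional

# Toy excerpt containing only the entries visible in the diff hunk above.
# Hypothetical name; this is not the full mapping from the library.
IMAGE_PROCESSOR_NAMES = OrderedDict(
    [
        ("vit_hybrid", "ViTHybridImageProcessor"),
        ("vit_mae", "ViTImageProcessor"),
        ("vit_msn", "ViTImageProcessor"),
        ("vitmatte", "VitMatteImageProcessor"),
    ]
)


def default_image_processor_name(model_type: str) -> Optional[str]:
    """Hypothetical helper: return the default image processor class name
    for a model type, or None when the model type has no entry."""
    return IMAGE_PROCESSOR_NAMES.get(model_type)


# "vitmatte" resolves through the entry added in this PR.
print(default_image_processor_name("vitmatte"))
# "vitdet" has no entry in this excerpt, so there is no default to fall back on.
print(default_image_processor_name("vitdet"))
```

In this sketch, a model type without an entry simply has no default, which is the situation described for VitDet until a suitable image processor exists.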

src/transformers/models/vitdet/__init__.py

src/transformers/models/vitdet/configuration_vitdet.py

src/transformers/models/vitdet/test.py

src/transformers/models/vitmatte/modeling_vitmatte.py

src/transformers/models/vitmatte/test.py

tests/models/vitdet/test_modeling_vitdet.py

tests/models/vitmatte/test_modeling_vitmatte.py

NielsRogge · 2023-08-14T15:01:11Z

@sgugger apart from the toctree issue which I'm still investigating, all comments are addressed.

amyeroberts · 2023-08-15T17:39:49Z

Before I start reviewing - could you separate out the addition of VitDet and VitMatte? They should have their own respective PRs.

NielsRogge force-pushed the add_vit_matte branch 2 times, most recently from ae1e7af to ed1ac40 Compare August 7, 2023 08:52

NielsRogge requested a review from sgugger August 7, 2023 09:43

NielsRogge added 24 commits August 14, 2023 10:58

First draft

d492d47

More improvements

319d4da

More improvements

e511a6e

More improvements

82174f6

Improve variable names

2561b2e

More improved variable names

0c1fe80

Make activation function configurable

3f64984

Remove copied from

46fcc9d

Add backbone class

2018d1f

Add VitMatte

dab9992

More improvements

750d5d3

More improvements

d7135c9

Add conversion script

a449ab0

Convert backbone weights

0eb913d

More improvements

0c7a67b

Fix docs

a2be1eb

Fix vit matte image processor

5c70c4a

More fixes

05e0375

More fixes

ad81978

Fix image processor

69c72df

Add output class

f668bae

Fix output class

8b8659c

Make more tests pass

37d4198

More improvements

17d62ff

NielsRogge added 21 commits August 14, 2023 11:00

Fix one more test

795c1de

Fix even more tests

a2e72a0

Fix integration test

e4deb8b

Improve tests

bf2a650

More improvements

bdb7beb

Fix inplace operations

af8f918

Fix test_initialization

c41b51c

Fix rebase

318e9f2

Fix test

a13a4ea

Fix one more test

e450ae0

Fix index table

7095368

Fix rebase

16004d4

Fix retain grad of hidden_states

c46b654

Fix style

48964b6

Add integration test, update conversion script, fix copies

248d5f8

Fix toctree

0cce7f6

More fixes

5a138e0

Address comments

551724a

Reduce num_hidden_layers

85ecf7b

Fix init_weights

be04077

Fix rebase

48f8863

NielsRogge force-pushed the add_vit_matte branch from 5692dca to 48f8863 Compare August 14, 2023 09:12

NielsRogge added 2 commits August 14, 2023 11:17

Address comment

3855893

Skip test

42b1eae

NielsRogge requested a review from amyeroberts August 15, 2023 10:41

NielsRogge mentioned this pull request Aug 15, 2023

Add ViTDet #25524

Merged

NielsRogge mentioned this pull request Aug 29, 2023

Add ViTMatte #25843

Merged

2 tasks

NielsRogge closed this Aug 29, 2023
