
NEW Feature: Mixup transform for Object Detection #6721

Draft
wants to merge 44 commits into
base: main

Conversation

ambujpawar
Contributor

@ambujpawar ambujpawar commented Oct 7, 2022

Official implementation of the paper: Here

Minimalist code to reproduce:

import PIL
from torchvision import io, utils
from torchvision.prototype import features, transforms as T
from torchvision.prototype.transforms import functional as F


# Defining and wrapping input to appropriate Tensor Subclasses
path = "/Users/ambujpawar/Desktop/Cat03.jpeg"
path2 = "/Users/ambujpawar/Desktop/dog_2.jpeg"

# img = features.Image(io.read_image(path), color_space=features.ColorSpace.RGB)
img = PIL.Image.open(path)
img2 = PIL.Image.open(path2)

bbox_1 = features.BoundingBox(
    [[2, 0, 100, 100], [396, 92, 479, 241]],
    format=features.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img),
)
bbox_2 = features.BoundingBox(
    [[200, 100, 300, 300], [424, 38, 479, 250]],
    format=features.BoundingBoxFormat.XYXY,
    spatial_size=F.get_spatial_size(img2),
)
label = features.Label([59, 58])


# Defining and applying Transforms V2
trans = T.Compose(
    [
        T.MixupDetection(),
    ]
)

imgs = [img, img2]
bboxes = [bbox_1, bbox_2]
labels = [label, label]

imgs, bboxes, labels = trans(imgs, bboxes, labels)

# Visualizing results
viz = utils.draw_bounding_boxes(F.to_image_tensor(imgs[1]), boxes=bboxes[0])
F.to_pil_image(viz).show()

Examples output:
Please don't pay attention to the bounding boxes in this particular image; I just entered those boxes randomly.
Screenshot 2022-11-06 at 16 48 00

@ambujpawar ambujpawar marked this pull request as draft October 7, 2022 12:43
@datumbox datumbox mentioned this pull request Oct 7, 2022
@ambujpawar ambujpawar marked this pull request as ready for review November 6, 2022 15:50
@pmeier pmeier self-assigned this Nov 7, 2022
@pmeier
Collaborator

pmeier commented Nov 7, 2022

Hey @ambujpawar and thanks a lot for the PR! I'll try to help you land it in the near future. As you might have noticed, this transform is not straightforward to implement, since it requires a batch of detection samples. In this context, that means a list of samples, whereas for classification a "batch" usually means an extra batch dimension on a tensor. This makes the implementation a lot harder compared to regular MixUp.

Still, we need to be able to support it. I'll look into how we can streamline the process, for example by providing a _DetectionBatchTransform or standalone utilities that make this easier. I'll get back to you when I've found a solution or need your input. Is that OK with you?
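For illustration, such a base class could look roughly like this. To be clear, this is a hypothetical sketch: the class name comes from the comment above, but the method names and the pairing scheme are assumptions, not the actual torchvision API.

```python
# Hypothetical sketch of the _DetectionBatchTransform idea; method names
# and the pairing scheme are assumptions for illustration only.

class _DetectionBatchTransform:
    """Handles the batch plumbing so subclasses only implement the algorithm."""

    def __call__(self, batch):
        # a detection "batch" is a sequence of samples, not a stacked tensor
        if not isinstance(batch, (list, tuple)):
            raise TypeError("detection batch transforms expect a sequence of samples")
        return self._transform_batch(list(batch))

    def _transform_batch(self, samples):
        raise NotImplementedError


class MixupDetection(_DetectionBatchTransform):
    def _transform_batch(self, samples):
        # mix each sample with the next one, wrapping around (simplified pairing)
        return [
            self._mixup(samples[i], samples[(i + 1) % len(samples)])
            for i in range(len(samples))
        ]

    def _mixup(self, sample1, sample2):
        # stand-in for the real algorithm: the boxes of both samples are kept
        return {"boxes": sample1["boxes"] + sample2["boxes"]}


batch = [{"boxes": [[0, 0, 10, 10]]}, {"boxes": [[5, 5, 20, 20]]}]
output = MixupDetection()(batch)  # still a list with one entry per sample
```

The point of the split is that subclasses never touch the batch container, only pairs of samples.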

@ambujpawar
Contributor Author

ambujpawar commented Nov 7, 2022

Hi @pmeier, it sounds perfect to me. Looking forward to your suggestions :)

I agree with your comment regarding MixupForDetection taking batches of detection samples. However, shouldn't it be similar to what we do in the CopyPaste transform? Because in CopyPaste we also expect a batch of images.

@pmeier
Collaborator

pmeier commented Nov 8, 2022

However, shouldn't it be similar to what we do in the CopyPaste transform? Because in CopyPaste we also expect a batch of images.

Exactly. Before, we operated under the assumption that SimpleCopyPaste was the only batch detection transform, and thus a one-off solution for it was good enough. With DetectionMixUp in the picture, our assumption is no longer true and we need to look at how we can provide utilities to ease the implementation of these transforms.

Right now, the largest part of the implementation deals with the "infrastructure", i.e. extracting the right inputs and putting them back afterwards. Only a small part is spent on the actual algorithm. In the best-case scenario, I find a solution so you only have to write the algorithm and the remainder is handled by a base class or some high-level utilities.

@ambujpawar
Contributor Author

ambujpawar commented Nov 8, 2022

That clears up all the questions for me. Thanks!

Yeah, a base class is perhaps the best solution in that regard. Please let me know if my help is needed :)

@pmeier pmeier marked this pull request as draft November 8, 2022 12:23
Collaborator

@pmeier pmeier left a comment


@ambujpawar I took the liberty of pushing a patch to your PR. I've added two functions, flatten_and_extract as well as unflatten_and_insert, that implement what their names imply. I'm actively looking for your feedback, so nothing is fixed yet. Two things that I already noticed:

  1. Both SimpleCopyPaste as well as MixUpDetection use a "split" layout for images and targets. Is that by design, or could we use one container, like a dictionary, for both of them? Imagine something like sample = {"image": ..., "boxes": ...}.
  2. The old extraction and insertion logic converted to tensor and back for images and (un-)wrapped the other features. Right now, the new logic does not do this. Instead, this is moved inside the _mixup function. We could move that back into the logic as well. What do you prefer?
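To make the idea concrete, here is a simplified stand-in for the two helpers mentioned above. The function names come from the patch, but the dict-of-lists layout and everything else here is an assumption; the actual implementations in the PR may differ.

```python
# Simplified stand-ins for flatten_and_extract / unflatten_and_insert;
# the actual implementations in the PR may differ.

def flatten_and_extract(batch, keys=("image", "boxes", "label")):
    # sequence of dict samples -> dict of parallel lists, one list per key
    return {key: [sample[key] for sample in batch] for key in keys}


def unflatten_and_insert(batch, extracted):
    # write the (possibly transformed) values back into copies of the samples
    return [
        {**sample, **{key: values[i] for key, values in extracted.items()}}
        for i, sample in enumerate(batch)
    ]


batch = [
    {"image": "img1", "boxes": [[0, 0, 10, 10]], "label": 0},
    {"image": "img2", "boxes": [[5, 5, 20, 20]], "label": 1},
]
extracted = flatten_and_extract(batch)
extracted["image"] = [img.upper() for img in extracted["image"]]  # stand-in transform
roundtrip = unflatten_and_insert(batch, extracted)
```

With helpers like these, the transform itself only operates on the extracted lists; the round trip restores the original sample structure.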

@ambujpawar
Contributor Author

ambujpawar commented Nov 8, 2022

Thanks for adding the patch! :)

I think it looks nice. Regarding your questions:

  1. Both SimpleCopyPaste as well as MixUpDetection use a "split" layout for images and targets. Is that by design, or could we use one container, like a dictionary, for both of them? Imagine something like sample = {"image": ..., "boxes": ...}.

Yes, they both use a "split" design, but MixupDetection does not use "Masks". Mixup is only used for detection, not segmentation.

  2. The old extraction and insertion logic converted to tensor and back for images and (un-)wrapped the other features. Right now, the new logic does not do this. Instead, this is moved inside the _mixup function. We could move that back into the logic as well. What do you prefer?

If I had to choose one design, I would choose the former, but I don't have any strong arguments for it.

Shall we include the developers of the SimpleCopyPaste transform as well? They might have some comments regarding these changes.

Collaborator

@pmeier pmeier left a comment


This is based on the example listed in the blog post regarding Transforms v2.
What should we call this transform instead?

Well, MixupDetection as well as SimpleCopyPaste are detection batch transforms and thus fall outside of the "regular" transforms. This is why we need some extra machinery to implement them properly. If you look at the others, their implementations are much simpler.

Batch transform here means that the input needs to be batched. For image classification transforms like CutMix or MixUp this simply means an extra batch dimension on the input tensors:

if inpt.ndim < expected_ndim:
    raise ValueError("The transform expects a batched input")

However, this is not possible for detection tasks. Each sample can have a different number of bounding boxes, so we cannot put them into a single tensor. Hence, a "detection batch" is just a sequence of individual samples. For your example, this could be

batch = [(img, bbox_1, label), (img2, bbox_2, label)]
transformed_batch = trans(batch)

Of course you can also do

batch = [{"image": img, "boxes": bbox_1, "label": label}, ...]

or something else as long as the outer container is a sequence.

Figured out SimpleCopyPaste still doesn't work. Working on it

You don't have to. Let's make sure DetectionMixup works as we want it to, and I'll fix SimpleCopyPaste afterwards.

Let's make sure we expand the tests a little. Basically we should have three test cases:

  1. a) and b) Make sure that _mixup is a no-op in case the ratio is == 0 or >= 1.0.
  2. Make sure that we get the correct output for a different ratio, e.g. == 0.5. Right now, we are only doing a smoke test that checks the shapes.
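Sketched as standalone assertions, with a toy stand-in for _mixup that works on flat pixel lists (the real transform operates on image tensors and also concatenates the bounding boxes, so this is an illustration, not the actual test code):

```python
# Toy stand-in for _mixup to illustrate the three test cases; the real
# implementation works on image tensors and also merges bounding boxes.

def mixup(image1, image2, ratio):
    # degenerate ratios must be a no-op that keeps one input untouched
    if ratio >= 1.0:
        return list(image1)
    if ratio <= 0.0:
        return list(image2)
    # otherwise blend pixel-wise with weights ratio and (1 - ratio)
    return [ratio * a + (1 - ratio) * b for a, b in zip(image1, image2)]


img1 = [10.0, 20.0, 30.0]
img2 = [0.0, 100.0, 50.0]

assert mixup(img1, img2, 1.0) == img1               # 1. a) no-op for ratio >= 1.0
assert mixup(img1, img2, 0.0) == img2               # 1. b) no-op for ratio == 0
assert mixup(img1, img2, 0.5) == [5.0, 60.0, 40.0]  # 2. exact values, not just shapes
```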

I think when that is done, I can take over and fix the rest.

@@ -1436,63 +1437,6 @@ def create_fake_image(self, mocker, image_type):
return PIL.Image.new("RGB", (32, 32), 123)
return mocker.MagicMock(spec=image_type)

def test__extract_image_targets_assertion(self, mocker):
Collaborator


Why did you delete this?

Contributor Author


I was using it to test _extract_image_targets function. However, since we removed those functions I removed them from here as well

Contributor Author


Oh sorry! Realized this was for TestSimpleCopyPaste. Undoing the changes, sorry

test/test_prototype_transforms.py (outdated)
torchvision/prototype/transforms/_augment.py (outdated)
Comment on lines 345 to 346
def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
    return dict(ratio=float(self._dist.sample()))
Collaborator


I've opted to sample the ratio in the _get_params method. This has two advantages:

  1. People familiar with the other transforms can see at a glance that we are sampling something and this is not buried deep in the implementation.
  2. _mixup is easier to test since it has no random behavior.
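As a sketch of that separation (the class layout and the Beta parameters here are assumptions for illustration, not the transform's actual values):

```python
# Sketch of the design: all randomness lives in _get_params, so the
# deterministic _mixup can be unit-tested with a fixed ratio. The Beta
# parameters and the class layout are assumptions for illustration.
import random


class MixupDetectionSketch:
    def __init__(self, alpha=1.5):
        self._alpha = alpha

    def _get_params(self, flat_inputs):
        # the only random call; visible at a glance, like in other transforms
        return dict(ratio=random.betavariate(self._alpha, self._alpha))

    def _mixup(self, image1, image2, ratio):
        # deterministic given ratio, hence straightforward to test
        return [ratio * a + (1.0 - ratio) * b for a, b in zip(image1, image2)]


transform = MixupDetectionSketch()
params = transform._get_params([])  # flat_inputs kept for protocol consistency
mixed = transform._mixup([8.0], [4.0], ratio=0.5)
```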

Contributor Author


I agree, it looks much tidier this way!
However, I have a question: we don't use flat_inputs. Shall we just remove it?

Collaborator


I would keep it for consistency with the other transformations. This is the basic protocol for all Transform._get_params calls. Although we call it manually here, there is some benefit in aligning them. Someone who is not familiar with this transform, but with the others in general, might trip over the fact that the parameter is not there.

Collaborator


I just realized we are actually not passing anything here. Let's just pass the flat inputs for completeness.

torchvision/prototype/transforms/_augment.py
torchvision/prototype/transforms/_augment.py (outdated)
@pmeier
Collaborator

pmeier commented Jan 16, 2023

Hey @ambujpawar 👋 I hope you are all right. I wanted to check in on this PR. Are you planning on finishing it or should I take over?

@ambujpawar
Contributor Author

Hi @pmeier, thanks for asking! I'm doing good :) I'm just back from a super long Christmas and New Year vacation, so I did not have time to work on this PR. I would still like to work on it if we are not running against a deadline or something.

I can work on it this weekend and request a re-review from you :)
Does that work for you?

@pmeier
Collaborator

pmeier commented Jan 17, 2023

No rush from my side. I thought I'd check in on you after roughly one month of inactivity. In case you didn't plan to finish this, we would still like to have it and I would have taken over. This weekend sounds good.

@ambujpawar
Contributor Author

Thanks! I'll update the PR this weekend then! :)

@ambujpawar
Contributor Author

ambujpawar commented Jan 22, 2023

Hi, I added the test cases for when the mixup ratios are 0 and 1. However, I still think there is a bug when we are mixing the two images. I am not able to pinpoint exactly what causes it, though. Perhaps something rings a bell for you after looking at the code.

So, this is the expected output (or something similar). Notice the light appearance of cat in the background
Screenshot 2023-01-22 at 14 39 04

However, after our latest changes it looks like this. Notice that the picture of the cat completely overwrites the picture of the dog.
Screenshot 2023-01-22 at 14 42 38

I am not exactly sure, but I suspect something is going wrong when we are mixing the images in augment.py, lines 376-381. I am not able to solve it, but perhaps you can have a look at it?

@pmeier
Collaborator

pmeier commented Jan 23, 2023

I am not exactly sure but I suspect something is going wrong when we are mixing images in augment.py Line 376-381. I am not able to solve it but perhaps you can have a look at it please?

Yup, the problem is that we replace the values in the first image with the ones from the second rather than adding them. To demonstrate, let's first establish a visual benchmark that we both can easily reproduce:

import PIL.Image

import torch

from torchvision.io import read_image
from torchvision.prototype import datapoints, transforms
from torchvision.utils import make_grid


def read_sample(path, label):
    image = datapoints.Image(read_image(path))
    bounding_box = datapoints.BoundingBox(
        [[0, 0, *image.spatial_size[::-1]]], format="xyxy", spatial_size=image.spatial_size
    )
    label = datapoints.Label([label])
    return dict(
        path=path,
        image=image,
        bounding_box=bounding_box,
        label=label,
    )


batch = [
    read_sample("test/assets/encode_jpeg/grace_hopper_517x606.jpg", 0),
    read_sample("test/assets/fakedata/logos/rgb_pytorch.png", 1),
]

transform = transforms.MixupDetection()

torch.manual_seed(0)
output = transform(batch)

image = make_grid([sample["image"] for sample in output])
PIL.Image.fromarray(image.permute(1, 2, 0).numpy()).save("mixup_detection.jpg")

Output with the current implementation is

mixup_detection

So, in the left image, the PyTorch logo is the second image and thus we are just pasting it over Grace Hopper. On the right side the PyTorch logo is completely gone, since Grace Hopper is larger and thus completely paints over it.

Applying the first suggestion from below gives us

mixup_detection

And thus the behavior we want.
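The difference between the broken and the fixed behavior boils down to the following (a scalar stand-in for pixel values, just for illustration):

```python
# Scalar illustration of the bug: the broken code replaced the first image's
# values with the second's, while mixup should compute a weighted sum so
# both images stay visible.

ratio = 0.5
pixel1, pixel2 = 200.0, 50.0  # overlapping pixels from the two images

broken = pixel2                                # second image pastes over the first
fixed = ratio * pixel1 + (1 - ratio) * pixel2  # both images contribute
```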

@pmeier
Collaborator

pmeier commented Jan 23, 2023

I'm working on fixing SimpleCopyPaste now.

Co-authored-by: Philip Meier <github.pmeier@posteo.de>
@ambujpawar
Contributor Author

And thus the behavior we want.

Yup, exactly! This is the behavior we want.
Thanks for fixing it, my eyes were not able to find it haha

@pmeier
Collaborator

pmeier commented Jan 23, 2023

I've pushed an update to SimpleCopyPaste, but so far I have only done visual checks. The tests for it very much relied on the internals and so I'll need to fix them as well. @ambujpawar is there anything left on your side that you want to do? Otherwise, I'm going to finish over the next few days.

@ambujpawar
Contributor Author

@ambujpawar is there anything left on your side that you want to do?

Nope. I think everything is done on my side and this MixupDetection feature is ready. :)

@ambujpawar
Contributor Author

Hi @pmeier, Congrats on the torchvision v0.15 release.
I just wanted to check in on the future of the MixupDetection transform.
Is it still waiting on the topic of "How to smoothly support 'pairwise' transforms" listed in #7319?

Thanks in advance!! :)

@pmeier
Collaborator

pmeier commented Apr 4, 2023

Yes, unfortunately we are blocked by this. Sorry for not informing you earlier. We are holding off on the batch transforms for now for the reason you listed above. I'll ping you here when we have figured it out. Thanks a lot for your patience!

@ambujpawar
Contributor Author

Ah sure! No worries
Thanks for the update! :)
BTW, the new transforms_v2 really look good. Thanks for them

Successfully merging this pull request may close these issues.

New Feature: Mixup Transform for Object Detection