[RFC] New Augmentation techniques in Torchvison #3817

Open
8 of 17 tasks
oke-aditya opened this issue May 12, 2021 · 13 comments · Fixed by #4379

Comments

@oke-aditya
Contributor

oke-aditya commented May 12, 2021

🚀 Feature

Inclusion of new Augmentation techniques in torchvision.transforms.

Motivation

Transforms are important for data augmentation 😅

Proposals

Additional context

To visitors
Kindly give a 👍 if you think any of these would help in your work.

Also, if you have any transform in mind, please provide a few details here!

Linked to #3221

cc @vfdev-5 @fmassa

@datumbox changed the title from "[RFC] New Transforms in Torchvison" to "[RFC] New Augmentation techniques in Torchvison" on May 12, 2021
@hassiahk
Contributor

Would be nice to have these Augmentations in torchvision.transforms. 😄

Also found this official code implementation for Cutout.

@datumbox
Contributor

@oke-aditya Do we need Cutout, given that we have Random Erasing, which can be configured to have more or less the same effect?

@oke-aditya
Contributor Author

I think the same. I compared both implementations: RandomErasing is newer than Cutout, and the two augmentations produce nearly identical results.

Also, as per the docs, RandomErasing does not work for PIL Images; it works only for torch.Tensor. I am not sure if that is intentional or needs some work.
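
To illustrate why the two augmentations can overlap, here is a minimal sketch of the kind of rectangle sampling Random Erasing performs, written in plain Python on nested lists. It is not torchvision's implementation: the function names `sample_erase_box` and `erase` are hypothetical, the `scale` and `ratio` parameter names merely mirror those of `torchvision.transforms.RandomErasing`, and the log-uniform aspect-ratio draw is an assumption of this sketch. Pinning `scale` to a single value and `ratio` to 1 yields a fixed-size square patch, which is roughly what Cutout does.

```python
import math
import random


def sample_erase_box(img_h, img_w, scale=(0.02, 0.33), ratio=(0.3, 3.3),
                     rng=random, max_tries=10):
    """Sample (top, left, height, width) of a rectangle to erase, or None if none fits."""
    area = img_h * img_w
    for _ in range(max_tries):
        # Erased area as a fraction of the image, drawn uniformly from `scale`.
        erase_area = area * rng.uniform(scale[0], scale[1])
        # Aspect ratio drawn log-uniformly from `ratio` (an assumption of this sketch).
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        h = int(round(math.sqrt(erase_area * aspect)))
        w = int(round(math.sqrt(erase_area / aspect)))
        if 0 < h < img_h and 0 < w < img_w:
            top = rng.randint(0, img_h - h)
            left = rng.randint(0, img_w - w)
            return top, left, h, w
    return None


def erase(img, box, value=0):
    """Return a copy of a 2D list image with the sampled rectangle set to `value`."""
    out = [row[:] for row in img]
    if box is not None:
        top, left, h, w = box
        for r in range(top, top + h):
            for c in range(left, left + w):
                out[r][c] = value
    return out


# Cutout-like configuration: fixed area fraction and a square aspect ratio.
rng = random.Random(0)
img = [[1] * 32 for _ in range(32)]
box = sample_erase_box(32, 32, scale=(0.25, 0.25), ratio=(1.0, 1.0), rng=rng)
erased = erase(img, box)
```

One difference this sketch does not model: the original Cutout code lets the patch be clipped at the image border, whereas the sampling above always keeps the rectangle fully inside the image.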

@tflahaul

Not a transform idea, but what about adding an optional 'target_transforms' argument to transforms.RandomApply? That way, random image transforms and their target equivalents could be applied at the same time.
The current way of doing so (from what I know) is to write your own class and use functional transforms. For example:

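import torch

class AugmentExample:
    def __init__(self, p=0.1):
        self.p = p  # probability of applying the flip

    def __call__(self, img, box):
        # Flip image and boxes together with probability p.
        if torch.rand(1) < self.p:
            img = img.flip(-1)  # mirror along the width (last) dimension
            box = box_hflip(box)  # box_hflip: user-defined helper, not shown here
        return img, box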

Also having a lot more keypoints/bbox transforms would be really great (ideally any image transform that involves a transformation of the targets should be accompanied by one?).
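
A rough sketch of what the 'target_transforms' idea could look like, in plain Python so it runs without torch. Everything here is hypothetical: `PairedRandomApply` is not a torchvision class, `box_hflip` takes the image width as an extra argument (the snippet above does not show its signature), and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixel coordinates.

```python
import random


def box_hflip(boxes, img_width):
    """Mirror (x_min, y_min, x_max, y_max) boxes around the vertical image axis."""
    return [(img_width - x_max, y_min, img_width - x_min, y_max)
            for (x_min, y_min, x_max, y_max) in boxes]


def img_hflip(img):
    """Mirror a 2D list image left to right."""
    return [row[::-1] for row in img]


class PairedRandomApply:
    """Apply image transforms and their target counterparts under one random draw."""

    def __init__(self, transforms, target_transforms, p=0.5, rng=random):
        if len(transforms) != len(target_transforms):
            raise ValueError("each image transform needs a matching target transform")
        self.transforms = transforms
        self.target_transforms = target_transforms
        self.p = p
        self.rng = rng

    def __call__(self, img, target):
        # A single draw decides for both, so image and target stay in sync.
        if self.rng.random() < self.p:
            for t, tt in zip(self.transforms, self.target_transforms):
                img = t(img)
                target = tt(target)
        return img, target


img = [[0, 1, 2, 3]]    # 1 x 4 image
boxes = [(0, 0, 1, 1)]  # one box covering the leftmost pixel
aug = PairedRandomApply(
    transforms=[img_hflip],
    target_transforms=[lambda b: box_hflip(b, img_width=4)],
    p=1.0,
)
flipped_img, flipped_boxes = aug(img, boxes)
```

The point of the design is the single random draw: two independent RandomApply wrappers would each decide separately and could flip the image without flipping its boxes.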

(Sorry if my English isn't right, I speak baguette.)

@oke-aditya
Contributor Author

@datumbox is closing this issue intended?

As I understand it, there is a dataset and transforms rework, which would be a major refactor.

Do we plan to migrate all transforms to the new ones in the near future?

(I had a minor look at the proposal which looks fantastic)

@vfdev-5 reopened this on Sep 15, 2021
@datumbox
Contributor

Not at all intended; GitHub just closed it when I merged the PR. We are certainly not done here :D

@lezwon
Contributor

lezwon commented Jun 21, 2022

@datumbox I'd like to take up the ReMixMatch augmentation if no one's working on it. I would need some guidance on how to go about it though :)

@datumbox
Contributor

@lezwon Thanks a lot for offering to help!

ReMixMatch focuses on learning augmentations and on using unlabelled data. One challenge with that is that the majority of the changes will have to land in the references, which are outside of TorchVision. Currently the reference scripts are in need of some rework to reduce the amount of duplicate code and improve the overall quality. It's at the top of our to-do list, and until that's done, ideally we would like to avoid introducing significantly complex techniques like ReMixMatch.

I wonder if you would be interested in implementing the AutoAugment Detection algorithm listed above. @vfdev-5 has already added most of the necessary low-level kernels for doing transforms on the BBoxes in torchvision.prototype, so what's needed is to implement the AutoAugment technique itself. Of course since it touches prototype APIs it can be tricky too. Let me know your thoughts and perhaps Victor can also pitch in to see if it makes sense to work together and test the new API. Alternatively we can discuss for another contribution that you find interesting.

BTW I'm currently working on the SimpleCopyPaste contribution, trying to see if we can train more accurate models using it. I'll let you know when I have the full results. :)

@lezwon
Contributor

lezwon commented Jun 21, 2022

@datumbox AutoAugment sounds good. I'll start looking into it. :) Also, I noticed your comment on SimpleCopyPaste PR. Lemme know if I can help in any way :)

@datumbox
Contributor

@lezwon Fantastic! Just note that I'm talking about this version designed for detection: AutoAugment Detection. This is different from the already supported algorithm for classification. :)

@vfdev-5
Collaborator

vfdev-5 commented Jun 22, 2022

Leaving aside how the transforms are implemented in the prototype (how input type dispatch would happen, etc.), the only thing I think we are missing to implement AA Detection is the bbox_only_* augmentations. I do not think they are complicated to implement.
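
For readers unfamiliar with the term, a minimal sketch of a bbox-only augmentation, assuming bbox_only_* refers to the operations in the AutoAugment for detection paper that change only the pixels inside each bounding box and leave the boxes themselves untouched. The names `bbox_only_apply` and `invert` are hypothetical, images are plain 2D lists, boxes are `(x_min, y_min, x_max, y_max)` in integer pixel coordinates, and `op` is assumed to return a crop of the same shape. None of this is the torchvision.prototype API.

```python
def bbox_only_apply(img, boxes, op):
    """Apply `op` to the crop inside each box and paste the result back."""
    out = [row[:] for row in img]
    for (x_min, y_min, x_max, y_max) in boxes:
        crop = [row[x_min:x_max] for row in out[y_min:y_max]]
        new_crop = op(crop)  # assumed to keep the crop's height and width
        for dy, new_row in enumerate(new_crop):
            out[y_min + dy][x_min:x_max] = new_row
    return out


def invert(crop, max_value=255):
    """Example pixel-level op: invert intensities."""
    return [[max_value - v for v in row] for row in crop]


img = [[10] * 4 for _ in range(4)]
result = bbox_only_apply(img, [(1, 1, 3, 3)], invert)
```

Geometric ops such as a bbox-only rotation or translation would follow the same crop, transform, paste pattern, with extra care for pixels that move outside the crop.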

@ain-soph
Contributor

ain-soph commented Jul 5, 2022

May I ask what's the current plan for the Fast AutoAugment?

I have the implementation of another paper called Faster Autoaugment:
https://github.com/ain-soph/trojanzoo/tree/main/trojanvision/utils/autoaugment.
It's a reimplementation based on autoalbument.

Are the maintainers interested in including this technique as well? If so, what is the expected API for using it?
If there is a plan for Fast AutoAugment that could serve as a template, I'm glad to follow it.

Related issue: #5000

@datumbox
Contributor

datumbox commented Jul 6, 2022

@ain-soph The Fast* AutoAugment methods are indeed on our radar. We should examine adding them after the work on the new Transforms API is complete. Let me explain why it's not a primary target at this point:

  • We already offer strong auto-augmentation strategies that provide good results on classification (AutoAugment, RandAugment, TrivialAugmentWide and AugMix). Adding another one might not add as much value, though I appreciate the one you propose learns the augmentations from data which can be interesting.
  • The reason why we prioritize AutoAugment Detection and SimpleCopyPaste is that they cover areas for which we didn't have great support, for example Object Detection. We were lagging significantly behind SOTA, and in v0.13 we closed the gap by improving our accuracy by 8.1 mAP on average. So any technique that is going to improve tasks other than image classification is prioritized.
  • The API of Fast* AutoAugment methods is tricky as they are not just the transforms but also the modules/trainers for doing the learning. This is similar to other techniques such as Greedy Search Policy, which is on our radar but might be tricky to implement at this point. The key problem is that these techniques are tightly coupled with training loops and might require committing to a specific training paradigm. Though fixing our training loops is something on our radar (cc @kartikayk), the discussions on how to achieve this are at an early stage.
  • We are still working on the API of Transforms v2, which will provide native support for all computer vision tasks and primitives for Bounding Boxes, Masks, Multiple Images and Videos. Currently we only support implementations for Images, which is very limiting. Adding more non-critical transforms increases the amount of tech debt we have and makes it harder for us to migrate users away from them. Some of the transforms also require changes to the APIs, and this is the reason some augmentations are placed in our References (like MixUp and CutMix) instead of in main TorchVision. Hopefully all of these will be resolved once we release the new Transforms API.
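
To make the API point about MixUp concrete: one plausible reason it does not fit a per-image transform is that it blends pairs of samples and their labels, so it needs a whole batch plus the targets. The following is a minimal sketch in plain Python, not the code from the reference scripts; the function name `mixup`, the flat-list sample representation, the one-hot targets, and the choice to pair each sample with its rolled neighbour are all assumptions made for illustration.

```python
import random


def mixup(batch, targets, alpha=0.2, rng=random):
    """Blend each sample and its one-hot target with those of the previous sample."""
    # Mixing coefficient drawn from a Beta(alpha, alpha) distribution.
    lam = rng.betavariate(alpha, alpha)
    # Pair sample i with sample i - 1 (the last sample wraps around to the first).
    rolled_batch = batch[-1:] + batch[:-1]
    rolled_targets = targets[-1:] + targets[:-1]
    mixed_batch = [
        [lam * a + (1 - lam) * b for a, b in zip(x, y)]
        for x, y in zip(batch, rolled_batch)
    ]
    mixed_targets = [
        [lam * a + (1 - lam) * b for a, b in zip(t, u)]
        for t, u in zip(targets, rolled_targets)
    ]
    return mixed_batch, mixed_targets, lam


rng = random.Random(0)
batch = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # three samples, two "pixels" each
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot labels for two classes
mixed_batch, mixed_targets, lam = mixup(batch, targets, alpha=0.2, rng=rng)
```

Because the labels become soft (each mixed target is a convex combination of two one-hot vectors), the loss function downstream has to accept probability targets, which is another way such a technique reaches beyond the transform itself.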

One area where we could use help is models, particularly Video architectures. Have a look at #2707 for some ideas. I hope the current situation won't discourage you from sticking around and continuing to contribute to TorchVision. We definitely want any help we can get from the community! :)


7 participants