Image Augmentations on GPU Tests #483
I think that if we want to use GPU preprocessing in the data loader we would be restricting our users to Python 2, which might be a bit too much. Also, I'm not 100% convinced that in the setup you showed it would be better to perform the operations on the GPU. The reason is that if we perform all the data augmentation on the CPU, the GPU is free to run the network (asynchronously!) while the data loader workers load and preprocess data in the background. Did you get a chance to check whether performing the operations on the GPU was actually useful in a full training pipeline?
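For illustration, a minimal sketch of the pattern described above, where the DataLoader workers do the CPU-side augmentation in the background while the GPU runs the network asynchronously; the dataset, transforms, and model here are placeholders rather than anything from this thread:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CPU-side augmentation runs inside the DataLoader worker processes,
# so it overlaps with the (asynchronous) forward/backward pass on the GPU.
cpu_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

dataset = datasets.FakeData(size=512, transform=cpu_transform)  # placeholder dataset
loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

device = torch.device("cuda")
model = torch.nn.Conv2d(3, 8, 3).to(device)  # stand-in for a real network

for images, _ in loader:
    images = images.to(device, non_blocking=True)  # copy from pinned memory overlaps with compute
    output = model(images)  # CUDA kernels are queued asynchronously
```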
I have the same impression. In my experience, I have a multithreaded pipeline to train my model, and the training thread (or process) is always waiting for the preprocessing (which includes augmentations) to finish. This is especially true for models that have relatively few computations per image, as in video deep learning. I'm looking into augmentations on the GPU; I just found out that the OpenCV Python bindings don't allow that.
@hyperfraise what kinds of augmentations are you looking for?
Well, of course I use a bit more than that, but even that would be very appreciable. I wonder, though, whether it would be that much more efficient for, say, 720p images, or even 224x224x3 like I use (since there is the transfer to take into account, and maybe Python would be slower even on the GPU than OpenCV's optimized C code). Anyway, here is what I'm using: I'm actually working with video, so doing all of this on the CPU is pretty costly.
BTW, have you looked at https://github.com/NVIDIA/nvvl ? Note that you can combine flipping / cropping / padding / resizing into a single kernel launch.
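A minimal sketch of one way to fuse those operations into a single resampling pass in PyTorch; since the exact API is not named above, picking torch.nn.functional.affine_grid together with grid_sample here is my assumption:

```python
import torch
import torch.nn.functional as F

imgs = torch.rand(8, 3, 256, 256, device="cuda")  # a batch already on the GPU

# One affine matrix per image: here a horizontal flip combined with a 2x zoom
# (i.e. a center crop of the middle half), in normalized [-1, 1] coordinates.
theta = torch.tensor([[-0.5, 0.0, 0.0],
                      [ 0.0, 0.5, 0.0]], device="cuda")
theta = theta.unsqueeze(0).repeat(imgs.size(0), 1, 1)

# affine_grid + grid_sample resample the whole batch in one pass, which also
# handles the resize to the requested output resolution (224x224 here).
grid = F.affine_grid(theta, size=(imgs.size(0), 3, 224, 224), align_corners=False)
out = F.grid_sample(imgs, grid, padding_mode="zeros", align_corners=False)
```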
I'm very curious about how Python 2 could help with preprocessing on the GPU. Could you elaborate on that, please?
Here I am trying to make my computer tell cats from dogs using a simple CNN and image augmentation with PIL transforms: if I use image augmentation, CPU usage is almost 100% while the GPU is bored. It also makes almost no difference whether I train the net on the GPU or the CPU, because the augmentation takes way longer anyway.
I was told that OpenCV 4 makes it possible to do augmentations on the GPU in Python. I haven't tested it.
@stmax82 Maybe you could look into NVIDIA's DALI framework.
Hello PyTorch vision people!
I am currently working on a project that needs lots of image augmentation to perform well, and I believe this is not only my case. Reading about topics such as domain randomization, we see that large variations in the images lead to much better generalization.
I saw that PyTorch does not seem to provide a way to perform image augmentation on the GPU, as commented in #45. In some posts people discourage doing it (https://discuss.pytorch.org/t/preprocess-images-on-gpu/5096), but I really disagree, especially for cases where several augmentations are applied.
To show this point I provide a gist with an example illustrating the possible speed-up gains for a simple multiplication operation (a brightness augmentation, say):
https://gist.github.com/felipecode/f3531e2d04e846da99053aff16b06028
In the gist, I show a GPU augmentation interface working as follows:
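(A minimal sketch of what such an interface could look like; the class and method names below are illustrative placeholders, not the actual code from the gist.)

```python
import torch

class GPUBrightness:
    """Multiplicative brightness jitter applied to a whole batch on the GPU."""
    def __init__(self, low=0.8, high=1.2):
        self.low, self.high = low, high

    def __call__(self, batch):  # batch: (N, C, H, W) float tensor already on the GPU
        factors = torch.empty(batch.size(0), 1, 1, 1, device=batch.device)
        factors.uniform_(self.low, self.high)
        return (batch * factors).clamp_(0.0, 1.0)

class GPUCompose:
    """Chains GPU augmentations, mirroring torchvision.transforms.Compose."""
    def __init__(self, ops):
        self.ops = ops

    def __call__(self, batch):
        for op in self.ops:
            batch = op(batch)
        return batch

augment = GPUCompose([GPUBrightness(0.7, 1.3)])
batch = torch.rand(64, 3, 224, 224, device="cuda")
batch = augment(batch)  # every operation runs on the GPU
```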
Unfortunately, the GPU augmentation could not be smoothly interfaced with the dataloader without sacrificing the multithreading for data reading. However, the speed-ups obtained seem promising.
When running the gist code with a TITAN Xp GPU and an Intel(R) Xeon(R) E5-1620 v3 @ 3.50GHz CPU I get the plot below (note that I removed the loading time when plotting). It shows the computation time as a function of the number of multiplications: for each data point, about 500 RGB images of 224x224 are multiplied by a constant.
Of course, there is no clear reason why one would do 60 multiplications in a row.
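(Roughly the kind of measurement loop involved; the sizes and the timing helper below are my own sketch rather than the gist's actual code.)

```python
import time
import torch

def time_multiplications(images, n_mults):
    """Multiply a batch of images by a constant n_mults times and return the elapsed time."""
    start = time.time()
    out = images
    for _ in range(n_mults):
        out = out * 1.01  # stand-in for a brightness-style augmentation
    if out.is_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return time.time() - start

cpu_batch = torch.rand(500, 3, 224, 224)   # ~500 RGB images of 224x224
gpu_batch = cpu_batch.to("cuda")           # the transfer itself is excluded from the timing

for n in (1, 10, 30, 60):
    print(n, time_multiplications(cpu_batch, n), time_multiplications(gpu_batch, n))
```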
Beyond this toy benchmark, I implemented a small library, using the imgaug library as a reference, and implemented more functions on the GPU. For the augmentation set used in my project I obtained about a 3-4x speed-up, and the speed-up gets even larger as more augmentations are added.
So, how can I improve this API? How could something like this fit into a pull request? How can it be merged more smoothly into the dataloader while keeping the multithreading for data reading?
I still have to test the training time for the full system, but I don't believe there will be any overhead, since the images have to be copied to the GPU anyway.
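One possible arrangement, sketched under the assumption that the workers only decode to CPU tensors and all random augmentation happens on the GPU right after the copy that has to happen anyway (the dataset and helper below are placeholders):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Workers only decode to CPU tensors; the random augmentation happens on the GPU.
dataset = datasets.FakeData(size=512, transform=transforms.ToTensor())  # placeholder
loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)

def gpu_random_hflip(batch):
    # Flip a random subset of the batch along the width dimension, on the GPU.
    mask = torch.rand(batch.size(0), device=batch.device) < 0.5
    batch[mask] = batch[mask].flip(-1)
    return batch

device = torch.device("cuda")
for images, _ in loader:
    images = images.to(device, non_blocking=True)  # the copy that happens anyway
    images = gpu_random_hflip(images)              # augmentation after the copy
    # ... forward / backward pass here ...
```

This keeps the multi-worker loading untouched, at the cost of the augmentation being applied per batch in the main process.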