
Two or more crops for a single image #77

Closed
vturrisi opened this issue Jan 21, 2022 · 31 comments
Labels
enhancement New feature or request

Comments

@vturrisi

Hey! Thank you for the great work.

Is it possible to apply the same pipeline multiple times to the same image? From what I checked, this is currently not possible, as the images seem to be loaded and cropped within a single operation. Is there any way to work around this, loading the image only once and applying the augmentation pipelines multiple times?

@GuillaumeLeclerc
Collaborator

Hello,

Can you clarify exactly what you are trying to do? I'm sure we can find a way to work it out.

The reason they are currently fused is that we are working with batches, so all images have to have the same size as they exit the first operation of the pipeline.

@vturrisi
Author

For self-supervised learning you usually need 2 or more crops of the same image. With torchvision this is pretty easy: you can wrap the transformations in a custom class that calls the underlying transform multiple times. For example:

class TwoCropsTransform:
    """Apply the same base transform twice to get two views of one image."""
    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        q = self.base_transform(x)  # first view (query)
        k = self.base_transform(x)  # second view (key)
        return [q, k]

where base_transform is something like this

augmentation = transforms.Compose(
    [
        transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
        transforms.RandomApply(
            [transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8  # not strengthened
        ),
        transforms.RandomGrayscale(p=0.2),
        transforms.RandomApply([moco.loader.GaussianBlur([0.1, 2.0])], p=0.5),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]
)
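
A hypothetical usage sketch for the torchvision approach above: wrap the MoCo-style augmentation in TwoCropsTransform and feed it to a standard dataset/DataLoader (the dataset path is illustrative; normalize and moco.loader.GaussianBlur come from the MoCo codebase, as in the snippet above).

from torch.utils.data import DataLoader
from torchvision import datasets

train_dataset = datasets.ImageFolder(
    'path/to/train',  # illustrative path
    transform=TwoCropsTransform(augmentation),
)
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True, num_workers=8)

for (q, k), _ in train_loader:
    # q and k are two independently augmented views of the same images
    ...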

Can the same be accomplished with ffcv?

@GuillaumeLeclerc
Collaborator

GuillaumeLeclerc commented Jan 21, 2022 via email

@vturrisi
Author

Exactly, I think so.

@GuillaumeLeclerc
Collaborator

GuillaumeLeclerc commented Jan 21, 2022 via email

@vturrisi
Author

I see, but I still think this wouldn't solve it, since the random parameters for the other augmentations are generated twice (once for each crop). If there's anything I can help with, let me know. Looking forward to using ffcv :)

@andrewilyas
Contributor

Hi @vturrisi ! One thing that might work for now is to just create two Loader objects, point them at the same dataset file, initialize them with the same seed (the seed is just for batch ordering, so it won't mess with the augmentations), and then zip them together; I can make a minimal example if it helps!
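
A minimal sketch of that suggestion (not from this thread; the path, batch size, and pipeline helpers are illustrative). The shared seed makes the two Loaders produce batches in the same order, while the augmentations inside each pipeline remain independently random.

from ffcv.loader import Loader, OrderOption

def make_loader():
    return Loader('./data/train.beton',                    # hypothetical dataset file
                  batch_size=256,
                  num_workers=8,
                  order=OrderOption.RANDOM,
                  seed=0,                                   # same seed => same batch ordering
                  pipelines={'image': image_pipeline(),     # hypothetical pipeline builders
                             'label': label_pipeline()})

loader_q, loader_k = make_loader(), make_loader()

for (x_q, y), (x_k, _) in zip(loader_q, loader_k):
    # x_q and x_k hold the same images with different random augmentations
    ...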

@vturrisi
Author

vturrisi commented Jan 21, 2022

Hey @andrewilyas, yeah, what you suggested will for sure work. However, wouldn't this be too slow since the images are loaded once for each crop?

EDIT: I'll try it tomorrow and see how it scales with the number of crops just to be sure.

@andrewilyas
Contributor

Yeah, this will be a little slower, but I think it could still be fast enough to saturate most GPUs, especially if it's a simple 2-crop algorithm. Data on how it scales with the number of crops would be very much appreciated! We'll keep you updated as we figure out how to support feeding the same field through multiple pipelines!

@vturrisi
Author

There will also be an impact on RAM usage, right? I'll get back to you tomorrow with some data on how it scales in time and memory.

@GuillaumeLeclerc
Collaborator

GuillaumeLeclerc commented Jan 22, 2022

Definitely no impact with OS caching.

@GuillaumeLeclerc
Collaborator

I'm going to close this issue in favor of #82, which is about implementing the actual clean solution to the problem faced here. If you want to discuss the temporary solution, we can reopen this one.

@davidrzs

Had the same issue this afternoon; this is how I solved it (posting it since others seem to need something similar and can build from it).

Make CIFAR10 return the same image twice:

import torchvision
from torch.utils.data import Dataset

class DoubleCifar10Dataset(Dataset):
    """Wrap CIFAR-10 so each sample returns the same image twice."""
    def __init__(self, root='data', train=True, download=True):
        self.cifar10_dataset = torchvision.datasets.CIFAR10(root=root, train=train, download=download)

    def __len__(self):
        return len(self.cifar10_dataset)

    def __getitem__(self, idx):
        image = self.cifar10_dataset[idx][0]
        return image, image

Create the dataset

from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField

train_data = utils.DoubleCifar10Dataset(root='data', train=True, download=True)

train_writer = DatasetWriter('tmp/cifar10_train_data.beton', {
    'image1': RGBImageField(),
    'image2': RGBImageField(),
})
train_writer.from_indexed_dataset(train_data)

Create train loader:

from ffcv.loader import Loader, OrderOption

train_loader = Loader('./tmp/cifar10_train_data.beton',
                      batch_size=batch_size,
                      num_workers=10,
                      order=OrderOption.RANDOM,
                      os_cache=True,
                      drop_last=False,
                      pipelines={
                          'image1': utils.Cifar10TransformFFCV.train_pipeline(gpu),
                          'image2': utils.Cifar10TransformFFCV.train_pipeline(gpu),
                      },
                      distributed=False)

Now you get two augmentations of the same image!
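
A hypothetical training-loop sketch for the loader above (model and contrastive_loss are placeholders): each batch yields two independently augmented views of the same images.

for x1, x2 in train_loader:
    z1, z2 = model(x1), model(x2)
    loss = contrastive_loss(z1, z2)  # e.g. an NT-Xent-style loss for SimCLR-like training
    ...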

@vturrisi
Author

@davidrzs thanks for sharing. I'm going the route of creating multiple loaders and wrapping them. I'll share once it's working 100%.

@saarthak-kapse

Hello,

I am trying to get 4 images out of the pipeline. I saved them like this:

writer = DatasetWriter(write_path, {
    'image': RGBImageField(
        write_mode='jpg',
        jpeg_quality=60,  # compress with JPEG at 60% quality
    ),
    'image_high': RGBImageField(
        write_mode='jpg',
        jpeg_quality=60,
    ),
    'image_teacher': RGBImageField(
        write_mode='jpg',
        jpeg_quality=60,
    ),
    'image_high_teacher': RGBImageField(
        write_mode='jpg',
        jpeg_quality=60,
    ),
}, num_workers=2)

Then I am reading them like this:

image_pipeline: List[Operation] = [
    # ffcv.fields.decoders.RandomResizedCropRGBImageDecoder((input_image_size, input_image_size)),
    ffcv.fields.decoders.SimpleRGBImageDecoder(),
    # ffcv.transforms.RandomHorizontalFlip(),
    ffcv.transforms.ToTensor(),
    ffcv.transforms.ToDevice('cuda:' + str(args.gpu), non_blocking=True),
    ffcv.transforms.ToTorchImage(),
    ffcv.transforms.Convert(torch.float16),
    # torchvision.transforms.RandomApply([color_jitter], p=0.8),
    # torchvision.transforms.RandomGrayscale(p=0.2),
    # torchvision.transforms.RandomApply([Gaussian_Blur], p=0.5),
    # torchvision.transforms.Normalize(0.2, 0.2),  # Normalize using image statistics
]

image_pipeline_high: List[Operation] = [
    # ffcv.fields.decoders.RandomResizedCropRGBImageDecoder((input_image_size * args.magnification, input_image_size * args.magnification)),
    ffcv.fields.decoders.SimpleRGBImageDecoder(),
    # ffcv.transforms.RandomHorizontalFlip(),
    ffcv.transforms.ToTensor(),
    ffcv.transforms.ToDevice('cuda:' + str(args.gpu), non_blocking=True),
    ffcv.transforms.ToTorchImage(),
    ffcv.transforms.Convert(torch.float16),
    # torchvision.transforms.RandomApply([color_jitter], p=0.8),
    # torchvision.transforms.RandomGrayscale(p=0.2),
    # torchvision.transforms.RandomApply([Gaussian_Blur], p=0.5),
    # torchvision.transforms.Normalize(0.2, 0.2),  # Normalize using image statistics
]

data_loader = Loader(args.data_path,
                     batch_size=args.batch_size_per_gpu,
                     num_workers=args.num_workers,
                     os_cache=True,
                     distributed=True,
                     batches_ahead=1,
                     order=OrderOption.RANDOM,
                     pipelines={
                         'image': image_pipeline,
                         'image_high': image_pipeline_high,
                         'image_teacher': image_pipeline,
                         'image_high_teacher': image_pipeline_high,
                     })

But the output comes out as:

cuda:0 tensor(174., device='cuda:0', dtype=torch.float16) torch.Size([80, 3, 224, 224])
cuda:0 tensor(209., device='cuda:0', dtype=torch.float16) torch.Size([80, 3, 896, 896])
cuda:0 tensor(174., device='cuda:0', dtype=torch.float16) torch.Size([80, 3, 224, 224])
cpu tensor(209, dtype=torch.uint8) torch.Size([80, 896, 896, 3])

It seems the pipeline is not able to go beyond the 3rd image for augmentation. Can you please let me know how to fix this? It is a bit urgent, as I need to get my experiments running.

Thanks a lot!

@GuillaumeLeclerc
Collaborator

@saarthak02 Can you create a new issue?

@jyhong836

Hi @davidrzs
I tried your solution for contrastive learning with the transforms below.

ffcv_trns.RandomResizedCrop(scale=(0.08, 1.0), ratio=np.array((3. / 4., 4. / 3.)), size=size),
ffcv_trns.RandomHorizontalFlip(),
ffcv_trns.ToTensor(),
ffcv_trns.Convert(torch.float32),
ffcv_trns.ToTorchImage(),
ffcv_trns.ToDevice(device, non_blocking=True),

But I found that the data loading is much slower than using a single image with one pipeline. Even worse, the loading is much slower than the standard PyTorch loader with more complicated transforms:

transforms.RandomResizedCrop(size=size),
transforms.RandomHorizontalFlip(),
transforms.RandomApply([color_jitter], p=0.8),
transforms.RandomGrayscale(p=0.2),
GaussianBlur(kernel_size=int(0.1 * size)),
transforms.ToTensor()

Do you have the same issue?

@GuillaumeLeclerc
Collaborator

GuillaumeLeclerc commented Feb 3, 2022

Yeah, it's completely expected. You are using PyTorch augmentations. They are written in Python, which means only a single one can run at a given time per process.

PyTorch spawns sub-processes so it can run them in parallel, but that is also very inefficient, because processes can't share memory and have to communicate through IPC instead of shared memory.

FFCV supports PyTorch augmentations just so you can experiment and see if an augmentation actually helps your model. You should never use them in situations where you care about performance. If you actually want speed, you have to get rid of them. If you look at our examples, we never use any PyTorch augmentation on the CPU.

Two solutions:

  • Implement your own augmentations using the FFCV API. It's really easy if you know numpy (see the sketch below).
  • Move your augmentations to the GPU. Because GPU code runs in the background, you can run multiple augmentations in parallel, so you should have reasonable performance (not as good as FFCV augmentations, because you will steal GPU compute from your neural network).
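
As a minimal sketch of the first option (not from this thread; modeled on the custom-transform pattern in the FFCV documentation, so the exact Operation API should be verified against your FFCV version), a random-grayscale augmentation operating on uint8 HWC image batches might look like:

from typing import Callable, Optional, Tuple

import numpy as np
from ffcv.pipeline.allocation_query import AllocationQuery
from ffcv.pipeline.compiler import Compiler
from ffcv.pipeline.operation import Operation
from ffcv.pipeline.state import State

class RandomGrayscale(Operation):
    """Convert each image in the batch to grayscale with probability p."""
    def __init__(self, p: float = 0.2):
        super().__init__()
        self.p = p

    def generate_code(self) -> Callable:
        p = self.p
        parallel_range = Compiler.get_iterator()

        def grayscale(images, dst):
            # images/dst: (batch, H, W, 3) uint8 arrays
            coins = np.random.rand(images.shape[0])
            for i in parallel_range(images.shape[0]):
                if coins[i] < p:
                    gray = (0.299 * images[i, :, :, 0]
                            + 0.587 * images[i, :, :, 1]
                            + 0.114 * images[i, :, :, 2]).astype(np.uint8)
                    for c in range(3):
                        dst[i, :, :, c] = gray
                else:
                    dst[i] = images[i]
            return dst

        grayscale.is_parallel = True
        return grayscale

    def declare_state_and_memory(self, previous_state: State) -> Tuple[State, Optional[AllocationQuery]]:
        # Same shape and dtype as the input; ask FFCV for a scratch buffer to write into.
        return previous_state, AllocationQuery(previous_state.shape, previous_state.dtype)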

@GuillaumeLeclerc
Collaborator

(I created issue #127 to add a warning in this situation so that users don't expect good performance in this case.)

@jyhong836

Thanks for your reply.
When I use only FFCV augmentations, the two-pipeline implementation with FFCV loading is still slower than (though comparable to) the PyTorch loading with two-view transforms.
I suppose the problem is that loading two images is slower than loading one image (even in PyTorch).

train_loader = Loader(ffcv_cache_path,
                      batch_size=train_batch_size,
                      num_workers=num_workers,
                      order=OrderOption.RANDOM,
                      drop_last=True,
                      pipelines={'image0': image_pipeline,
                                 'image1': image_pipeline1,
                                 'attribute': label_pipeline})


@GuillaumeLeclerc
Collaborator

GuillaumeLeclerc commented Feb 4, 2022

Can I see your image_pipeline, image_pipeline1, and label_pipeline?

How is your CPU usage?

How are you storing your two images in your dataset? Are you using RAW or JPEG? How big are your images?

(I haven't yet seen a situation where FFCV is not at least 10x faster than PyTorch, so I still strongly suspect there is a misconfiguration somewhere in your code.)

@GuillaumeLeclerc
Collaborator

(Also, @andrewilyas was suggesting zip; any reason why you are not using that instead of saving your images twice?)

@jyhong836

zip is a good idea. I will try it and come back to you later.

Here is my image_pipeline (the same for image_pipeline1):

SimpleRGBImageDecoder(),
ffcv_trns.RandomResizedCrop(scale=(0.08, 1.0), ratio=np.array((3. / 4., 4. / 3.)),
                            size=size),
ffcv_trns.RandomHorizontalFlip(),
ffcv_trns.ToTensor(),
ffcv_trns.Convert(torch.float32),
ffcv_trns.ToTorchImage(),
ffcv_trns.ToDevice(device, non_blocking=True),

and label_pipeline: List[Operation] = [NDArrayDecoder(), trns.ToTensor(), Squeeze()].
My CPU usage: [htop screenshot]

Other information: it takes me 2 min 2 s to finish one epoch with FFCV (as described above). In comparison, it takes me 2 min 4 s to finish one epoch without FFCV (where I additionally add more PyTorch transforms). So the timings are barely different.

For comparison, the pure PyTorch transforms without FFCV are:

transforms.RandomResizedCrop(size=size),
transforms.RandomHorizontalFlip(),
transforms.RandomApply([color_jitter], p=0.8),
transforms.RandomGrayscale(p=0.2),
GaussianBlur(kernel_size=int(0.1 * size)),
transforms.ToTensor()

@GuillaumeLeclerc
Collaborator

How are you storing your two images in your dataset? Are you using RAW or JPEG? How big are your images?

Also:
I see a lot of red on your htop (kernel time) so it seems that you have a lot of overhead (probably scheduling overhead). How many workers are you using?

@GuillaumeLeclerc
Collaborator

Also, you seem to be CPU-bottlenecked. Why not put the Convert after the ToDevice instead of doing it on the CPU (which also means transmitting more data over the PCIe bus)?
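
For illustration (a sketch reusing the names from the pipeline posted above: size, device, ffcv_trns), the suggested reordering ships uint8 data to the GPU first and only then converts it to float32:

image_pipeline = [
    SimpleRGBImageDecoder(),
    ffcv_trns.RandomResizedCrop(scale=(0.08, 1.0), ratio=np.array((3. / 4., 4. / 3.)), size=size),
    ffcv_trns.RandomHorizontalFlip(),
    ffcv_trns.ToTensor(),
    ffcv_trns.ToDevice(device, non_blocking=True),  # transfer while still uint8
    ffcv_trns.ToTorchImage(),
    ffcv_trns.Convert(torch.float32),               # cast on the GPU
]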

@jyhong836

I store the images using the default fields. The images are from the CelebA dataset, of size (178, 218), stored in 'raw' format. attribute is a 40-dimensional vector per sample.

writer = DatasetWriter(ffcv_cache_path, {
    'image0': RGBImageField(),
    'image1': RGBImageField(),
    'attribute': NDArrayField(full_train_set.attr.dtype, (full_train_set.attr.shape[1],)),
})

I used 8 workers.

Following your advice, here are some methods that work:

  • Moving images to the GPU as early as possible: improves the epoch time by 10 seconds.

Some methods that do not work:

  • Using jpeg instead of raw when writing images.
  • Zipping two duplicated loaders that load from the same file.

@GuillaumeLeclerc
Collaborator

If you have a hyper-threaded processor, I would recommend using as many workers as you have physical cores (usually what htop shows divided by 2), and divide by 2 again when you use zip, since you have two loaders.

I don't think the attribute is causing any slowdown. I think the main bottleneck right now is image resizing, and with your pipeline you are doing it twice. I'm currently working hard on the next release, which will make the pipelines much more flexible. This is how it would look in your case:

decoder = ffcv_trns.RandomResizedCrop(scale=(0.08, 1.0), ratio=np.array((3. / 4., 4. / 3.)), size=size)

pipeline_1 = [
    ffcv_trns.RandomHorizontalFlip(),
    ffcv_trns.ToTensor(),
    ffcv_trns.Convert(torch.float32),
    ffcv_trns.ToTorchImage(),
    ffcv_trns.ToDevice(device, non_blocking=True),
]

pipeline_2 = [
    ffcv_trns.RandomHorizontalFlip(),
    ffcv_trns.ToTensor(),
    ffcv_trns.Convert(torch.float32),
    ffcv_trns.ToTorchImage(),
    ffcv_trns.ToDevice(device, non_blocking=True),
]

train_loader = Loader(ffcv_cache_path,
                      batch_size=train_batch_size,
                      num_workers=num_workers,
                      order=OrderOption.RANDOM,
                      drop_last=True,
                      pipelines={'image': [decoder],
                                 'image1': PipelineSpec(source=decoder, transforms=pipeline_1),
                                 'image2': PipelineSpec(source=decoder, transforms=pipeline_2),
                                 'attribute': label_pipeline})

And you would have a single image field in your FFCV dataset

@davidrzs

davidrzs commented Feb 5, 2022

But I found that the data loading is much slower than using a single image with one pipeline. [...] Do you have the same issue?

Yeah, I had to do this trick to get performance parity with the old pipeline: #64 (comment). Though it is hacky! I have since reverted to the standard PyTorch pipeline, as the speed was already decent enough, so I cannot give you more information.

@EvgeniaAR

EvgeniaAR commented Mar 28, 2022

Hi there, I would also like to use ffcv dataloaders for SimCLR training. @GuillaumeLeclerc, what is the current state of #82? I saw others commenting on going back to standard PyTorch dataloading due to speed issues. Have these been resolved, i.e. is SimCLR training with ffcv dataloading viable?

@arnaghosh

Thanks for the suggestions in the thread. I was trying out another approach to generate two different augmentations of a single image for self-supervised learning, e.g. SimCLR. Although I used the PyTorch transforms in the image pipeline, I was able to see a speedup with ffcv compared to the standard PyTorch dataloader. However, the final model accuracy was inferior to the one trained using the standard PyTorch dataloader.

Additional details: the PyTorch dataloader loads the images from the original files, as opposed to the ffcv dataloader, which uses the beton files. Both pipelines use the same transforms (a rescaling was added in the ffcv pipeline because SimpleRGBImageDecoder followed by ToTensor returns a torch tensor with values between 0 and 255).

Has anyone else faced this issue?

@realliyifei

@GuillaumeLeclerc it seems that PipelineSpec is still not available yet? Is there any beta version supporting it? Given that all the workarounds mentioned here have different limitations, it would be sweet to get two crops from just one image via PipelineSpec.
