Using ffcv for a "pairwise" dataset in stereo vision #53
-
Thanks for your work! I am working on a project where data loading seems to be a substantial bottleneck, and ffcv could be of great help, but I'm not sure whether it's easy to apply to my use case. I'm learning stereo vision, and the setup is that each image in a scene (say Notre Dame in Paris) is represented as a struct:
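(The struct definition itself is not shown above; per-image records in this kind of structure-from-motion setup typically bundle the pixels with camera parameters, so a hypothetical sketch with made-up field names might be:)

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    image: np.ndarray  # H x W x 3 pixel data
    K: np.ndarray      # 3 x 3 camera intrinsics
    R: np.ndarray      # 3 x 3 rotation (camera pose)
    t: np.ndarray      # 3-vector translation
```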
This is what I store on disk, but the actual examples for the network consist of pairs of such images, subject to a pre-computed criterion on whether their fields of view overlap sufficiently. With PyTorch dataloaders it's quite easy: I define a PairedDataset that yields such pairs.
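(The PairedDataset is referenced but not shown; a minimal sketch of what such a wrapper might look like, with hypothetical names; any object exposing `__getitem__`/`__len__` works with `torch.utils.data.DataLoader`:)

```python
class PairedDataset:
    """Serves pairs of per-image records whose fields of view overlap
    sufficiently; the list of valid (i, j) pairs is assumed precomputed."""

    def __init__(self, images, valid_pairs):
        self.images = images            # indexable collection of per-image records
        self.valid_pairs = valid_pairs  # list of (i, j) index pairs

    def __len__(self):
        return len(self.valid_pairs)

    def __getitem__(self, k):
        i, j = self.valid_pairs[k]
        return self.images[i], self.images[j]
```

A PyTorch DataLoader over this object then yields batches of image pairs.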
Replies: 2 comments
-
Hi! There should actually be an extremely easy way to accomplish this in FFCV! I'm preparing an example right now and will post it shortly :)
-
Ok @jatentaki ! Let me know if this is what you had in mind. The only key principle below is that instead of making a single loader, you make two loaders, and for the second you pass indices as a permutation of np.arange(len(dataset)) that properly lines up the pairs. Zipping the two loaders together should then give you the same functionality as your PairedDataset:
import numpy as np
from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder
from tempfile import NamedTemporaryFile
# Really simple dataset of 100 examples; example i is [10*i, 10*i + 1, ..., 10*i + 9]
class SimpleDataset:
    def __getitem__(self, idx):
        return (np.arange(idx * 10, idx * 10 + 10).astype('int'),)

    def __len__(self):
        return 100

dataset = SimpleDataset()

# Make a permutation
permutation = np.arange(100)
np.random.shuffle(permutation)

# For testing purposes (to make sure the pairing is consistent)
correct_pairing = {i: j for i, j in enumerate(permutation)}

with NamedTemporaryFile() as handle:
    writer = DatasetWriter(handle.name, {
        'data': NDArrayField(np.dtype('int'), (10,))
    }, num_workers=1)
    writer.from_indexed_dataset(dataset)

    # We make two loaders: one with the standard ordering; to the other we pass
    # indices=permutation so that it loads the permuted order. Note that this
    # still works even with order=OrderOption.RANDOM (as long as the seed is the same)!
    loader_a = Loader(handle.name,
                      batch_size=10,
                      num_workers=2,
                      seed=0,
                      order=OrderOption.RANDOM,
                      pipelines={
                          'data': [NDArrayDecoder()]
                      })
    loader_b = Loader(handle.name,
                      batch_size=10,
                      num_workers=2,
                      seed=0,
                      order=OrderOption.RANDOM,
                      pipelines={
                          'data': [NDArrayDecoder()]
                      }, indices=permutation)

    for (x,), (y,) in zip(loader_a, loader_b):
        for pt_a, pt_b in zip(x, y):
            assert correct_pairing[pt_a[0].item() // 10] == pt_b[0].item() // 10
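For the stereo use case, where the valid pairs come from an overlap criterion rather than a single global permutation, the same trick should work with two explicit index arrays (made-up pair values below, purely for illustration):

```python
import numpy as np

# Suppose the precomputed overlap criterion yields these valid (i, j)
# pairs of image indices (hypothetical values):
valid_pairs = [(0, 3), (1, 7), (2, 5), (4, 6)]

# Position k of each array holds the two halves of the k-th pair, so two
# loaders built with indices=left_indices and indices=right_indices
# (same seed and order, as in the snippet above) stay aligned when zipped.
left_indices = np.array([i for i, _ in valid_pairs])
right_indices = np.array([j for _, j in valid_pairs])
```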