Using ffcv for a "pairwise" dataset in stereo vision #53
-
Thanks for your work! I am working on a project where data loading seems to be a substantial bottleneck, and ffcv could be of great help, but I'm not sure whether it's easy to apply to my use case. I'm learning stereo vision, and the setup is that each image in a scene (say Notre Dame in Paris) is represented as a struct:
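(The struct definition itself is not shown above; per-image records in this kind of structure-from-motion setup typically bundle the pixels with camera parameters, so a hypothetical sketch with made-up field names might be:)

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ImageRecord:
    image: np.ndarray  # H x W x 3 pixel data
    K: np.ndarray      # 3 x 3 camera intrinsics
    R: np.ndarray      # 3 x 3 rotation (camera pose)
    t: np.ndarray      # 3-vector translation
```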
This is what I store on disk, but the actual examples for the network consist of pairs of such images, subject to a pre-computed criterion on whether their fields of view overlap sufficiently. With PyTorch dataloaders it's quite easy: I define a PairedDataset that yields such pairs.
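(The PairedDataset is referenced but not shown; a minimal sketch of what such a wrapper might look like, with hypothetical names; any object exposing `__getitem__`/`__len__` works with `torch.utils.data.DataLoader`:)

```python
class PairedDataset:
    """Serves pairs of per-image records whose fields of view overlap
    sufficiently; the list of valid (i, j) pairs is assumed precomputed."""

    def __init__(self, images, valid_pairs):
        self.images = images            # indexable collection of per-image records
        self.valid_pairs = valid_pairs  # list of (i, j) index pairs

    def __len__(self):
        return len(self.valid_pairs)

    def __getitem__(self, k):
        i, j = self.valid_pairs[k]
        return self.images[i], self.images[j]
```

A PyTorch DataLoader over this object then yields batches of image pairs.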
Replies: 2 comments
-
Hi! There should actually be an extremely easy way to accomplish this in FFCV! I'm preparing an example right now and will post it shortly :)
-
Ok @jatentaki ! Let me know if this is what you had in mind. The only key principle below is that instead of making a single loader, you make two loaders, and for the second you pass indices as a permutation of np.arange(len(dataset)) that properly lines up the pairs. Zipping the two loaders together should then give you the same functionality as your PairedDataset:
import numpy as np
from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import NDArrayDecoder
from tempfile import NamedTemporaryFile
# Really simple dataset of 100 examples; example i is [10*i, 10*i + 1, ..., 10*i + 9]
class SimpleDataset:
    def __getitem__(self, idx):
        return (np.arange(idx * 10, idx * 10 + 10).astype('int'),)

    def __len__(self):
        return 100

dataset = SimpleDataset()

# Make a permutation
permutation = np.arange(100)
np.random.shuffle(permutation)

# For testing purposes (to make sure the pairing is consistent)
correct_pairing = {i: j for i, j in enumerate(permutation)}

with NamedTemporaryFile() as handle:
    writer = DatasetWriter(handle.name, {
        'data': NDArrayField(np.dtype('int'), (10,))
    }, num_workers=1)
    writer.from_indexed_dataset(dataset)

    # We make two loaders: one with the standard ordering; to the other we pass
    # indices=permutation so that it loads the permuted order. Note that this
    # still works even with order=OrderOption.RANDOM (as long as the seed is the same)!
    loader_a = Loader(handle.name,
                      batch_size=10,
                      num_workers=2,
                      seed=0,
                      order=OrderOption.RANDOM,
                      pipelines={
                          'data': [NDArrayDecoder()]
                      })
    loader_b = Loader(handle.name,
                      batch_size=10,
                      num_workers=2,
                      seed=0,
                      order=OrderOption.RANDOM,
                      pipelines={
                          'data': [NDArrayDecoder()]
                      }, indices=permutation)

    for (x,), (y,) in zip(loader_a, loader_b):
        for pt_a, pt_b in zip(x, y):
            assert correct_pairing[pt_a[0].item() // 10] == pt_b[0].item() // 10
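For the stereo use case, where the valid pairs come from an overlap criterion rather than a single global permutation, the same trick should work with two explicit index arrays (made-up pair values below, purely for illustration):

```python
import numpy as np

# Suppose the precomputed overlap criterion yields these valid (i, j)
# pairs of image indices (hypothetical values):
valid_pairs = [(0, 3), (1, 7), (2, 5), (4, 6)]

# Position k of each array holds the two halves of the k-th pair, so two
# loaders built with indices=left_indices and indices=right_indices
# (same seed and order, as in the snippet above) stay aligned when zipped.
left_indices = np.array([i for i, _ in valid_pairs])
right_indices = np.array([j for _, j in valid_pairs])
```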