DALI support #608
Comments
Hi, |
Thank you. Lately I have found that the image preprocessing steps are the bottleneck. I'll try DALI myself and report how much it speeds up the processing. |
albumentations is also a contender for faster image augmentation. In my experience, I/O is often a worse bottleneck than a "slow" pre-processing library. SSDs and especially NVMe drives help a lot. |
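To check whether I/O or preprocessing dominates on a given machine, the two can be timed separately. The following is only a minimal sketch using the standard library: `profile_loading` and `make_dummy_files` are hypothetical helpers, the files are random bytes rather than images, and `zlib.decompress` stands in for image decoding.

```python
import os
import tempfile
import time
import zlib


def profile_loading(paths, process):
    """Return (seconds spent reading files, seconds spent in `process`)."""
    read_s = 0.0
    proc_s = 0.0
    for path in paths:
        t0 = time.perf_counter()
        with open(path, "rb") as f:
            raw = f.read()
        t1 = time.perf_counter()
        process(raw)
        t2 = time.perf_counter()
        read_s += t1 - t0
        proc_s += t2 - t1
    return read_s, proc_s


def make_dummy_files(directory, count=8, size=1 << 20):
    """Write `count` zlib-compressed blobs as stand-ins for encoded images."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i}.bin")
        with open(path, "wb") as f:
            f.write(zlib.compress(os.urandom(size)))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = make_dummy_files(tmp)
    # zlib.decompress is an assumed placeholder for the real decode step.
    read_s, proc_s = profile_loading(paths, zlib.decompress)
    print(f"read: {read_s:.4f}s  process: {proc_s:.4f}s")
```

Freshly written files are usually served from the OS page cache, so the read time here will look optimistic. A realistic measurement would use the actual dataset on the actual disk, with the real decode and augmentation function passed as `process`.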
Hi @datumbox it's been a while since this PR had any discussions, I'm curious if there are any plans to make this happen? |
@msaroufim we are currently working to improve the data loading process using PyTorch Data. We do not have immediate plans to integrate DALI directly, but we can review this in the future. As we have very few resources, I think it's more realistic that such an investigation can happen after the release of the new Datasets API. ccing @NicolasHug and @pmeier, who lead the work on datasets. |
Oh interesting so the way you'd integrate new backends in the future is to integrate them within |
Not sure what you mean by "backends" here. In general you are right though.
There is no public document yet. However, we already have quite a large collection of datasets ported to the new structure. You can access them with

    from torchvision.prototype import datasets

    dataset = datasets.load("voc")

The

    from torchvision.prototype import transforms

    transform = transforms.Compose(
        transforms.DecodeImage(),
        transforms.Resize(256),
        transforms.CenterCrop(256),
    )

    for sample in dataset.map(transform):
        ...

For everything else, please also have a look at the |
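For readers unfamiliar with the pattern, the idea of composing transforms and mapping them lazily over a dataset can be illustrated without torchvision. This is a toy sketch, not the prototype implementation: the `Compose` class below is a hypothetical stand-in, the "samples" are strings, and the built-in `map` replaces `dataset.map`.

```python
class Compose:
    """Hypothetical stand-in: applies callables left to right."""

    def __init__(self, *transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for transform in self.transforms:
            sample = transform(sample)
        return sample


# Toy "samples" and "transforms"; real ones would be images and image ops.
dataset = [" Cat ", " DOG", "bird  "]
transform = Compose(str.strip, str.lower)

# map() is lazy, so each sample is transformed only when the loop requests it.
results = [sample for sample in map(transform, dataset)]
print(results)
```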
@pmeier to clarify, by "backend" I mean one of these (https://github.com/pytorch/vision#image-backend), i.e. pillow, accimage, pillow simd, etc. Overall, the new interface for adding datasets looks good, but I'm more curious about adding new backends like DALI. In particular, DALI has accelerated image processing kernels and accelerated image decoding, which I think would be very useful to integrate into vision directly. It feels too domain-specific to be in torch.data IMHO, and it is similar enough to other backends like accimage to be in vision. What's the process for adding a new backend? If it's similar to the one for accimage (https://github.com/pytorch/vision/blob/main/torchvision/transforms/functional.py#L13), I can make a PR for this. The other option is to integrate the DALI data loader as a data pipe in torch.data. Here's a good primer on DALI and its value proposition: https://cceyda.github.io/blog/dali/cv/image_processing/2020/11/10/nvidia_dali.html @VitalyFedyunin @wenleix please chime in on where you think the most natural place for a DALI integration is. |
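A common way to support an optional backend such as accimage is a guarded import with a fallback. The sketch below shows that general pattern only. It is not torchvision's actual code: `load_optional_backend` and `pick_backend` are hypothetical helpers, and the `"PIL"` fallback name is an assumption for illustration.

```python
import importlib


def load_optional_backend(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# "accimage" is an optional package; it may or may not be installed here.
accimage = load_optional_backend("accimage")


def pick_backend(requested, available):
    """Return `requested` if its module was loaded, otherwise fall back to "PIL"."""
    if available.get(requested) is not None:
        return requested
    return "PIL"


available = {"accimage": accimage}
backend = pick_backend("accimage", available)
print(backend)
```

Under this pattern, a DALI-style backend would be one more entry in `available`. Whether DALI's pipeline model fits a per-image backend interface at all is the open question raised in this thread.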
Thanks @msaroufim, I had the same feeling about making it as a separate DataPipe because it requires different behavior compared with |
Seems like there's a good workaround too NVIDIA/DALI#3081 (comment) - I'll take a more thorough look |
The new datasets will return a vision/torchvision/prototype/transforms/_type_conversion.py Lines 11 to 17 in a8f563d
but you can use arbitrary backends there. |
Similar issue on torchdata repo - pytorch/data#761 |
Hi, are there any plans to integrate DALI (https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/index.html) into torchvision for faster preprocessing? I found that chainer is trying to integrate it (chainer/chainer#5067).