
Dataloader does not stop if one of the zipped DataPipes has a perpetual cycle #865

@magehrig

Description

🐛 Describe the bug

The Zipper is supposed to stop as soon as the shortest input DataPipe is exhausted. This is not the case if one of the input DataPipes is constructed with a perpetual cycle: the expected elements are produced, but iteration never terminates.

Minimal Example:

from torch.utils.data import DataLoader
from torchdata.datapipes.iter import IterableWrapper, IterDataPipe


class TestDataset(IterDataPipe):
    def __init__(self):
        self.data = IterableWrapper(list(range(10)))

    def __iter__(self):
        # Infinite DataPipe: cycles over [42] forever (count=None).
        inf_cycle = IterableWrapper([42]).cycle(count=None)
        # Zipping the 10-element pipe with the infinite cycle should yield
        # exactly 10 elements, since Zipper stops at the shortest input.
        datapipe = self.data.zip(inf_cycle)
        return iter(datapipe)


if __name__ == '__main__':
    dataset = TestDataset()
    loader = DataLoader(dataset=dataset, batch_size=None, num_workers=0)

    for batch in loader:
        print(batch)

Running this prints the following, but the program never exits:

[0, 42]
[1, 42]
[2, 42]
[3, 42]
[4, 42]
[5, 42]
[6, 42]
[7, 42]
[8, 42]
[9, 42]
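
For contrast, here is a minimal sketch of the documented Zipper semantics with two finite inputs, plus a possible workaround (assuming torchdata's Header datapipe, exposed via the .header functional API, truncates its source after limit elements as documented). Capping the infinite cycle makes it finite, so Zipper can exhaust it and iteration terminates:

from torchdata.datapipes.iter import IterableWrapper

# Two finite pipes: Zipper stops after the shorter one (3 elements).
shorter = IterableWrapper([0, 1, 2])
longer = IterableWrapper(list(range(10)))
print(list(shorter.zip(longer)))  # [(0, 0), (1, 1), (2, 2)]

# Possible workaround for the bug above: cap the infinite cycle with
# .header(limit=...) so it becomes finite and the zip can terminate.
finite = IterableWrapper(list(range(10)))
capped_cycle = IterableWrapper([42]).cycle(count=None).header(limit=10)
print(list(finite.zip(capped_cycle)))  # [(0, 42), ..., (9, 42)]

This only sidesteps the hang when the length of the shortest pipe is known up front; the underlying bug in Zipper's termination still needs a fix.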

Versions

PyTorch version: 1.14.0.dev20221026+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
Clang version: 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
CMake version: version 3.20.0
Libc version: glibc-2.27

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-194-generic-x86_64-with-glibc2.27
Is CUDA available: False
CUDA runtime version: 11.1.105
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: Quadro RTX 8000
GPU 1: Quadro RTX 8000
GPU 2: Quadro RTX 8000
GPU 3: Quadro RTX 8000
GPU 4: Quadro RTX 8000
GPU 5: Quadro RTX 8000
GPU 6: Quadro RTX 8000
GPU 7: Quadro RTX 8000

Nvidia driver version: 515.65.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.14.0.dev20221026+cpu
[pip3] torchdata==0.6.0.dev20221026
[pip3] torchvision==0.13.1
[conda] blas                      2.116                       mkl    conda-forge
[conda] blas-devel                3.9.0            16_linux64_mkl    conda-forge
[conda] cudatoolkit               10.2.89             h713d32c_10    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libblas                   3.9.0            16_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            16_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            16_linux64_mkl    conda-forge
[conda] liblapacke                3.9.0            16_linux64_mkl    conda-forge
[conda] magma                     2.5.4                h5da55e3_2    conda-forge
[conda] mkl                       2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-devel                 2022.1.0           ha770c72_916    conda-forge
[conda] mkl-include               2022.1.0           h84fe81f_915    conda-forge
[conda] numpy                     1.22.4           py39hc58783e_0    conda-forge
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.14.0.dev20221026+cpu          pypi_0    pypi
[conda] torchdata                 0.6.0.dev20221026          pypi_0    pypi
[conda] torchvision               0.13.1               py39_cu102    pytorch
