ShuffleBuffer not returning all patches #7986

Open
CH4LLENG3R opened this issue Aug 2, 2024 · 3 comments

Comments

@CH4LLENG3R

CH4LLENG3R commented Aug 2, 2024

Describe the bug
While following the tutorial https://github.com/Project-MONAI/tutorials/blob/main/modules/2d_slices_from_3d_training.ipynb and adapting parts of it for my project, in particular converting a dataset of 3D volumes into 2D patches, I ran into an issue with ShuffleBuffer.

```python
import monai
from monai.transforms import Compose, SqueezeDimd


def create_dataset_2D_ds(ds, keys: list, trans2d: list) -> monai.data.ShuffleBuffer:
    # ds = CacheDataset(data=data_dicts, transform=transforms)
    # Slice each 3D volume along the last axis into patches of depth 1.
    patch_func = monai.data.PatchIterd(
        keys=keys, patch_size=(None, None, 1), start_pos=(0, 0, 0)
    )

    # Drop the singleton depth axis, then apply the 2D transforms.
    patch_transform = Compose([SqueezeDimd(keys=keys, dim=-1)] + trans2d)

    patch_ds = monai.data.GridPatchDataset(  # -> GridPatchDataset contains 531 entries
        data=ds, patch_iter=patch_func, transform=patch_transform, with_coordinates=False
    )
    sb = monai.data.ShuffleBuffer(patch_ds, buffer_size=200, seed=42, epochs=1)  # -> ShuffleBuffer returning only 133 of them
    return sb
```

The problem is noted in the code comments above: I know that the GridPatchDataset contains 531 entries, but iterating over the ShuffleBuffer through a DataLoader yields only 133.

Expected behavior
Being able to iterate through the entire GridPatchDataset with ShuffleBuffer.

@DylanHsu

I am having the same issue.

@DylanHsu

@CH4LLENG3R I believe I discovered the source of the problem, or at least a workaround, in my setup. I was passing the ShuffleBuffer to a DataLoader with num_workers=4 set. Based on your ratio of expected vs. resulting images (about 25%), I am guessing you were also using 4 workers, and they are all fighting for shuffled slices (or something). Using num_workers=1 in the downstream DataLoader gives me the correct number of shuffled images.
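
One possible explanation for the roughly 25% ratio, offered as a guess and not confirmed anywhere in this thread: if both the patch dataset and the shuffle wrapper split their input across DataLoader workers by index modulo `num_workers`, each worker only sees a quarter of a quarter of the data. The sketch below simulates that with plain Python lists. The helpers `shard` and `total_yielded` are hypothetical names made up for this illustration and are not part of MONAI or PyTorch.

```python
def shard(items, worker_id, num_workers):
    """Round-robin split: keep items whose position modulo num_workers equals worker_id."""
    return [x for i, x in enumerate(items) if i % num_workers == worker_id]


def total_yielded(num_items, num_workers):
    """Count items produced by all workers when the split is applied twice in a row.

    ASSUMPTION: two nested iterable datasets each perform their own per-worker split.
    """
    total = 0
    for worker_id in range(num_workers):
        once = shard(list(range(num_items)), worker_id, num_workers)  # inner dataset
        twice = shard(once, worker_id, num_workers)  # outer wrapper splits again
        total += len(twice)
    return total


print(total_yielded(531, 4))  # 133
print(total_yielded(531, 1))  # 531
```

Under this assumption, 531 entries with 4 workers come out as 133, which matches the numbers reported above, and a single worker returns all 531. Whether the library actually behaves this way would need to be checked against its source.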

@CH4LLENG3R
Author

Thank you @DylanHsu, your solution works!
