DDP is incompatible with large datasets #1970

sabetAI · 2020-05-27T14:16:01Z

I'm trying to stream a large data file instead of loading it into memory, so that it doesn't have to be pickled for multiprocessing. However, open file objects raise a TypeError: cannot serialize '_io.TextIOWrapper' object error, so I have to open the file within a subprocess instead. But the train_dataloader and val_dataloader methods get called in the main process of pytorch-lightning! How can I bypass this issue without changing the source code?
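A common workaround for this kind of pickling error is to keep only the file path on the dataset object and open the handle lazily on first access, so that each process opens its own handle. The sketch below is not taken from this thread. `LazyLineDataset` is a hypothetical name, and it assumes a UTF-8 text file with one sample per line. It is written as a plain class with `__len__` and `__getitem__`; in a real project it would presumably subclass `torch.utils.data.Dataset`.

```python
import pickle
import tempfile


class LazyLineDataset:
    """Hypothetical map-style dataset: one sample per line of a text file."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # no open handle yet, so the object stays picklable
        # Scan once to record the byte offset at which each line starts.
        self.offsets = []
        with open(path, "rb") as f:
            pos = 0
            for line in f:
                self.offsets.append(pos)
                pos += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Open the file on first access, in whichever process calls this.
        if self._fh is None:
            self._fh = open(self.path, "rb")
        self._fh.seek(self.offsets[idx])
        return self._fh.readline().decode("utf-8").rstrip("\n")

    def __getstate__(self):
        # Drop the handle before pickling; the copy reopens it on demand.
        state = self.__dict__.copy()
        state["_fh"] = None
        return state


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first\nsecond\nthird\n")
    dataset = LazyLineDataset(tmp.name)
    print(dataset[0])  # opens a handle in this process
    # Pickling works even though a handle is now open on the original.
    restored = pickle.loads(pickle.dumps(dataset))
    print(len(restored), restored[2])
```

The offset index still has to be pickled, but it is far smaller than the data itself. Whether this is enough depends on how the processes are started, which is an assumption to verify against the Lightning version in use.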


sabetAI added the "help wanted" label May 27, 2020

sabetAI changed the title ~~pytorch-lightning multi-processing is not compatible with large datasets~~ pytorch-lightning multi-processing is incompatible with large datasets May 27, 2020

sabetAI changed the title ~~pytorch-lightning multi-processing is incompatible with large datasets~~ DDP is incompatible with large datasets May 27, 2020

williamFalcon mentioned this issue May 31, 2020

Replaces ddp .spawn with subprocess #2029

Merged

williamFalcon closed this as completed in #2029 Jun 1, 2020
