How to train on a huge text dataset #4139
Unanswered
Limtle
asked this question in DDP / multi-GPU / multi-node
What is your question?
I have a DDP program running on 8 nodes, and I need to load a very large text dataset (>30 GB) for this task. However, when I load it into the DataLoader, the program gets stuck. My intuition is to split the dataset into smaller fractions so that the DataLoader only needs to load a small piece at a time. Does pytorch-lightning have any support for this, or any other suggestions for solving this kind of problem?
What's your environment?
OS: [Linux]
Packaging: [pip]
Version: [0.10.0]
Replies: 1 comment
Can you tell which part of your script is failing? PyTorch datasets do not need to load all of the data at once if implemented properly. Perhaps you can share your Dataset implementation?
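A minimal sketch of the kind of lazily-loading dataset the reply is pointing at, assuming the corpus is a single text file with one example per line (the class name is hypothetical, and tokenization is left out):

```python
import numpy as np
from torch.utils.data import Dataset

class LazyTextDataset(Dataset):
    """Map-style dataset over a large text file with one example per line.

    Only the byte offsets of line starts are held in memory; each item is
    read from disk on demand, so the 30 GB file is never fully loaded.
    """

    def __init__(self, path):
        self.path = path
        offsets = [0]
        with open(path, "rb") as f:
            for line in f:
                offsets.append(offsets[-1] + len(line))
        # Drop the final offset (it points at EOF); store compactly as int64.
        self.offsets = np.asarray(offsets[:-1], dtype=np.int64)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode("utf-8").rstrip("\n")
        # Tokenize/encode `line` here; returned as a raw string for brevity.
        return line
```

Because this is a map-style dataset, Lightning should be able to wrap it in a DistributedSampler automatically in DDP mode, so each of the 8 nodes only pulls its own subset of indices rather than the whole file.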
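If random access isn't required, an IterableDataset can instead stream the file and shard it manually across ranks and workers. A sketch, under the assumption that the process group is already initialized when iteration starts (class name hypothetical):

```python
import torch.distributed as dist
from torch.utils.data import IterableDataset, get_worker_info

class StreamingTextDataset(IterableDataset):
    """Streams lines from disk; each (rank, worker) pair reads every
    num_shards-th line, so no example is read twice per epoch."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Figure out this process's global shard among all ranks and workers.
        rank = dist.get_rank() if dist.is_initialized() else 0
        world = dist.get_world_size() if dist.is_initialized() else 1
        info = get_worker_info()
        num_workers = info.num_workers if info is not None else 1
        worker_id = info.id if info is not None else 0
        shard_id = rank * num_workers + worker_id
        num_shards = world * num_workers
        with open(self.path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i % num_shards == shard_id:
                    yield line.rstrip("\n")
```

Note that with an IterableDataset the trainer cannot inject a DistributedSampler, which is why the modulo sharding has to live in the dataset itself; shuffling would likewise have to be handled manually, for example with an in-memory buffer.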