When I add text data to the training dataset, training always hangs at the first step. If I remove the text data, training proceeds normally. What could be causing this?
Here is what I see in different configurations:
Single GPU with text data: normal (unsure whether training will complete)
Multi-GPU with text data: abnormal
Video-LLaVA codebase with the ChatUniVi model: hangs at 70% instead of at the first step
Text data only: normal (unsure whether training will complete)
This hang is caused by a DeepSpeed bug (microsoft/DeepSpeed#2223). Our code triggers it very easily because the lengths of the text samples vary greatly.
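Since the large spread in sample lengths is named as the trigger, one thing that might be worth trying is batching samples of similar length together. The sketch below is only an illustration of that idea in plain Python, not the fix used in this repository, and whether it actually avoids the DeepSpeed hang is an assumption. The function name `length_grouped_batches` and its parameters are hypothetical.

```python
import random


def length_grouped_batches(lengths, batch_size, seed=0):
    """Return batches of dataset indices whose samples have similar lengths.

    `lengths` is a list with one token count per sample (hypothetical input;
    compute it however your dataset allows).
    """
    rng = random.Random(seed)
    # Sort indices by sample length so neighbours are of comparable size.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Cut the sorted order into consecutive batches.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle the batch order so lengths do not grow monotonically in training.
    rng.shuffle(batches)
    return batches


# Example: four short text samples mixed with four long ones.
lengths = [12, 900, 15, 870, 20, 910, 18, 880]
batches = length_grouped_batches(lengths, batch_size=4)
for batch in batches:
    print([lengths[i] for i in batch])
```

Each printed batch contains only short or only long samples. The batches could then be handed to a data loader through a custom batch sampler, though the wiring depends on the training code.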