Stochastic Gradient Descent (SGD) and SGD-like methods (e.g., Adam) are commonly used in PyTorch to train ML models. However, these methods rely on a randomized data order to converge, which usually requires a full shuffle of the dataset; on disk-based storage, this full shuffle leads to low I/O performance.
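
For context, a minimal sketch of the conventional full-shuffle baseline (a toy in-memory dataset stands in for a large on-disk one):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a large dataset stored on disk.
dataset = TensorDataset(torch.randn(1000, 8), torch.randint(0, 2, (1000,)))

# shuffle=True draws a fresh random permutation of ALL examples each
# epoch -- the "full data shuffle" that causes random reads across the
# whole dataset on disk-based storage.
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for features, labels in loader:
    pass  # one SGD step per mini-batch would go here
```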
We propose CorgiPile (https://link.springer.com/article/10.1007/s00778-024-00845-0), a simple but novel two-level data shuffling strategy that avoids a full data shuffle while maintaining a convergence rate comparable to that of a full shuffle. CorgiPile first samples and shuffles the data at the block level, and then shuffles the data at the tuple level within the sampled blocks: it shuffles the order of the data blocks, merges the sampled blocks into a small in-memory buffer, and finally shuffles the tuples in the buffer before feeding them to SGD (see the sketch below). We have implemented CorgiPile inside PyTorch (https://github.com/DS3Lab/CorgiPile-PyTorch), and extensive experiments show that CorgiPile converges at a rate comparable to full-shuffle SGD while running faster than PyTorch with a full data shuffle.
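
To make the two-level strategy concrete, here is a minimal sketch in plain Python. It is not the actual CorgiPile-PyTorch implementation; the function name `corgipile_iter`, the `buffer_blocks` parameter, and the list-of-lists block representation are all illustrative assumptions.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def corgipile_iter(
    blocks: Sequence[Sequence[T]],  # dataset pre-partitioned into blocks
    buffer_blocks: int = 8,         # buffer capacity, measured in blocks
    seed: int = 0,
) -> Iterator[T]:
    """Illustrative sketch of CorgiPile's two-level shuffle.

    Level 1: shuffle the order in which data blocks are read.
    Level 2: merge `buffer_blocks` blocks into a small in-memory buffer
    and shuffle the tuples inside it before handing them to SGD.
    """
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)                   # level 1: block-level shuffle
    buffer: List[T] = []
    for i, block_id in enumerate(order, 1):
        buffer.extend(blocks[block_id])  # sequential read of one block
        if i % buffer_blocks == 0:
            rng.shuffle(buffer)          # level 2: tuple-level shuffle
            yield from buffer
            buffer = []
    if buffer:                           # flush the last partial buffer
        rng.shuffle(buffer)
        yield from buffer

# Usage: 10 blocks of 10 tuples each, shuffled through a 4-block buffer.
data = list(range(100))
blocks = [data[i:i + 10] for i in range(0, 100, 10)]
for example in corgipile_iter(blocks, buffer_blocks=4):
    pass  # feed `example` (or mini-batches of examples) to the SGD step
```

Because each block is read sequentially and only `buffer_blocks` blocks are held in memory at once, the I/O pattern stays mostly sequential while the buffer-level shuffle restores much of the randomness that SGD needs.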