🍱 Extra data and pre-batch shuffle on train datapipe #14
Conversation
More sample imagery datasets for training, added in https://huggingface.co/datasets/chabud-team/chabud-extra/commit/7da36fcb240ef39beed1f877acc837b98746f35b.
Randomizing the order of the chips before creating mini-batches, because train_eval.hdf5 contains all the non-zero labels while the california_*.hdf5 files contain only zero labels. The shuffling causes a roughly 2x slowdown, from ~1 s/it to ~2 s/it. Also cherry-picked a9b3b95 to use a buffer_size of -1 in the demux DataPipe.
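A minimal sketch of the idea, assuming the torchdata DataPipe API (the actual change is in the quoted diff below); the chip list, split rule, and sizes here are made up for illustration:

from torchdata.datapipes.iter import IterableWrapper

# Hypothetical stand-in for the real chip sources; the actual pipeline reads
# pre/post fire chips out of HDF5 files hosted on Hugging Face.
chips = [{"idx": i, "has_burn": i % 3 == 0} for i in range(20)]
dp = IterableWrapper(chips)

# demux with buffer_size=-1 (unlimited), as in the cherry-picked commit, so the
# splitter does not raise a BufferError while one branch waits on the other.
dp_train, dp_val = dp.demux(
    num_instances=2,
    classifier_fn=lambda chip: int(chip["idx"] % 5 == 0),  # made-up split rule
    buffer_size=-1,
)

# Shuffle individual chips *before* batching, so a mini-batch can mix
# non-zero-label chips (train_eval.hdf5) with all-zero-label ones (california_*).
dp_train = dp_train.shuffle(buffer_size=100).batch(batch_size=6)

for batch in dp_train:
    print([chip["idx"] for chip in batch])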
self.datapipe_train = (
    dp_train.map(fn=_pre_post_mask_tuple)
    dp_train.shuffle(buffer_size=100)
Default buffer size of 10000 was too slow (waited for minutes but the model never started training). @srmsoumya, could you try a few other variations of this buffer_size and see how performant the model is?
Sure, I was facing some errors with buffers and set it to -1 in my experiment; I will look at other options as well.
Looks good to me, feel free to merge.
chabud/datapipe.py
"https://huggingface.co/datasets/chabud-team/chabud-extra/resolve/main/california_0.hdf5", | ||
"https://huggingface.co/datasets/chabud-team/chabud-extra/resolve/main/california_1.hdf5", | ||
"https://huggingface.co/datasets/chabud-team/chabud-extra/resolve/main/california_2.hdf5", | ||
"https://huggingface.co/datasets/chabud-team/chabud-extra/resolve/main/california_3.hdf5", | ||
"https://huggingface.co/datasets/chabud-team/chabud-extra/resolve/main/california_4.hdf5", |
@weiji14 we can ignore the california_*.hdf5 files for now, as the dataset is currently imbalanced. We can add them back once we implement the mixup & cutmix augmentations.
Commented out the extra california_*.hdf5 data for now.
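For context on the mixup idea mentioned above, a generic sketch of what a mixup-style blend could look like for pre/post image pairs and burn masks, so the all-zero california_* chips would still contribute signal; this is purely illustrative (tensor shapes, alpha, and the soft-mask choice are assumptions, not this repo's implementation):

import torch

def mixup(pre_a, post_a, mask_a, pre_b, post_b, mask_b, alpha=0.2):
    # Blend two chips; the mixing weight is drawn from a Beta(alpha, alpha) distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    pre = lam * pre_a + (1 - lam) * pre_b
    post = lam * post_a + (1 - lam) * post_b
    mask = lam * mask_a.float() + (1 - lam) * mask_b.float()  # soft target
    return pre, post, mask

# Made-up tensors standing in for 12-band pre/post chips and binary burn masks.
pre_a, post_a = torch.rand(2, 12, 128, 128)
pre_b, post_b = torch.rand(2, 12, 128, 128)
mask_a = torch.randint(0, 2, (128, 128))
mask_b = torch.zeros(128, 128, dtype=torch.long)  # an all-zero california_* style label
pre, post, mask = mixup(pre_a, post_a, mask_a, pre_b, post_b, mask_b)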
What I am changing
How I did it
Use .shuffle instead of .in_batch_shuffle on the train datapipe.
How you can test it
Run python trainer.py fit --trainer.max_epochs=30 --data.batch_size=6 locally.
Related Issues
Note that the shuffling operation is slower than in-batch shuffling. There is a longer delay at the start as the image chips are added to the shuffle buffer, and each mini-batch now takes about 2x longer to process (one iteration used to take ~1s, now it takes ~2s).
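To illustrate the difference, a toy sketch assuming the torchdata API (not code from this repo): .shuffle mixes elements across a buffer before batches are formed, while .in_batch_shuffle only reorders elements within already-formed batches.

from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(12))

# Pre-batch shuffle: elements are mixed across the shuffle buffer before
# batching, so a batch can combine chips from anywhere in the stream. The
# buffer has to fill first, which is where the startup delay comes from.
pre_batch = dp.shuffle(buffer_size=100).batch(batch_size=4)

# In-batch shuffle: batches are formed from consecutive elements first and only
# the order *within* each batch is randomized - fast, but every batch keeps the
# same neighbouring chips together.
in_batch = dp.batch(batch_size=4).in_batch_shuffle()

print(list(pre_batch))  # e.g. [[7, 0, 9, 3], [1, 5, 11, 2], [4, 6, 8, 10]]
print(list(in_batch))   # e.g. [[2, 0, 3, 1], [6, 7, 4, 5], [9, 11, 8, 10]]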