I want to ask how to split a dataset into train/val splits. In tinystories.py, I don't quite understand this comment:
train/test split. let's use only shard 0 for test split, rest train
So how many tokens from the training data are selected for the validation split?
It seems that @karpathy uses 10 shards, so if only shard 0 is used as the test split, does that mean 1/10 of the data is used as the test set?
E.g., if I have a dataset with 10B tokens, are 1B tokens used for the test/val set?
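To make the arithmetic in the question concrete, here is a minimal sketch of a shard-level split like the one the comment describes. It is not the code from tinystories.py: the function names, the file names, and the assumption that all 10 shards hold the same number of tokens are hypothetical and only there for illustration.

```python
def split_shards(shard_filenames, split):
    """Return the shards for a split: shard 0 for val/test, the rest for train."""
    shards = sorted(shard_filenames)
    return shards[1:] if split == "train" else shards[:1]


def held_out_fraction(tokens_per_shard):
    """Fraction of all tokens that sit in shard 0 (the held-out shard)."""
    return tokens_per_shard[0] / sum(tokens_per_shard)


# Hypothetical shard names; 10 shards as assumed in the question.
shards = [f"data{i:02d}.bin" for i in range(10)]
train_shards = split_shards(shards, "train")
val_shards = split_shards(shards, "val")

# Assumption: 10B tokens spread evenly, i.e. 1B tokens per shard.
tokens_per_shard = [1_000_000_000] * 10
fraction = held_out_fraction(tokens_per_shard)
val_tokens = tokens_per_shard[0]

print(len(train_shards), len(val_shards), fraction, val_tokens)
```

Under these assumptions the held-out share is simply 1 / (number of shards). With unequal shards, or a different shard count, the fraction changes accordingly.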