The BERT task is currently being added to MLBench on this branch. Pre-processing works, and all pre-processed data is already stored in a bucket. However, pre-training requires scaling the data by 10x, which results in almost 370 GB of data. Having each worker download the full dataset is not feasible, as every worker would need a very large disk.
One way around this would be to mount the bucket containing all pre-processed shards and download them on demand.
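
To make the on-demand idea concrete, here is a minimal sketch. It assumes the bucket is already mounted at a local directory (for example through a FUSE-based tool) and that each shard is a single file. The function names `shards_for_worker` and `iter_shards_on_demand`, and the `rank`/`world_size` parameters, are hypothetical and not part of MLBench. Each worker copies one shard at a time to local disk and deletes it once it has been consumed, so local disk usage stays at roughly one shard rather than the full dataset.

```python
import os
import shutil
import tempfile
from typing import Iterator, List


def shards_for_worker(shard_names: List[str], rank: int, world_size: int) -> List[str]:
    """Round-robin split of the sorted shard list, so workers read disjoint subsets."""
    return sorted(shard_names)[rank::world_size]


def iter_shards_on_demand(mount_dir: str, rank: int, world_size: int) -> Iterator[str]:
    """Yield the local path of one shard at a time; the copy is deleted after use.

    `mount_dir` is assumed to be the local mount point of the bucket.
    """
    names = shards_for_worker(os.listdir(mount_dir), rank, world_size)
    cache_dir = tempfile.mkdtemp(prefix="bert_shards_")
    try:
        for name in names:
            local_path = os.path.join(cache_dir, name)
            # Reading from the mount is what triggers the actual download.
            shutil.copyfile(os.path.join(mount_dir, name), local_path)
            try:
                yield local_path
            finally:
                # Free the disk space before the next shard is fetched.
                os.remove(local_path)
    finally:
        shutil.rmtree(cache_dir, ignore_errors=True)
```

A worker would then loop with `for path in iter_shards_on_demand(mount_dir, rank, world_size): ...` and load each shard inside the loop body. This sketch fetches shards synchronously; prefetching the next shard in a background thread would hide download latency at the cost of holding two shards on disk.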