BERT Work-in-progress [edoardo] #336

ehoelzl · 2021-02-26T14:50:44Z

The BERT task is currently being added to MLBench on this branch. Pre-processing works, and all pre-processed data is already in a bucket. However, pre-training requires scaling the data by 10x, which results in almost 370GB of data. Each worker cannot download that much data, as it would require very large disks.

One way around this would be to mount the bucket containing all preprocessed shards and download them on demand.
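
To make the on-demand idea concrete, here is a minimal sketch of a bounded local shard cache. It is not MLBench code: `ShardCache`, `fetch`, and `max_shards` are hypothetical names, and `fetch(name, path)` stands in for whatever actually copies one shard from the bucket (or from the mounted bucket path) to local disk. Shard names are assumed to be flat file names.

```python
import os
from collections import OrderedDict
from typing import Callable


class ShardCache:
    """Keep at most `max_shards` shard files on local disk, fetching on demand."""

    def __init__(self, fetch: Callable[[str, str], None], cache_dir: str, max_shards: int):
        if max_shards < 1:
            raise ValueError("max_shards must be at least 1")
        self.fetch = fetch            # hypothetical: copies shard `name` to local `path`
        self.cache_dir = cache_dir
        self.max_shards = max_shards
        self._local = OrderedDict()   # shard name -> local path, least recently used first

    def get(self, name: str) -> str:
        """Return a local path for shard `name`, downloading it if it is not cached."""
        if name in self._local:
            self._local.move_to_end(name)  # mark as most recently used
            return self._local[name]
        # Evict least recently used shards until there is room for one more.
        while len(self._local) >= self.max_shards:
            _, old_path = self._local.popitem(last=False)
            os.remove(old_path)
        path = os.path.join(self.cache_dir, name)
        self.fetch(name, path)
        self._local[name] = path
        return path
```

With this shape, each worker's disk usage is bounded by `max_shards` times the shard size rather than by the full dataset, at the cost of re-downloading shards that have been evicted.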
