Run new benchmarks and document costs #70

Open
5 of 8 tasks
ehoelzl opened this issue Dec 4, 2020 · 1 comment

ehoelzl commented Dec 4, 2020

Supersedes mlbench/mlbench-core#82. We can now also use PyTorch 1.7.0.

  • CIFAR10, ResNet20, All Reduce, 1 to 16 workers
  • CIFAR10, ResNet20, DDP, 1 to 16 workers
  • Wikitext2, LSTM, All Reduce, 1 to 16 (32?) workers
  • Wikitext2, LSTM, DDP, 1 to 16 (32?) workers
  • WMT16, LSTM, All Reduce, 1 to 32 workers
  • WMT16, LSTM, DDP, 1 to 32 workers
  • WMT17, Transformer, All Reduce, 1 to 32 workers
  • WMT17, Transformer, DDP, 1 to 32 workers
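
The task list above amounts to a small run matrix, which can be enumerated programmatically to keep track of what still needs to run. The following is a minimal sketch, not part of mlbench: the names `TASKS`, `MODES`, `worker_counts`, and `run_matrix` are hypothetical, and the doubling of worker counts (1, 2, 4, ...) is an assumption, since the list only gives the range. Wikitext2 is capped at 16 here because the list leaves the 32-worker runs open.

```python
from itertools import product

# (dataset, model, max_workers), taken from the task list above.
TASKS = [
    ("CIFAR10", "ResNet20", 16),
    ("Wikitext2", "LSTM", 16),  # the list leaves open whether this goes to 32
    ("WMT16", "LSTM", 32),
    ("WMT17", "Transformer", 32),
]
MODES = ["All Reduce", "DDP"]


def worker_counts(max_workers):
    """Assumption: worker counts double from 1 up to max_workers."""
    counts, n = [], 1
    while n <= max_workers:
        counts.append(n)
        n *= 2
    return counts


def run_matrix():
    """Return one dict per (task, mode, worker count) combination."""
    return [
        {"dataset": dataset, "model": model, "mode": mode, "workers": w}
        for (dataset, model, max_w), mode in product(TASKS, MODES)
        for w in worker_counts(max_w)
    ]


if __name__ == "__main__":
    for run in run_matrix():
        print(run)
```

Under the doubling assumption, each run in the resulting list could also carry its measured cost once the benchmarks are documented.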

ehoelzl commented Dec 4, 2020

@martinjaggi's comment:
"
There are many use cases that need to be run:

  • the official task results (light and full goals need to run)
  • scaling as part of official results
  • comparing different hardware, such as gcloud GPUs (K80, P100, P4, T4, V100, etc.)
  • GPU vs. CPU (though official results will be GPU only)
  • comparing backends (probably no need to scale the number of workers)
  • comparing against PyTorch DDP, PowerSGD, or Adam (again this is different than the official results so has different
  • regression testing against old versions of our code, and against changes in PyTorch, TF, say

Depending on what's needed for the scientific report (or blog post), that's what we'll run. Ideally the code would be easy to adapt.
"
