Testing framework for Deep Learning models (TensorFlow and PyTorch) on Google Cloud hardware accelerators (TPU and GPU)

yeqingli/ml-testing-accelerators

ML Testing Accelerators

A set of tools and examples to run machine learning tests on ML hardware accelerators (TPUs or GPUs) using Google Cloud Platform.

This is not an officially supported Google product.

Getting Started (full-featured standalone mode)

In this mode, your tests and/or models run on an automated schedule in GKE. Results are collected by the "Metrics Handler" and written to BigQuery.

This route is recommended if you have many tests that run for a long time and produce many metrics that you want to monitor for regressions.

  1. Install all of our development prerequisites.
  2. Follow instructions in the deployments directory to set up a Kubernetes Cluster.
  3. Follow instructions in the images directory to set up the Docker image that your tests will run.
  4. Deploy the metrics handler to Google Cloud Functions.
  5. See the templates directory for a Jsonnet template library to generate test config files.
  6. (Optional) Set up a dashboard to view test results. See the dashboard directory for instructions.
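
To illustrate what "monitoring metrics for regressions" can look like, here is a minimal Python sketch. It is not the Metrics Handler's actual logic: the function name `is_regression`, the `num_stddevs` parameter, and the sample accuracy values are all hypothetical, and the approach (flagging a value that falls more than a few standard deviations from the historical mean) is just one simple way to do it.

```python
from statistics import mean, stdev


def is_regression(history, new_value, num_stddevs=3.0, higher_is_better=True):
    """Hypothetical check: flag new_value if it is worse than the
    historical mean by more than num_stddevs sample standard deviations."""
    if len(history) < 2:
        # Not enough past runs to estimate the spread, so never flag.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if higher_is_better:
        # Metrics such as accuracy: a regression is a drop below the band.
        return new_value < mu - num_stddevs * sigma
    # Metrics such as step time: a regression is a rise above the band.
    return new_value > mu + num_stddevs * sigma


# Made-up accuracy values from five earlier runs of the same test.
accuracy_history = [0.761, 0.759, 0.762, 0.760, 0.758]

print(is_regression(accuracy_history, 0.759))  # within the usual spread
print(is_regression(accuracy_history, 0.700))  # far below the usual spread
```

In a real deployment, the history would come from the results stored in BigQuery rather than a hard-coded list.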

Getting Started (lighter-weight Continuous Integration mode)

In this mode, your tests run on GKE but are tied to a CI platform like GitHub Actions or CircleCI. Tests can run as presubmits for pending PRs, as postsubmit checks on submitted PRs, or on a timed schedule.

This route is recommended if you want some tie-in with GitHub and your tests are relatively short-running.

  1. Install all of our development prerequisites.
  2. Follow instructions in the deployments directory to set up a Kubernetes Cluster.
  3. See the ci_pytorch directory for the last few setup steps.

Are you interested in using ML Testing Accelerators? E-mail ml-testing-accelerators-users@googlegroups.com and tell us about your use case. We're happy to help you get started.
