Feature request TPU v4 support #247

rivershah · 2022-10-20T09:14:20Z

With TPU v4, Google has really cleaned up the user experience around TPU VMs. Does the google-cls-v2 provider allow provisioning of TPU v4 machines? If so, is there an example that shows how to provision them and load any drivers needed to make the TPU v4 accelerator types visible to jobs submitted via dsub?


wnojopra · 2022-10-21T00:42:12Z

Thanks @rivershah! This is something I'll be looking into more. From some of the TPU documentation, it sounds like something we should be able to support, though some Life Sciences API changes might be needed.

Would you mind clarifying what you mean by how google "cleaned up" the experience? I'd be interested to hear about your experience. Is there any specific documentation you follow for your work with TPUs?

rivershah · 2022-10-21T08:44:53Z

@wnojopra Please take a look here: https://www.youtube.com/watch?v=W7A-9MYvPwI&t=301s

TPUs now follow the same provisioning model as GPUs: root access to the host VM, with the accelerators on the host. I am not sure how relevant this provisioning model refactor is as far as using the Life Sciences API goes, but it seems like a unification with the existing GPU provisioning model, which works very nicely with dsub.

wnojopra · 2022-10-24T17:38:16Z

Thanks for sharing. Wanted to highlight an important bit from the video for others to read:

"""
In the past, for using TPUs on Cloud, a network attached architecture was used. The user would connect to a VM and then interact with the TPUs through GRPC calls. This was difficult to debug and sometimes introduced delays in the experience. With all new TPU VM Architecture, you have root access to every TPU VM you create. So you can install and run any software you wish in a tight loop with your TPU accelerators. You can use local storage, execute custom code in your input pipelines, and more easily integrate Cloud TPUs into your research and production workflows.
"""
